Alex Mace’s Blog

Life & Web Development

Where Should Data Be Validated In Objects?

Last week at work we were talking about data validation when working with data inside objects. We are currently replacing a lot of old, procedural code with OOP-style code. When objects perform operations, they need certain data to work with, and we need a way of handling situations where those objects do not have all of the data they need.

I think the data being used falls into three different categories, and you can place a different level of trust in each. The first is user-submitted data. User-submitted data should not be trusted one bit. While 99.9% of your users may be trustworthy people, it only takes one curious or, worse, malicious person sending you unexpected data to cause you a major problem. The next category is data that has come from outside the current block of code. This data should be moderately distrusted, since you are only assuming that whatever process gave you that data has set it correctly. To err is human, and since software is written by humans, it is natural that software will err too. So it would be sensible to check that the data is at the very least reasonable before proceeding. Finally, there is data set by the current block of code. It would seem sensible to trust this data. You can see where it has come from and know its possible values. There is no need to spend a lot of time checking it.

Now, what if any of this data is wrong? My original thought was that the objects should check the data on use and throw an Exception in all cases. However, the procedural code would have told the user about all of the data that was missing. Using Exceptions, you can only really deal with one piece of data at a time. You could concatenate errors and then throw an Exception afterwards, have nested catch statements, and so on, but those solutions have a bad code smell to me.
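To make that limitation concrete, here is a minimal sketch in Python (the post itself names no language, and the field names and both functions are hypothetical). The exception-based check stops at the first missing field, while the collecting version reports every missing field, which is what the old procedural code did.

```python
REQUIRED_FIELDS = ("name", "email", "postcode")  # hypothetical field names


def check_with_exception(data):
    """Raise on the first missing field; later problems are never reported."""
    for field in REQUIRED_FIELDS:
        if not data.get(field):
            raise ValueError(f"{field} is missing")


def collect_errors(data):
    """Return one message per missing field instead of stopping at the first."""
    return [f"{field} is missing" for field in REQUIRED_FIELDS if not data.get(field)]


submission = {"name": "Alex"}  # email and postcode are both absent

try:
    check_with_exception(submission)
    first_error = None
except ValueError as exc:
    first_error = str(exc)  # only the first problem is visible here

all_errors = collect_errors(submission)  # every problem is listed
```

With the exception, the user would have to fix and resubmit once per missing field before seeing the next problem.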

Another suggestion made was to use a validate function within each class. On the face of it, this solution seems pretty good. We can use this function externally to validate the data and display errors to the user, and we can use it internally to make sure that the operation has all of the data that it requires. However, this solution does not hold up well under scrutiny. As a colleague pointed out, the validation function would need to know which operation it was validating, leading to various switches and control structures within the validation, more maintenance and ultimately bloat. You can also end up with duplicated validation if you perform more than one operation that uses the same data. Each operation would have to validate its data, including data that may already have been validated. Sure, you could mark that data as safe internally, but I think that by that point you are solving problems that you should not really be having. Additionally, I believe that classes should be working with data, not on data. So that wasn’t going to work either.
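A rough sketch of how that bloat creeps in, again in Python and with a hypothetical `Order` class, field names and operation names chosen purely for illustration. The single `validate` method has to branch on the operation, and the shared `items` check runs again for every operation that needs it.

```python
class Order:
    """Hypothetical class with one validate() shared by every operation."""

    def __init__(self, items=None, address=None, card_number=None):
        self.items = items or []
        self.address = address
        self.card_number = card_number

    def validate(self, operation):
        # Each new operation adds another branch to maintain here.
        errors = []
        if operation in ("ship", "charge") and not self.items:
            errors.append("items are required")  # re-checked for both operations
        if operation == "ship" and not self.address:
            errors.append("address is required")
        if operation == "charge" and not self.card_number:
            errors.append("card number is required")
        return errors


order = Order(items=["book"])
ship_errors = order.validate("ship")      # checks items, then address
charge_errors = order.validate("charge")  # checks items again, then card number
```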

With some further thought, I decided that Exceptions really shouldn’t be used at all for errors that end users need to be informed about. Data from users should be transformed, through filtering, sanitizing and validation, into the second category of data before the object is asked to perform any operations. If something is still missing when the operation is called, the object should throw an Exception: at that point something serious has gone wrong and the program should not proceed any further.
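The split described here might look something like the following Python sketch. Everything in it is an assumption made for illustration: the `validate_submission` function, the `Account` class, the `MissingDataError` exception and the deliberately naive email check. User-facing errors are collected at the start of the request, and the object only guards against being called without its data.

```python
class MissingDataError(Exception):
    """Raised when an operation runs without data it requires."""


def validate_submission(raw):
    """Filter and validate raw user input at the start of the request.

    Returns (clean, errors); errors holds every message to show the user.
    """
    clean, errors = {}, []
    name = str(raw.get("name", "")).strip()
    if name:
        clean["name"] = name
    else:
        errors.append("A name is required")
    email = str(raw.get("email", "")).strip()
    if "@" in email:  # naive check, for illustration only
        clean["email"] = email
    else:
        errors.append("A valid email address is required")
    return clean, errors


class Account:
    def __init__(self, name=None, email=None):
        self.name = name
        self.email = email

    def register(self):
        # Not user-facing validation: a guard against a programming error.
        if not self.name or not self.email:
            raise MissingDataError("register() called without name and email")
        return f"registered {self.name} <{self.email}>"


clean, errors = validate_submission({"name": " Alex ", "email": "alex@example.com"})
result = Account(**clean).register() if not errors else None
```

If `errors` is non-empty, the script can show all of the messages to the user and never construct the object at all; the exception is reserved for the case where calling code skipped that step.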

So there you go, in my personal opinion, objects should not be validating data submissions. They obviously do need to check that they have the data they require for the operation being performed, but that should throw an Exception on an error condition to prevent any further processing occurring. Web sites have an advantage in that they are not continuously running programs – each request and submission of data is a discrete operation. This provides you with a good point to filter, sanitize and validate your data, at the very start of your scripts. Use that advantage and check your data at the earliest opportunity.

One Response to “Where Should Data Be Validated In Objects?”

  1. Hi!

    I disagree; I think objects should validate data. My favourite ORM, Propel, does an excellent job of this…

    Namely, you can define validation rules on a per field basis (e.g. minlength, maxlength, regexp matches etc) and then you can optionally call $user->validate() before $user->save().

    If you don’t call $user->validate() then your data will probably get saved (assuming there are no database constraints on fields which will result in a db exception being raised).

    When calling validate(), you can optionally ask it to validate only certain fields – allowing you to partially populate an object (e.g. on a multi-page form).

    Finally, when validation fails, it doesn’t throw an exception – validate() returns true/false; if it returns false, it’s up to you (the programmer) to call getValidationFailures() which returns a list of objects you can iterate over – each has the field name and the validation failure message you specified in the schema.xml file.

    Anyway, I like it – I’ve not yet looked at Doctrine in any depth, but don’t think it offers the same functionality (but I may be wrong).

    David
