I see code like this all the time (/me glares in the general direction of Zend). It's completely idiotic.
class SomeDataMunger {
    protected $thingToMunge;
    protected $mungeResult;
    public function setThingToMunge( $thing ) {
        $this->thingToMunge = $thing;
    }
    public function getMungeResult() {
        return $this->mungeResult;
    }
    public function munge() {
        ... some code that operates on thingToMunge ...
        ... and sets mungeResult ...
    }
}
Anyone who writes functions this way has negative value as a programmer. So if that's you, stop doing it.
Why is it idiotic? Because you could've accomplished the same thing more easily with a single function:
class SomeDataMunger {
    // (In cases where there are infrastructure dependencies, this
    // wouldn't be static, but the same principle applies.)
    public static function munge( $thingToMunge ) {
        ... do some calculations ...
        return ... final calculation ...;
    }
}
Which is more concise, easier to understand, less error-prone, and probably a little bit faster to execute.
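To make the "less error-prone" claim concrete, here is a minimal runnable sketch of both styles. The class names StatefulMunger and StatelessMunger are hypothetical, and strtoupper() stands in for whatever the real munging would be; the original leaves that part elided.

```php
<?php
// Hypothetical stand-in for the stateful style; "munging" here is just strtoupper().
class StatefulMunger {
    protected $thingToMunge;
    protected $mungeResult;

    public function setThingToMunge( $thing ) { $this->thingToMunge = $thing; }
    public function getMungeResult() { return $this->mungeResult; }
    public function munge() { $this->mungeResult = strtoupper( $this->thingToMunge ); }
}

// The stateless equivalent: input goes in, result comes out.
class StatelessMunger {
    public static function munge( $thingToMunge ) {
        return strtoupper( $thingToMunge );
    }
}

$munger = new StatefulMunger();
$munger->setThingToMunge( 'first' );
$munger->munge();
$munger->setThingToMunge( 'second' );
// munge() was not called again, so this is still the result for 'first'.
$stale = $munger->getMungeResult();

// There is no way to get the call order wrong here.
$fresh = StatelessMunger::munge( 'second' );
```

Nothing in the stateful class stops a caller from reading a stale result, whereas the stateless version has no intermediate state to get out of sync.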
With the stateful version, a caller might set the thing to munge, do something else, and then call munge(), not realizing that 'do something else' included setting the thing to munge to some other value. Or they might forget to munge() at all and get unexpected values when they call getMungeResult().
This doesn't mean you should never use instance variables.
Instance variables are great for holding onto things that are logical properties of an object. For instance, if you're modeling a person, he might have a hair color. The person might have the ability to pet an arbitrary dog. It would be reasonable to have the hair color be described by an instance variable. It would not be reasonable for the dog being petted to be referenced by an instance variable if its sole purpose is to influence the behavior of Person#pet() (if the person has its own dog, said dog may have an instance variable devoted to it, but you would probably still want the Person#pet( $dog ) method to accept an arbitrary dog, since the person's ability to pet is orthogonal to which dog they call theirs).
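As a rough sketch of that distinction (the Dog class, its name property, and the string returned by pet() are all made up for illustration):

```php
<?php
// Hypothetical Dog class, just enough to have something to pet.
class Dog {
    public $name;
    public function __construct( $name ) { $this->name = $name; }
}

class Person {
    // Hair color is a logical property of the person, so it lives in an instance variable.
    protected $hairColor;

    public function __construct( $hairColor ) { $this->hairColor = $hairColor; }
    public function getHairColor() { return $this->hairColor; }

    // The dog only matters for the duration of this call, so it is a parameter.
    public function pet( Dog $dog ) {
        return "petting {$dog->name}";
    }
}

$person = new Person( 'brown' );
$first = $person->pet( new Dog( 'Rex' ) );
$second = $person->pet( new Dog( 'Fido' ) );
```

Petting one dog leaves nothing behind on the Person that could affect petting the next one.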
A more realistic (and common) example of reasonable use of instance variables is in infrastructure classes. If you have a data access object that uses a database and a cache, handles to the database and the cache will probably be instance variables that you initialize during construction or shortly thereafter and henceforth never change. This is different from the 'munger' example earlier because code that uses (as opposed to the code that sets up) the data access object is oblivious to its internal state and only needs to use a few well-known data access methods.
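A minimal sketch of such a data access object might look like the following. Everything named here (KeyValueStore, ArrayStore, UserDao, getUserName) is hypothetical, and the in-memory ArrayStore merely stands in for a real database or cache client so that the example runs on its own.

```php
<?php
// Hypothetical minimal storage interface; a real one would wrap a database or cache client.
interface KeyValueStore {
    public function get( $key );
    public function set( $key, $value );
}

// In-memory stand-in so the sketch runs without any external service.
class ArrayStore implements KeyValueStore {
    protected $data = array();
    // Read counter, present only so the demo below can show where lookups went.
    public $reads = 0;

    public function get( $key ) {
        $this->reads++;
        return isset( $this->data[$key] ) ? $this->data[$key] : null;
    }
    public function set( $key, $value ) { $this->data[$key] = $value; }
}

class UserDao {
    // Handles are set once during construction and never changed afterwards.
    protected $database;
    protected $cache;

    public function __construct( KeyValueStore $database, KeyValueStore $cache ) {
        $this->database = $database;
        $this->cache = $cache;
    }

    // Callers only see this method; which store answered is invisible to them.
    public function getUserName( $userId ) {
        $name = $this->cache->get( $userId );
        if( $name === null ) {
            $name = $this->database->get( $userId );
            if( $name !== null ) $this->cache->set( $userId, $name );
        }
        return $name;
    }
}

$database = new ArrayStore();
$database->set( 42, 'Alice' );
$dao = new UserDao( $database, new ArrayStore() );
$first = $dao->getUserName( 42 );   // read from the database, then cached
$second = $dao->getUserName( 42 );  // served from the cache
```

The code that calls getUserName() gets the same answer both times and never needs to know that the second call skipped the database.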
In short, internal object state is fine and necessary. But if users of your object have to be aware of it, you're doing it wrong.
One exception to this rule is for objects whose sole purpose is to run a single calculation with many inputs and/or outputs. Due to language limitations an object like this can be easier to deal with than a function with many inputs and outputs, especially if most of your inputs have reasonable defaults that most users of the object will not care about. In this case, keep in mind that your object is acting as a function with a complex interface, and you should clearly specify in the class's documentation block the order in which methods must be called to properly use it, and that instances of the calculation object should not be re-used (or, if they can be re-used, what steps need to be taken to ensure that previous configurations do not leak into later ones).
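Here is one way such a calculation object could be sketched. The InterestCalculation class, its setters, and the simple-interest formula are all invented for illustration; the runtime guards against misuse are an extra assumption on top of the documentation block the text asks for.

```php
<?php
/**
 * Hypothetical single-use calculation object.
 *
 * Usage order:
 *   1. Construct.
 *   2. Call setPrincipal() (required) and any other setters (optional).
 *   3. Call run() exactly once.
 *   4. Read results with getInterest() and getTotal().
 *
 * Instances must not be re-used; create a new one per calculation.
 */
class InterestCalculation {
    protected $principal = null;
    protected $ratePercent = 5;  // default that most callers will not care about
    protected $years = 1;        // likewise
    protected $hasRun = false;
    protected $interest = null;

    public function setPrincipal( $principal ) { $this->principal = $principal; }
    public function setRatePercent( $ratePercent ) { $this->ratePercent = $ratePercent; }
    public function setYears( $years ) { $this->years = $years; }

    public function run() {
        if( $this->hasRun ) throw new LogicException( 'Instances must not be re-used' );
        if( $this->principal === null ) throw new LogicException( 'setPrincipal() must be called before run()' );
        // Simple (non-compounding) interest, chosen only to keep the example short.
        $this->interest = $this->principal * $this->ratePercent * $this->years / 100;
        $this->hasRun = true;
    }

    public function getInterest() {
        if( !$this->hasRun ) throw new LogicException( 'run() must be called first' );
        return $this->interest;
    }

    public function getTotal() { return $this->principal + $this->getInterest(); }
}

$calc = new InterestCalculation();
$calc->setPrincipal( 1000 );
$calc->setYears( 2 );  // ratePercent keeps its default of 5
$calc->run();
$interest = $calc->getInterest();
$total = $calc->getTotal();
```

Failing loudly on out-of-order calls is optional, but it turns the documented call order into something the object can actually enforce.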
To better separate configuration from the calculation and enforce proper usage, you might consider having the calculation be a function that accepts as arguments and/or returns data objects that encapsulate all the arguments and return values. This approach has an obvious advantage in cases where there are multiple possible ways to do the calculation, in that you can further separate 'what to do' (properties of the input object) from 'how to do it' (properties of the calculator). A good example of this would be an HTTP request handler function that takes a Request object and returns a Response object. Code to implement such a function is far easier to follow than code for an object that, say, has all the properties of a request and response and does the processing to convert one to the other (which happens to be the approach taken by every non-homegrown PHP MVC framework I've ever had to deal with, and is one of many reasons that they are all awful). The Java Servlets API is much more sensible in this regard (but don't take that to mean I love everything about the servlets API).
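The request handler idea can be sketched in a few lines. The Request and Response classes below are hypothetical bare-bones data objects, not those of any real framework, and handleRequest() with its single '/hello' route is made up for the example.

```php
<?php
// Hypothetical data objects; they only carry values.
class Request {
    public $method;
    public $path;
    public function __construct( $method, $path ) {
        $this->method = $method;
        $this->path = $path;
    }
}

class Response {
    public $status;
    public $body;
    public function __construct( $status, $body ) {
        $this->status = $status;
        $this->body = $body;
    }
}

// 'How to do it' lives in this function; 'what to do' is described entirely by the Request.
function handleRequest( Request $request ) {
    if( $request->method === 'GET' && $request->path === '/hello' ) {
        return new Response( 200, 'Hello, world!' );
    }
    return new Response( 404, 'Not found' );
}

$ok = handleRequest( new Request( 'GET', '/hello' ) );
$missing = handleRequest( new Request( 'GET', '/nope' ) );
```

Because the handler holds no state of its own, two calls cannot interfere with each other, and testing it only requires constructing a Request and inspecting the returned Response.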
For repeated calculations that require complex heap-allocated structures, you might have a calculator object that re-uses those structures. For instance, a graphics library might persist a stack of matrix objects to avoid the overhead of allocating them each time they are needed. Even in this case, you would probably hide the existence of the internal state by only exposing a single method that:
If you're writing PHP (which I would recommend everyone avoid when possible), the language runtime itself is such a bottleneck that you're probably better off performance-wise doing things the straightforward, stateless way (that is, not re-using calculation objects). You're always better off maintainability-wise.