Always Learning: 09/07/13

Almost every web application needs to handle global data. There are certain things that just have to be available throughout the entire code base, such as database connections, configuration settings, and error handling routines. As a PHP developer, you may have heard the mantra 'globals are evil', but this naturally raises the question 'what should I use instead of global variables?'

There are several different strategies available to help cope with the demand for global data. Each has its advantages and disadvantages, and it can be a challenge to know which approach to use in any given situation. Here I will try to outline what the options are, how they work, their advantages and disadvantages, and the circumstances in which each option might be used. The code samples are not necessarily realistic; they are kept as simple as possible to demonstrate the idea. The principles and design patterns that follow are not specific to PHP - they can be applied to any object oriented language.

Global Variables

Probably the easiest solution to understand and to implement is the use of global variables. In PHP, any variable assigned at the top level of a script (outside any function or class) is global, and its contents can be reached from anywhere in the code base. To use it inside a function, you declare it with the global keyword in that function's scope; alternatively, you can reference the value through the $GLOBALS 'superglobal' array without declaring it first.

Global variable example:

//Variables assigned outside any function or class are already global
$database = new Database();

function doSomething()
{
    global $database; //This has to be declared so that PHP knows 
                      //you want to use the global variable, not 
                      //a local one
    $data = $database->readStuff();

}

Global variable example using $GLOBALS super global:

//Variables assigned outside any function or class are already global
$database = new Database();

function doSomething()
{
    $data = $GLOBALS['database']->readStuff(); //No need to 
                                               //declare the 
                                               //global first

}

Global variable advantages

The main advantage of global variables is that they are easy to understand, and easy to use in your code. Their ease of use makes it tempting to use them a lot, even though there may be much better options available!

Global variable disadvantages

The disadvantages of using global variables nearly always outweigh the advantages.

They make your code hard to read and hard to understand. It is not obvious what the variable is for, where or how it was initialised, or what the proper way to use it is.
They make your code hard to maintain. It is very difficult to make changes to global variables, as you have to search through your entire code base looking for where they have been used.
It is easy to abuse a global variable and cause errors that are hard to debug. Without any control mechanism for how the variable is used, it is easy to populate it with invalid data which can cause errors in other parts of the code (for example, if one part of the code populates the variable with an array but another part of the code expects it to contain an object).
It is easy to get confused regarding the variable's scope. If you forget to declare that the variable is global, you can end up unwittingly working with a local variable without noticing - until your app breaks. This can also be hard to debug.
If you combine your code with someone else's code (e.g. by using a third party library or writing an extension for another piece of software), and both systems use global variables, there is a chance that the variable names could clash, causing errors in both systems which are hard to debug.
All parts of your code that use a global variable are tightly coupled and it becomes very difficult to separate out or re-use a module elsewhere.
Unit testing is made more difficult, as the test has to know which global variables are needed, and how to initialise all global variables with valid values.
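To make the scope problem concrete, here is a small sketch. The $counter variable and both functions are hypothetical, invented just for this illustration: the first function forgets the global declaration and silently works on a local variable, while the second declares it properly.

```php
<?php
// Hypothetical global used only to illustrate the scope pitfall.
$counter = 0;

function incrementForgetful(): void
{
    // No `global` declaration: this assignment creates a brand-new local
    // variable that is thrown away when the function returns.
    $counter = 1;
}

function incrementCorrectly(): void
{
    global $counter; // bind the local name to the global variable
    $counter++;
}

incrementForgetful();
echo $counter, PHP_EOL; // 0: the global was never touched
incrementCorrectly();
echo $counter, PHP_EOL; // 1
```

Nothing warns you about the first function; the bug only shows up later, when some other part of the code reads the unchanged global.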

When to use global variables

It is rarely a good idea to use global variables, especially in a large application, but there are some occasions when their ease of use and simplicity make them an acceptable option: in particular, a short and relatively simple plugin or small app that is easy to read and understand, or a proof-of-concept or prototype script.

Static Classes (Helper Classes)

Using helper classes that contain only static members is another easy way of dealing with global data, although they share many of the disadvantages of global variables. Classes with static members can contain both properties and methods, like a normal object, but do not need to be instantiated before use, and their static properties retain their values until the script finishes executing (for a web application, until the end of the request). They have more in common with procedural code than object oriented code, despite the use of classes.

Static class example:

class SmtpConfig
{
    public static $host = 'localhost';
    public static $port = 465;
    public static $user = 'me@example.com';
    public static $password = 'j4a!9Sd@aKP2f';
    public static $tls = true;
}

echo SmtpConfig::$user; //This and other values from the class 
                        //are available everywhere as long as 
                        //the file containing the class 
                        //declaration has been included or can 
                        //be autoloaded

Static class advantages

A static helper class enables you to group several related pieces of data together.
Static classes are easy to use, easy to understand, and not so prone to naming clashes as global variables (although now that PHP supports namespaces, name clashes are not really an issue except in legacy code).
It is easier to locate where the data was initially defined (most IDEs will automatically locate the class definition for you, whereas with global variables, it is not always possible to tell where they were first declared), although this still doesn't stop the values being initialised or changed anywhere throughout your code base.

Static class disadvantages

If a static class has methods which have their own dependencies, they can be more difficult to unit test than instantiated classes with dependency injection (see below).
The main disadvantage of static classes is that they promote close coupling - any object relying on the static members is closely coupled with the code that initialises those members (and which in turn may have its own dependencies).
Static classes do not have a constructor, so any static methods have to do their own dependency checking, and the calling code has to perform any initialisation beyond just using the default values. This also typically requires error checking after the method call - at which point it is difficult to ascertain which dependency failed.
Dependencies are not enforced, so the code can fail with few clues as to why, often because of a dependency of a dependency of a dependency rather than anything at the place in the code where the failure occurred (a debugging nightmare).

When to use static classes

Static classes are best used in simple cases where there are no dependencies, or where the dependencies are so simple and fundamental to the operation of your application that they can be taken as read (since you have no way of enforcing them). An example of this would be your application's global error handler. That could equally well be a singleton (see below), but error handling needs to be bulletproof, so it is best kept as simple as possible, and static classes are arguably simpler than singletons. You could even use procedural code in a bootstrap file, which is simpler still.
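As a rough sketch of the error handler case, the class below uses PHP's built-in set_error_handler() to turn warnings and notices into ErrorException objects. The names ErrorHandler, register() and handle() are my own invention for this example, and it assumes PHP 7.4 or later for the typed static property.

```php
<?php
// Minimal static error handler: converts PHP warnings/notices into exceptions.
// ErrorHandler, register() and handle() are hypothetical names for this sketch.
final class ErrorHandler
{
    private static bool $registered = false;

    public static function register(): void
    {
        // Guard against registering the same handler twice.
        if (!self::$registered) {
            set_error_handler([self::class, 'handle']);
            self::$registered = true;
        }
    }

    public static function handle(int $errno, string $errstr, string $errfile, int $errline): bool
    {
        // Respect the @ operator and the current error_reporting level.
        if (!(error_reporting() & $errno)) {
            return false; // let PHP's standard handler deal with it
        }
        throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
    }
}

ErrorHandler::register();

try {
    trigger_error('Something went wrong', E_USER_WARNING);
} catch (ErrorException $e) {
    echo 'Caught: ', $e->getMessage(), PHP_EOL; // Caught: Something went wrong
}
```

The class has no dependencies beyond PHP itself, which is exactly the situation where a static class is least likely to cause trouble.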

Static members are useful as private or protected members of an instantiated class, where they provide a way of storing data once for many instances (thus reducing the amount of memory needed for each instance), for example by holding immutable metadata such as database column information. They can also be used effectively to provide small internal algorithms that are unlikely ever to need changing or overriding. But a class made up entirely of static members is a bit of a 'code smell', and there is usually a better way.
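Here is a minimal sketch of the metadata idea. TableRow and loadColumns() are hypothetical, and loadColumns() just returns fixed data where a real implementation would query the database schema. The point is that the private static cache is filled once per table, however many instances are created.

```php
<?php
// Sketch: a private static property caches table metadata once per table name,
// shared by every instance. TableRow and loadColumns() are hypothetical names.
class TableRow
{
    /** @var array<string, string[]> column names, keyed by table name */
    private static array $columnCache = [];

    /** Counts metadata lookups; public only so the caching is easy to observe. */
    public static int $lookups = 0;

    private string $table;

    public function __construct(string $table)
    {
        $this->table = $table;
    }

    public function columns(): array
    {
        // Only the first instance for a given table pays for the lookup.
        if (!isset(self::$columnCache[$this->table])) {
            self::$columnCache[$this->table] = self::loadColumns($this->table);
        }
        return self::$columnCache[$this->table];
    }

    private static function loadColumns(string $table): array
    {
        // Stand-in for a real schema query; returns fixed data for the sketch.
        self::$lookups++;
        return ['id', 'first_name', 'last_name'];
    }
}

$a = new TableRow('person');
$b = new TableRow('person');
$a->columns();
$b->columns();
echo TableRow::$lookups, PHP_EOL; // 1
```

Because the cache is private, the rest of the code base cannot tamper with it, which avoids most of the problems of a class full of public static members.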

Singleton

A singleton is a class which is instantiated, but for which there can only be a single instance. A singleton class cannot be directly instantiated by the calling code (the constructor carries the private or protected modifier) - it has to be accessed through a static method which checks whether the class has already been instantiated, and if so, returns the existing instance; otherwise, it creates a new one (which it holds on to in case another caller wants it). Compared with a static class, this allows better support for inheritance, and allows the single instance to be passed around and treated like any other object. The intention of the singleton pattern is not really to provide a mechanism for global data, but to ensure that only one object is created (its global availability is a side effect).

Singleton example

class Singleton
{
    protected static $instance = null;

    protected function __construct()
    {
    }

    protected function __clone()
    {
    }

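    public static function getInstance()
    {
        //new static() creates an instance of whichever class was called,
        //so subclasses get their own type (note that they still share
        //this $instance slot unless they redeclare the property)
        if (!isset(static::$instance))
        {
            static::$instance = new static();
        }
        return static::$instance;
    }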
}

$singleton = Singleton::getInstance();

Singleton advantages

Ensures there is only one instance of the object (allowing a resource to be shared).
Can be used from anywhere in the code - if it is not already instantiated, it will be on its first use.
Can support inheritance and polymorphism to a limited degree.

Singleton disadvantages

Enforcing a single instance of an object is rarely the desired behaviour (for example, whilst it might seem like you would only need one database connection in an application, requirements might change, requiring the application to access more than one database - perhaps for backup or synchronisation purposes).
There is no need for classes that rely on the singleton to declare their dependency on it, so it is not obvious that they rely on it and it creates a close coupling between them.
Inheritance and polymorphism are restricted, as there can still only be a single instance per request (but the implementation could be different for different types of request).
Once instantiated, the singleton will be held in memory for the life of the request even if it is not needed again (this might be desirable for objects that are expensive to instantiate and/or that are used frequently, but can negatively impact memory usage if used indiscriminately).
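One concrete form of the inheritance restriction: in the singleton example above, a single static $instance property is shared by the base class and every subclass (unless a subclass redeclares it), so whichever class is requested first claims the slot. A common workaround, sketched below under the assumption of PHP 8.0 or later, is to key the stored instances by class name. AppLogger and HostInfo are hypothetical subclasses.

```php
<?php
// Sketch: a singleton base class that keeps one instance per concrete subclass.
// Keying the static array by static::class stops subclasses sharing one slot.
// Requires PHP 8.0+ for the `static` return type; class names are hypothetical.
abstract class PerClassSingleton
{
    /** @var array<string, PerClassSingleton> */
    private static array $instances = [];

    protected function __construct()
    {
    }

    protected function __clone()
    {
    }

    public static function getInstance(): static
    {
        $class = static::class; // the class getInstance() was called on
        if (!isset(self::$instances[$class])) {
            self::$instances[$class] = new static();
        }
        return self::$instances[$class];
    }
}

final class AppLogger extends PerClassSingleton
{
}

final class HostInfo extends PerClassSingleton
{
}

var_dump(AppLogger::getInstance() === AppLogger::getInstance()); // bool(true)
var_dump(AppLogger::getInstance() === HostInfo::getInstance());  // bool(false)
```

Each subclass is still limited to one instance per request, so this eases the restriction rather than removing it.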

When to use a singleton

A singleton should only really be used if a single resource needs to be shared among different objects. Even if the current requirements do not call for multiple instances of the object, check that no likely future requirements will need them either. A common use of the singleton pattern is in combination with a global registry pattern (see below), or for interacting with the operating system or host that the application is running on (of which there will only ever be one at a time).

Registry

The registry design pattern allows you to define an object (usually a singleton) which holds references to various other resources (typically as key/value pairs) that may be needed by your application (for example, database connections and configuration settings). Although the registry itself is usually a singleton (as you only want a single registry available to the whole application), the resources it stores are not expected to be singletons - it can store several different instances of the same class. The resources it stores do not even have to be objects - they can be primitive data types or arrays.

Resources can be stored in a hash table (array), or if you know that certain items will need to be in the registry, you can strongly type them (which will help with the code autocomplete features of your IDE). You could also have a mixture!
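To show what a mixture might look like, here is a cut-down sketch with one strongly typed slot (for a configuration object) alongside a generic key/value store. Config is a hypothetical stand-in class, the $force_refresh protection used in the examples that follow is left out for brevity, and the code assumes PHP 8.0 or later (for the mixed type and constructor property promotion).

```php
<?php
// Sketch of a "mixed" registry: one strongly typed slot plus a key/value store.
// Config is a hypothetical stand-in; overwrite protection is omitted for brevity.
final class Config
{
    public function __construct(public array $settings = []) {}
}

final class Registry
{
    private static ?Registry $instance = null;
    private ?Config $config = null; // strongly typed: IDEs can autocomplete it
    private array $resources = [];  // weakly typed: anything, keyed by name

    private function __construct() {}
    private function __clone() {}

    public static function getInstance(): Registry
    {
        return self::$instance ??= new Registry();
    }

    public function setConfig(Config $config): void
    {
        $this->config = $config;
    }

    public function getConfig(): Config
    {
        if ($this->config === null) {
            throw new RuntimeException('Configuration resource not found in the registry');
        }
        return $this->config;
    }

    public function set(string $key, mixed $value): void
    {
        $this->resources[$key] = $value;
    }

    public function get(string $key): mixed
    {
        // array_key_exists() (unlike isset()) also accepts stored null values.
        if (!array_key_exists($key, $this->resources)) {
            throw new RuntimeException('Resource ' . $key . ' not found in the registry');
        }
        return $this->resources[$key];
    }
}

Registry::getInstance()->setConfig(new Config(['debug' => true]));
Registry::getInstance()->set('site_name', 'Always Learning');
```

The typed slot suits resources every part of the application needs; the generic store covers the occasional extra item without a new pair of methods each time.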

Registry example (weakly typed)

class Registry
{
    protected static $instance;
    protected $resources = array();

    protected function __construct()
    {
    }

    protected function __clone()
    {
    }

    public static function getInstance()
    {
        if (!isset(self::$instance)) {
            self::$instance = new Registry();
        }
        return self::$instance;
    }

    public function setResource($key, $value, $force_refresh = false)
    {
        if (!$force_refresh && isset($this->resources[$key])) {
            throw new RuntimeException('Resource ' . $key . ' has already been set. If you really ' 
                                       . 'need to replace the existing resource, set the $force_refresh '
                                       . 'flag to true.');
        }
        else {
            $this->resources[$key] = $value;
        }
    }

    public function getResource($key)
    {
        if (isset($this->resources[$key])) {
            return $this->resources[$key];
        }
        throw new RuntimeException ('Resource ' . $key . ' not found in the registry');
    }
}

//Add a resource to the registry
$db = new Database();
Registry::getInstance()->setResource('Database', $db);

//Retrieve a resource from the registry (elsewhere in the code)
$db = Registry::getInstance()->getResource('Database');

Registry example (strongly typed)

class Registry
{
    protected static $instance;
    protected $main_db;
    protected $sync_db;
    protected $config;

    protected function __construct()
    {
    }

    protected function __clone()
    {
    }

    public static function getInstance()
    {
        if (!isset(self::$instance)) {
            self::$instance = new Registry();
        }
        return self::$instance;
    }

    public function setMainDatabase(Database $value, $force_refresh = false)
    {
        if (!$force_refresh && isset($this->main_db)) {
            throw new RuntimeException('Main database has already been set. If you really '
                                       . 'need to replace the existing database, set the '
                                       . '$force_refresh flag to true.');
        }
        else {
            $this->main_db = $value;
        }
    }

    public function getMainDatabase()
    {
        if (isset($this->main_db)) {
            return $this->main_db;
        }
        throw new RuntimeException ('Main database resource not found in the registry');
    }

    public function setSyncDatabase(Database $value, $force_refresh = false)
    {
        if (!$force_refresh && isset($this->sync_db)) {
            throw new RuntimeException('Synchronisation database has already been set. If you really '
                                       . 'need to replace the existing database, set the $force_refresh '
                                       . 'flag to true.');
        }
        else {
            $this->sync_db = $value;
        }
    }

    public function getSyncDatabase()
    {
        if (isset($this->sync_db)) {
            return $this->sync_db;
        }
        throw new RuntimeException ('Synchronisation database resource not found in the registry');
    }

    public function setConfig(Config $value, $force_refresh = false)
    {
        if (!$force_refresh && isset($this->config)) {
            throw new RuntimeException('Configuration object has already been set. If you really '
                                       . 'need to replace the existing configuration, set the '
                                       . '$force_refresh flag to true.');
        }
        else {
            $this->config = $value;
        }
    }

    public function getConfig()
    {
        if (isset($this->config)) {
            return $this->config;
        }
        throw new RuntimeException ('Configuration resource not found in the registry');
    }
}

//Add a resource to the registry
$db = new Database();
Registry::getInstance()->setMainDatabase($db);

//Retrieve a resource from the registry (elsewhere in the code)
$db = Registry::getInstance()->getMainDatabase();

In these examples, the developer is allowed to overwrite existing resources, but only if they make it clear that this was their intention (by setting the $force_refresh flag).

Registry advantages

If there are common dependencies that are used throughout your code, you can use the global registry instead of passing an individual parameter for each one.
A registry allows you the freedom to store and manage your global data centrally, without restricting the implementation to a single instance, and allowing full use of inheritance and polymorphism for the resources it manages.
A strongly typed registry allows your IDE to help you avoid typing mistakes.
A registry is somewhat easier to use than dependency injection.

Registry disadvantages

A registry still hides dependencies and is tightly coupled to objects that rely on it (or its contents), although not as tightly as a global variable (because the resources can be replaced with different sub classes).

When to use a registry

Some developers reject the use of a registry on the grounds that it is just a global array in disguise, and (in particular with a weakly typed implementation) gives no clue as to how its contents are meant to be used. However, the generally preferred alternative (dependency injection - see below) can get out of hand when you have to inject lots of dependencies, many of which are the same ones over and over again (you can use a dependency injection container to manage this, but it is arguably more complex than using a global registry). Used sparingly, then, a registry can be an appropriate vehicle for managing the most common dependencies that are fundamental to the workings of your application (typically, one or more databases, maybe a logger, and a configuration object), without requiring an unreasonably long list of parameters, or repeated use of the same parameters, for every object instantiation.

Dependency Injection

Dependency injection requires that the calling code supply all of the dependencies to an object before use. In most cases, this is done by passing parameters to the constructor - so that the object cannot be instantiated unless it has been given all of the data it needs to do its job. Optional dependencies are often injected using a separate method call. Injected dependencies are often objects but they don't have to be - any data the object requires to do its job is a dependency and must be supplied by the calling code.

Dependency injection example

class Person
{
    protected $database;
    public $title;
    public $first_name;
    public $last_name;
    
    public function __construct(Database $db, $last_name)
    {
        $this->database = $db;
        if (strlen($last_name) == 0) {
            throw new Exception('Last name required');
        }
        $this->last_name = $last_name;
    }
    
    public function setTitle($title)
    {
        $this->title = $title;
    }

    public function setFirstName($first_name)
    {
        $this->first_name = $first_name;
    }
    
    //More methods here which use the database object
}

$db = new Database('localhost', 'user', 'password');
//The database object is passed to the Person object in the constructor, along with some other data
$person = new Person($db, 'Smith');
$person->setTitle('Mr'); //Optional dependencies can be set with a separate method call

Dependency injection advantages

Dependency injection decouples your code, allowing each object to exist and perform operations without requiring any particular environmental setup.
This makes code re-use much easier, as you can just use the same object in another application or in another setting in the same application.
It also makes unit testing much easier, as a test can be set up to inject real or dummy dependencies for the purposes of testing the object.
It is obvious to the calling code what the dependencies are - it can therefore supply everything that is needed without worrying that there might be some hidden dependency which will break the application if not supplied.
Inheritance and polymorphism can be used to great effect by specifying the parent class (or interface) as a dependency - the calling code can then supply any sub class and the object doesn't need to know or care what the implementation is (allowing for easy extensibility). For example, if a class has a constructor which requires a database object to be injected, the calling code can inject a MySQL database class or an SQLite database, or any other sub class of database (perhaps even one that hasn't been invented yet).
By passing dependencies in the constructor, any problems can be caught early - the class can verify that it has valid dependency data before it will allow instantiation. This makes debugging much easier.
For a lucid explanation of why dependency injection is generally superior to using statics, please see David C. Zentgraf's post: How Not to Kill Your Testability Using Statics (the article is not just about testability).
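To illustrate the testing and polymorphism points, the sketch below depends on an interface rather than a concrete database class, so a unit test can inject an in-memory stand-in. PersonStore, InMemoryPersonStore and PersonService are hypothetical names, and the code assumes PHP 8.0 or later for constructor property promotion.

```php
<?php
// Sketch: depending on an interface lets callers inject any implementation,
// including an in-memory test double. All names here are hypothetical.
interface PersonStore
{
    public function save(string $lastName): int; // returns the new record's id
}

final class InMemoryPersonStore implements PersonStore
{
    public array $rows = [];

    public function save(string $lastName): int
    {
        $this->rows[] = $lastName;
        return count($this->rows);
    }
}

final class PersonService
{
    // The dependency is declared in the constructor, so it cannot be forgotten.
    public function __construct(private PersonStore $store) {}

    public function register(string $lastName): int
    {
        if ($lastName === '') {
            // Validate before touching the dependency.
            throw new InvalidArgumentException('Last name required');
        }
        return $this->store->save($lastName);
    }
}

$store = new InMemoryPersonStore(); // a dummy dependency, as in a unit test
$service = new PersonService($store);
$id = $service->register('Smith');
echo $id, PHP_EOL; // 1
```

In production you would pass a class that implements PersonStore against a real database; PersonService does not need to change.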

Dependency injection disadvantages

The calling code may have more work to do to initialise an object, especially if the dependencies you are injecting have dependencies of their own (if this gets out of hand you could look into using a dependency injection container).
If there are lots of dependencies, you could end up with a long list of parameters in your constructor which makes the code difficult to read and understand.

When to use dependency injection

In most cases, dependency injection provides more advantages than disadvantages, so it is becoming common practice to use it by default and only avoid it if it is causing problems. Where certain objects are used extensively throughout the code base (such as a database or configuration object), injecting them into every object can become laborious and inelegant. In such cases, it might be better to just accept a certain amount of close-coupling for the sake of code readability (and writability!), and to use the setup and teardown features of your unit testing software to initialise and destroy the most common dependencies.

Increasingly though, dependency injection containers are used to handle multiple object dependencies. Using a container allows the dependencies to be defined just once instead of at every instantiation, and the dependencies can even be defined in a config file or in annotations rather than in the code itself. There are various frameworks available that provide dependency injection containers, some of which are very lightweight and specialise in just providing containers.
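To give a feel for what a container does, here is a deliberately tiny sketch (not the API of any particular framework): each service is defined once as a factory closure, built on first request, and then shared. The Config and Database classes are hypothetical placeholders, and the code assumes PHP 8.0 or later.

```php
<?php
// A tiny dependency injection container: services are defined once as factory
// closures, built lazily on first request, then shared. Not a real framework API.
final class Container
{
    /** @var array<string, Closure> */
    private array $factories = [];
    /** @var array<string, mixed> */
    private array $instances = [];

    public function set(string $id, Closure $factory): void
    {
        $this->factories[$id] = $factory;
        unset($this->instances[$id]); // redefining a service discards any cached instance
    }

    public function get(string $id): mixed
    {
        if (!array_key_exists($id, $this->instances)) {
            if (!isset($this->factories[$id])) {
                throw new RuntimeException('No definition for ' . $id);
            }
            // The factory receives the container so it can resolve its own dependencies.
            $this->instances[$id] = ($this->factories[$id])($this);
        }
        return $this->instances[$id];
    }
}

// Hypothetical services used only for this illustration.
final class Config
{
    public function __construct(public string $dsn) {}
}

final class Database
{
    public function __construct(public Config $config) {}
}

$container = new Container();
$container->set(Config::class, fn () => new Config('sqlite::memory:'));
$container->set(Database::class, fn (Container $c) => new Database($c->get(Config::class)));

$db = $container->get(Database::class); // Config is built automatically along the way
```

Real containers add features such as autowiring and definitions held in config files or annotations, but the core idea of defining dependencies once and resolving them on demand is the same.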

In conclusion

The developer has to make a judgement call about when to use which approach for handling global data. Each approach has its advantages and disadvantages, and whilst some (dependency injection) are clearly more desirable than others (global variables) in most situations, it is not helpful to make blanket rules (like 'singletons are evil').


Handling Global Data in PHP Web Applications