How many times have you seen someone about to make, or actually making, what they thought would be a “quick change” to a live IT system without enough evidence that it would work? Or worse still, a big change?
I’ve seen this many times over the years. Very often it’s a junior developer who has been stopped in the nick of time. Sometimes it’s someone senior who hasn’t paused to think about what might go wrong. Usually the person is under time pressure, has a lot on their plate, and wants to get an action ticked off their list.
Sometimes it works, and they get away with it. However, the law of unintended consequences is a powerful one in the world of software, and something unexpected often happens, which is not usually a good thing. At best, the fix can be “undone” on the system; at worst, the fix causes all kinds of knock-on problems that give the maintenance team (and possibly the operations teams) a big, thorny problem.
It’s likely that you’ve got this far and you’re thinking that, in this scenario, some testing would be a good idea. That is definitely true, but once you’ve prepared and tested your change, how do you progress safely from fixing a bug on a development system to deploying it live, with a high level of confidence that it’s going to work and not have any nasty side effects?
Here’s my three-tiered method for doing so, which considerably reduces the risk.
The three tiers
To give you the best chance of avoiding that sinking feeling when you deploy a change and it does something unexpected and bad, I’d recommend having three different systems available to you – the three tiers. These are described in detail below, but I call them “development”, “staging” and “live”. By “change” I mean anything from a one-line bug fix through to a substantial functional change.
As you progress your change, you’ll do some testing on the development environment. When you’re happy with that, you move the change to the staging system and repeat the testing there. Finally, once you’re comfortable with the way it works on the staging system, you can make it live.
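The progression above can be sketched as a simple gate: a change may only move to the next tier once testing has passed on the tier it is currently on. This is a minimal illustration, not a real deployment tool; the `Change` class and its method names are hypothetical and only the three tier names come from this article.

```python
from dataclasses import dataclass, field

# Tier names taken from the article; their order defines the only allowed path.
TIERS = ["development", "staging", "live"]


@dataclass
class Change:
    """A hypothetical record of one change and the tiers it has been proven on."""
    description: str
    tier: str = "development"
    passed: set = field(default_factory=set)  # tiers where testing has passed

    def record_test_pass(self) -> None:
        """Note that testing succeeded on the tier the change is currently on."""
        self.passed.add(self.tier)

    def promote(self) -> str:
        """Move to the next tier, refusing if the current tier is unproven."""
        if self.tier == "live":
            raise ValueError("change is already live")
        if self.tier not in self.passed:
            raise RuntimeError(f"tests have not passed on {self.tier}")
        self.tier = TIERS[TIERS.index(self.tier) + 1]
        return self.tier


change = Change("one-line bug fix")
change.record_test_pass()  # tested on development
change.promote()           # now on staging
change.record_test_pass()  # testing repeated on staging
change.promote()           # now live
print(change.tier)
```

The point of the gate is that skipping a tier, or promoting an untested change, raises an error rather than quietly succeeding.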
A development system
The development system is a largely uncontrolled development environment. It might be your own laptop or PC. There are probably quite a few development systems in your organisation – one or more per developer.
If you have any choices in setting it up, you might want to make it as close to the live system as possible. For example, you may have a choice of operating systems: if the live system uses Linux, maybe have Linux on your development environment. However, the most important thing about your system is that it’s convenient for you, and lets you work in the most productive way possible. To get your fix or functional change done quickly, you will need to code, compile and test repeatedly with quick iterations, so make sure your personal setup allows that.
A staging system
This is the stepping stone between development and live; it forms a bridge between those two systems. It’s the clever part of the three-tiered system concept.
A staging system is characterised by the fact that it’s not a live system, in that it doesn’t have real users and isn’t running a real application; but, very importantly, it’s as close to the live system as possible. This means that, ideally, it:
- Runs on the same type of hardware, with the same operating system, as the live system
- Has similar interfaces to the live system, or simulations of them
- Is loaded with a similar dataset to live. The dataset could be recovered from the live system, at least in part, and then modified in some way so it doesn’t inadvertently interact with the live system
- Possibly has a simulated load, e.g. a simulated number of users
The idea is that you put the new software, modified with your change, on the staging system, and run some tests. How you generate the tests is a subject for a different article, but you should exercise your change and check for unwanted side-effects.
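One way to picture the dataset point above (data recovered from live, then modified so it can’t interact with the live world) is a small sanitising step run before the data is loaded into staging. This is only a sketch under assumptions: the record fields (`id`, `email`, `callback_url`, `balance`) and the localhost address are made up for illustration, and your own data will need its own rules.

```python
import copy


def sanitise_record(record: dict) -> dict:
    """Return a copy of a live record that is safe to load into staging."""
    safe = copy.deepcopy(record)  # never modify the data taken from live
    # ".invalid" is a reserved top-level domain, so mail sent here cannot be delivered.
    safe["email"] = f"user{safe['id']}@staging.invalid"
    # Point outbound callbacks at a simulated interface (hypothetical address).
    safe["callback_url"] = "http://localhost:8080/simulated-callback"
    return safe


# Hypothetical rows standing in for an extract from the live system.
live_rows = [
    {"id": 1, "email": "alice@example.com",
     "callback_url": "https://example.com/hook", "balance": 120},
]

staging_rows = [sanitise_record(row) for row in live_rows]
print(staging_rows[0]["email"])
```

Fields that only matter for realistic testing, such as `balance` here, are left untouched so the staging dataset still resembles live.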
You might ask: “why don’t I just develop my changes on the staging system?” The answer to that is two-fold:
- The staging system is probably a valuable resource inside your organisation. It probably took a decent amount of someone’s time to set up, and if you break it and need to have it returned to a known, good state then that will probably cost some more of someone’s time. You therefore want to have a reasonable level of confidence in your changes before deploying to the staging system.
- Access to the staging system might be restricted, or it might be inconvenient to repeatedly deploy your software to staging, or staging might be shared between you and others in your team.
That is why I recommend having a separate development environment.
The benefits of using a staging system should be obvious. You should be able to tease out any problems that weren’t visible on your development system, but when they’re teased out, those problems don’t cause any issues in the real, live world. And you’ve got a good chance of finding all the problems you’d get on the real system because the staging environment is as close as practically possible to the live environment.
The live system
This is the one that does the work – the public facing website, or the system that crunches the real data. There’s not much more to say! You will intuitively know what your live system is.
Even if you don’t go for this approach, you still have to have a development system and a live system, so the costs associated with this approach are those of having a staging system.
Obviously there’s a cost to set up the staging system in the first place, which might be considerable. There are probably some hardware costs, and you might have to license some commercial off-the-shelf (COTS) software. There’s also the professional time involved in setting up the hardware and operating system, and getting your application loaded and configured on the staging system.
There’s also an ongoing cost in making sure the staging environment closely mirrors the live system (e.g. when a vendor releases COTS patches, these have to be applied to both staging and live). Periodically, you may have to manually reset the staging system to a known, good state, e.g. if your testing breaks the staging system. There’s almost certainly an ongoing cost in keeping the system running: there are power costs, but also the cost of renewing components as they break or become obsolete.
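Keeping staging in step with live is easier if you can see where the two have drifted apart. As a rough sketch, assuming you can produce a simple name-to-version listing for each system, a comparison might look like the following. The component names and version strings are invented for illustration, and how you gather the listings depends entirely on your platform.

```python
def find_drift(live: dict, staging: dict) -> dict:
    """Map each component whose version differs to (live_version, staging_version)."""
    drift = {}
    for name in sorted(set(live) | set(staging)):
        live_version = live.get(name)        # None means not present on live
        staging_version = staging.get(name)  # None means not present on staging
        if live_version != staging_version:
            drift[name] = (live_version, staging_version)
    return drift


# Hypothetical listings; in practice these would be collected from each system.
live_manifest = {"os": "linux-5.15", "database": "14.2", "vendor-lib": "3.1.0"}
staging_manifest = {"os": "linux-5.15", "database": "14.1", "vendor-lib": "3.1.0"}

for name, (live_v, staging_v) in find_drift(live_manifest, staging_manifest).items():
    print(f"{name}: live={live_v} staging={staging_v}")
```

Here the check would flag that a database patch applied to live has not yet reached staging, which is exactly the kind of gap that undermines confidence in staging test results.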
There’s also the increased time you’ll incur in testing your change.
Despite the costs outlined above, the advantages will outweigh the downsides once a system is critical enough. What I mean by that is: if the live system in question is critical to your organisation’s operations, the cost of it being unavailable, even for a short time, might outweigh the ongoing cost of having a staging system. There’s also your organisation’s reputation to think about, as unplanned downtime might be unpalatable. The advantages therefore are:
- You can check your changes are likely to work OK when they are made live, without actually putting them live.
- You can check your changes are unlikely to result in unexpected side-effects.
- You can develop in parallel (e.g. with multiple developers) but can bundle together a release and test that release.
- If you’ve invested time in developing your staging system and can also rebuild it quickly, this experience will help if there’s ever a serious problem on the live system and it has to be rebuilt quickly.
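The trade-off between the cost of a staging system and the cost of downtime can be put into rough numbers. The sketch below is deliberately crude and every figure in it is a placeholder rather than data from a real system; the function name and parameters are my own invention, and it ignores reputational damage, which is hard to price.

```python
def staging_pays_off(annual_staging_cost: float,
                     incidents_avoided_per_year: float,
                     cost_per_incident: float) -> bool:
    """True when the downtime cost avoided exceeds what staging costs to run."""
    avoided_cost = incidents_avoided_per_year * cost_per_incident
    return avoided_cost > annual_staging_cost


# Placeholder figures only, chosen to show both outcomes.
print(staging_pays_off(20_000, 2, 50_000))   # a critical system
print(staging_pays_off(20_000, 0.5, 5_000))  # a low-criticality system
```

With these made-up inputs the first case favours having a staging system and the second does not, which is the “certain level of criticality” argument in miniature.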
By using a three-tiered system of working to prove your changes, with a staging system bridging the gap between the development environment and the live system, you’ll increase confidence in your changes and reduce the risk of something going wrong when you go live.
Neil Tubman is a director of Terzo Digital and has 17 years’ experience of working on mission critical systems.