The devs are all writing automated tests and some are even experimenting with TDD. Congrats! But what happens when the build server breaks? Who’s making sure that Continuous Integration keeps running smoothly? There seems to be an awful lot of red in there…
Unlike writing the first basic tests, keeping CI green is hard. Did the test fail because of an application bug or because of the environment? Once again, the dreaded chant of “it works locally” goes up. What most people fail to understand is that the failing test is the first sign of a communication breakdown between developers and sysadmins.
Maybe the dev installed the latest version of Python locally because it brings a 25% performance boost to execution. Or the sysadmin upgraded the virtualenv package on the test servers because of a glaring security hole. And never rule out that both of them made some “quick fixes”! Whatever the cause, your business application doesn’t work anymore. Whose “fault” is it? And, more importantly, who should fix it?
Situations like this are the whole point of DevOps. Or rather, the idea behind DevOps is to prevent them from happening in the first place.
An ounce of configuration is worth a pound of troubleshooting
- get the build server under control (Puppet, Chef)
- get the dev boxes under control (Vagrant, provisioned with the same Puppet or Chef code)
- a failing test on CI means “stop the line” – no more commits or builds from *anybody* until the test passes
“Under control” means that the team agrees on the set of tools, applications and version numbers that must be available on each server. This set is defined in an automated configuration management tool that runs on all development, build and test infrastructure. Initially, only the sysadmins have write access to the repository that holds this configuration.
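To make “an agreed set of tools and version numbers” a little more concrete, here is a minimal Python sketch of the kind of drift check that configuration management performs on every box. It is not Puppet or Chef code: the manifest `AGREED`, its version prefixes, and the functions `installed_versions`, `matches`, `find_drift` and `check_this_box` are all hypothetical. A real setup would keep the manifest in the sysadmins’ configuration repository and enforce it, not merely report on it.

```python
from importlib import metadata
import platform

# Hypothetical manifest the team agreed on: name -> required version prefix.
# "python" means the interpreter itself; every other key is a pip package.
AGREED = {
    "python": "3.11",
    "virtualenv": "20",
}


def installed_versions(names):
    """Return {name: installed version string, or None if not installed}."""
    found = {}
    for name in names:
        if name == "python":
            found[name] = platform.python_version()
        else:
            try:
                found[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                found[name] = None  # package is not installed at all
    return found


def matches(have, wanted):
    """True if `have` equals `wanted` or is a release within it (3.11 -> 3.11.4)."""
    return have == wanted or have.startswith(wanted + ".")


def find_drift(agreed, installed):
    """Return human-readable mismatches between the manifest and this box."""
    problems = []
    for name, wanted in sorted(agreed.items()):
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: missing (want {wanted}.x)")
        elif not matches(have, wanted):
            problems.append(f"{name}: have {have}, want {wanted}.x")
    return problems


def check_this_box(agreed=AGREED):
    """Print any drift on the current machine and return the list of problems."""
    drift = find_drift(agreed, installed_versions(agreed))
    for line in drift:
        print("DRIFT", line)
    return drift
```

A CI job could, for instance, run `raise SystemExit(1 if check_this_box() else 0)` as its first step, so that an environment mismatch shows up as a failed environment check instead of a puzzling test failure. The dot-aware prefix match is a deliberate choice: it keeps “3.1” from silently accepting “3.11.2”.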
How to handle problems the DevOps way
- Never blame!
- Work together to solve problems (helps with situational awareness)
- Share daily stand-ups and tell folks what you’re doing!
- Trust, Measure and Publish
Don’t fall into the trap of blaming and arguing – this isn’t a pissing contest! Act like professional adults interested in the success of your company. The other party did what they did for a reason – take an interest and find out what it was. Work together to explain and understand why the change was made. If at all possible, hold joint daily stand-ups and tell people in advance when you’re going to patch the test servers.
Avoid these common mistakes
- don’t give devs write access to the configuration management repository – they don’t have time for system configuration
- don’t give devs sudo access – when all you have is a hammer, everything looks like a nail!
- scale down resources for dev boxes – their desktop VMs typically don’t have 16 GB of RAM
Ultimately, when development is responsible for testing and sysadmins are responsible for the environment, there is far less room for misconfigurations and misunderstandings. Clear lines of responsibility let a team focus on what really matters (and on what each side is truly competent in). Communicate this rationale to both sides and get these folks pulling together for the business, not for their individual departments.