You’ve been given the basic tools for this already. Change management pays off directly in the release process; without it, your release success rate will be drastically lower and you’ll undoubtedly have more unplanned work (“firefighting”) in production. Configuration management dictates the quality of your builds and software library, which directly impacts the resolution process (remember, rebuilding always beats troubleshooting!). The final phase discussed in “Visible Ops” is continuous improvement, and it’s basically about measuring and acting upon critical metrics gathered from your newly implemented change and configuration management processes. You cannot manage what you cannot measure, and I’m not talking about grepping through a log file here! You need to methodically plan which metrics will give you the information you need to improve your processes and ultimately deliver more business value per server than your competitors!
So what exactly should you measure? Again, the folks at the IT Process Institute give you a great head start. How effectively do you:
- Release: generate and provision infrastructure?
- Controls: make good change decisions that keep production infrastructure available, predictable and secure?
- Resolution: diagnose and resolve issues when things go wrong?
Carefully analyzing and presenting metrics such as the following should give you the mandate to make those improvements:
- Time to provision known good builds: how long does it take to build and provision infrastructure from bare-metal?
- Number of turns to a known good build: how many times must the build be modified before it’s acceptable for deployment?
- Shelf life of builds: how long will the build be in production until it’s replaced?
- Percent of systems that match known good builds: how many production systems can claim this?
- Percent of builds that have security sign-off: how serious is your organization about security?
- Number of fast-tracked builds: how many builds were rushed into production via the emergency change process?
- Ratio of release engineers to sysadmins: are you too busy doing stuff instead of thinking about it first?
- # of actual changes per week: how many were authorized?
- # of successful changes: what’s the ratio of “emergencies” to “special” to “business as usual”?
- # of service-affecting outages
- # of hours spent on change management
- # of changes submitted vs. changes actually reviewed
- MTTR (Mean Time To Repair): average time to restore service after any interruptions
- MTBF (Mean Time Between Failures): the average time between service incidents
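To make a few of these concrete, here is a minimal sketch of how MTTR, MTBF and the "percent of ..." metrics could be computed from raw records. The `Incident` record shape and the function names are my own assumptions for illustration, not something the handbook prescribes. In particular, MTBF is taken here as the average gap between one restoration and the start of the next incident; other definitions exist, so pick one and stick with it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One service-affecting outage (hypothetical record shape)."""
    started: datetime   # when service was interrupted
    restored: datetime  # when service was restored


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean Time To Repair: average time from interruption to restoration."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.restored - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mtbf(incidents: list[Incident]) -> timedelta:
    """Mean Time Between Failures.

    Assumption: measured as the average gap between the end of one
    incident and the start of the next one.
    """
    if len(incidents) < 2:
        raise ValueError("need at least two incidents")
    ordered = sorted(incidents, key=lambda i: i.started)
    gaps = [nxt.started - prev.restored for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def percent(part: int, whole: int) -> float:
    """Generic 'percent of ...' metric, e.g. systems matching known good builds."""
    return 100.0 * part / whole if whole else 0.0
```

Feeding these from your ticketing system or change log once a week is enough to start plotting a trend, and the trend is what you will want to show in a presentation, not a single number.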
These numbers alone, however, won’t convince your boss to turn the company upside down. You’ll have to convince him through meaningful presentations demonstrating a clear business value in your revolutionary ideas. If this aspect of “visible operations” gives you any pause, then have a look at Matthias’s gentle guide.
I’ve certainly given you plenty of food for thought this past month, and I really can’t give enough praise to this 100-page volume of operational wisdom. If you’re looking for a way to get your company out of IT chaos, “The Visible Ops Handbook” will start you down the right path!