Visible Ops: Rolling Out Change Management

August 28, 2008 · 6 comments

Last week, I introduced you to “The Visible Ops Handbook” and its 4 Agile Steps to ITIL Compliance. While there is no silver bullet for your particular problems, these steps should serve as a good starting point. Today, I’d like to go into a bit more detail regarding what the Visible Ops folks call “Phase One” of getting a grip on your operations environment. Remember, this first step can take many months to roll out, so if you want to do this in an agile fashion, start small and gain forward momentum (and the rest of your team’s confidence) with early, measurable successes.

Stabilize The Patient

I’m sure you know exactly which applications always seem to give you problems, whether it’s routine maintenance or *shudder* a big upgrade. Whether it’s because of fragile architecture or “too many cooks in the kitchen,” these applications probably account for the majority of your team’s effort and probably a lot of your site’s downtime. Logically, it follows that these should be the systems you lock down first. Who has access and why? Are they really responsible for the system, or is their access just historical convenience? Pick a few of the most troublesome servers to start off your ITIL initiative with the following steps in mind:

  1.  Reduce or Eliminate Access
    •     grant access only to those directly responsible for running the system in production!
    •     if more than 20% of your developers can commit directly to the release branch, introduce a Committer Role!
  2.  Document the New Change Policy
    •     establish clear rules and processes about how changes shall be made and documented
  3.  Notify Stakeholders
    •     ensure you have the understanding and approval from business
    •     you’ll be having fewer releases in the short-term so clearly communicate your goals to everyone
  4.  Create Change Windows
    •     regularly scheduled changes create nice “Release Trains” that everyone in the company knows about
    •     “I have to get this feature signed off by QA this afternoon if I want to make tomorrow’s release!”
  5.  Reinforce the Process
    •     verify that every change is made by authorized personnel and documented accordingly
    •     be prepared to deal with those who break the new rules!
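As a sketch of step 1, a server-side git hook can enforce a Committer Role by limiting who may push to the release branch. Everything here (the branch name `release`, the committers file, the helper names) is an illustrative assumption, not something prescribed by the handbook:

```shell
#!/bin/sh
# Sketch: restrict pushes to the release branch to a known list of
# committers. Names (refs/heads/release, the committers file) are
# assumptions for illustration -- adapt them to your repository.

# Return success (0) if the given user appears, as a whole line,
# in the committers file.
is_committer() {
    user=$1 committers_file=$2
    grep -qx "$user" "$committers_file"
}

# Decide whether a push should be allowed. A real pre-receive hook
# would read "oldrev newrev refname" triples from stdin and call
# something like this for each one.
check_push() {
    user=$1 committers_file=$2 refname=$3
    if [ "$refname" = "refs/heads/release" ] && \
       ! is_committer "$user" "$committers_file"; then
        echo "push to release denied for $user" >&2
        return 1
    fi
    return 0
}
```

Wired into a git server as a pre-receive hook, this makes the 20% rule above enforceable rather than aspirational.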

Electrify The Fence

You need to be able to detect any unauthorized changes made to the “critical patients” you’ve identified above. There are a lot of excellent configuration management tools on the market (check out my introductions to the open source CM tools Cfengine and Puppet) and spending a day or two installing and configuring these will save you a lot of headaches and wasted work in the future. These tools can automatically discover undocumented changes made to your systems, notify the relevant change managers of the problem, and even revert the system to its previously known (and hopefully correctly functioning) state. To help you mitigate these incidents, here are some simple questions proposed by the IT Process Institute in their “Visible Ops Handbook”:

  •  Who made the change?
  •  What did they change?
  •  Should it be backed out? If so, then how?
  •  How do we prevent it from happening again in the future?
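The detection these CM tools perform can be approximated with a simple checksum baseline. This is only a toy illustration of the idea (the function names and paths are my own assumptions, and `sha256sum` assumes GNU coreutils), not a substitute for Cfengine or Puppet:

```shell
#!/bin/sh
# Toy "electric fence": record a checksum baseline for a directory
# of configuration files, then later detect whether anything under
# it has changed. Requires sha256sum (GNU coreutils).

# Record a sorted sha256 checksum of every file under the directory.
take_baseline() {
    dir=$1 baseline=$2
    find "$dir" -type f -exec sha256sum {} + | sort > "$baseline"
}

# Compare the current state against the baseline; return nonzero
# if any file was added, removed, or modified.
check_fence() {
    dir=$1 baseline=$2
    find "$dir" -type f -exec sha256sum {} + | sort \
        | diff "$baseline" - > /dev/null
}
```

Run `check_fence` from cron and you have a crude tripwire; the real tools add the notification and automatic-revert steps described above.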
Modify First Response

Even with all the above processes in place, problems will still arise. Once you’ve been alerted to a production problem, don’t immediately log in to the servers and start poking around. That knee-jerk reaction often ends with you simply restarting the service in a fit of blind panic. Instead, go to the change logs created above and try to figure out what happened from a more analytical perspective. The insights gained from spending these extra ten minutes are well worth the downtime. If you had just restarted those servers within two minutes, there’s a good chance nobody else would even have noticed the problem; while this may sound perfect, if there’s no visible problem, how can you justify spending the time and money to fix the root cause? Remember, if this happened at 2am (and your site is accessed globally), this outage would probably be measured in hours. Ensure you have all the information needed to diagnose the problem before you restart the application(s)/server(s)!
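A small first-response script can make the “collect evidence before restarting” discipline automatic. Everything below (the output paths, which commands to capture) is a hypothetical sketch to adapt to your environment:

```shell
#!/bin/sh
# Sketch: snapshot diagnostic state into a directory before anyone
# restarts anything. The choice of commands and paths here is an
# assumption for illustration.

snapshot_state() {
    outdir=$1
    mkdir -p "$outdir"
    date > "$outdir/timestamp"           # when the evidence was captured
    ps aux > "$outdir/processes"         # what was running at the time
    df -h > "$outdir/disk" 2>/dev/null   # disk pressure, a common culprit
    # In a real incident you would also copy the application logs and
    # the recent entries from your change log into $outdir.
}
```

Making this the first command in your incident runbook costs seconds and preserves exactly the information you need for the root-cause analysis afterwards.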

5 Reasons for Change Management Failures

Congratulations! You’ve just implemented the foundations of a professional change management environment. If you’ve made it all the way here, you’ve probably run across some of the following (unfortunately, all too common) excuses for why this can’t possibly work in your organization:

  •  “We can’t – we won’t be able to get anything done.”
  •  “The business doesn’t pay us to not make changes.”
  •  “We don’t need to – we trust our own people.”
  •  “Our people are professionals and don’t need constant micromanagement.”
  •  “We already have a policy and we believe there are no unauthorized changes.”
Don’t be discouraged! Change is hard (and change management may seem downright impossible at times), and convincing people takes a lot of effort. No matter what the naysayers are complaining about, you are doing the right thing and time will validate your decisions. In fact, my experience is that within another month or two, folks will wonder how they ever managed the old way! And between you and me, we both know they weren’t really managing at all.

Check in next week for the second step to getting your company ITIL compliant!


    Comments

    1. says

      Thanks, Stephan. Yes, we did introduce these steps successfully. For me, one of the most sensitive points was revoking root access from trusted colleagues. Taking privileges away carries a high risk of hurting people’s feelings. It is absolutely necessary that they truly understand the reasons: it’s not a demotion but a necessary specialization for everyone. The people really taking care of operations become more accountable for their actions (as only they have access now), and everyone who used to have access can now fully concentrate on their main responsibility.

    2. Anonymous says

      Most people wouldn’t want to have root access anyway. Just a hell of a responsibility to carry around all the time. No thanks!

    3. says

      I fully agree with you. Some people don’t see the responsibility, but they feel powerful as root. Those are the people you least want to have root access, yet they may be the hardest to take it away from.

    4. The Stig says

      Just a thought: it might be easier to change the process if you don’t actually tell people that you’re doing so. Just introduce your changes gradually. Don’t throw any buzzwords at people, like Agile or ITIL or anything. Folks tend to oppose such things.

      Just my 2 cents

    5. says

      Yes, if you want to avoid a big “stir-up,” your approach of “under the radar” changes is the way to go. It depends on the current situation in your team. Sometimes it really is best to avoid alerting the naysayers and just do it.

      Sometimes your team might need a “wake-up call” and a new, bright target to unify the direction everyone is running in. If your team is drowning in boring and tedious day-to-day issues, challenging them to become agile or introduce ITIL (or whatever) might give them a boost in motivation.
