This is a guest post by John Allspaw (@allspaw)
It’s been said by a number of smart people that DevOps is largely founded in an organization’s skillful collaboration and communication, and the culture that results. I agree with that idea, and I also think it’s one of the reasons the term DevOps is sometimes difficult to explain: these are ‘soft’ skills we’re talking about. These aren’t things you can graph or alert on; they only manifest in the resulting product and environment.
Good collaboration and communication result in progress and a healthy environment in which to stretch technical muscles and evolve a product as fast as it needs to evolve.
Bad collaboration and communication result in missed business opportunities, stressed or burnt-out staff, and high turnover.
So, what do we actually mean when we talk about these things being core to DevOps? Communication? We have email, IRC, IM, and version control commit messages. So we communicate, right? Collaborate about what? We make a website. What else is there?
Collaboration: Surrounding the Technical, Driven by the Social
Let’s take an example of a technical problem that crosses the boundaries of dev and ops responsibilities and perspectives.
Site redirects/rewrites for URLs and other layer 7 bits: should they happen in the application code itself, or in the webserver config, say with mod_rewrite rules in an .htaccess file? If in the webserver, should they live in an included conf.d file or in the base httpd.conf? You could have a good reason to handle them at a layer 7 load balancer. Maybe there’s a case to be made for a separate place altogether, like an nginx proxy?
I’ve seen very successful companies put them in all of the above places, each a valid and appropriate solution in its context, and I don’t believe there is a single “right” answer. Ops might prefer the load balancer. Devs might prefer the application. Can those two groups have an adult conversation aimed at finding an appropriate solution, free of self-interest and with the interests of the product at the forefront instead?
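To make the “application code” option concrete, here is a minimal sketch, assuming a Python WSGI application. The `REDIRECTS` table, `redirect_middleware`, and `hello_app` are hypothetical names for illustration, not taken from any particular site. The same mapping could just as well be written as mod_rewrite rules or as an nginx `return 301` in the proxy.

```python
from typing import Callable

# Hypothetical mapping of legacy paths to their new locations.
REDIRECTS = {
    "/old-profile": "/people",
    "/blog/feed.rss": "/feeds/blog.xml",
}


def redirect_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app so known legacy paths get a 301 before the app runs."""

    def wrapper(environ, start_response):
        target = REDIRECTS.get(environ.get("PATH_INFO", ""))
        if target is not None:
            # Answer the redirect here; the wrapped app never sees the request.
            start_response(
                "301 Moved Permanently",
                [("Location", target), ("Content-Length", "0")],
            )
            return [b""]
        # Not a legacy path: hand off to the real application.
        return app(environ, start_response)

    return wrapper


def hello_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


application = redirect_middleware(hello_app)
```

Where this table lives is the collaboration question. In the application, the rules ship with code deploys and developers own them. In the webserver or load balancer, operations can change them without a code release, but they are invisible to someone reading only the application.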
Let’s put that aside for a moment and take another example: a new feature launch.
You love your life as a developer of public-facing code. You’ve got a plan for the new, awesome feature. You have a spec that’s reasonably defined, you have design resources, and a product person is there to help you iterate over the plan. You can practically write it in your head.
At this point, mature and skillful developers start to ask themselves questions while they code:
- What could go wrong with this feature?
- What are its failure modes?
- What metrics will tell us something is wrong? What metric thresholds will tell us that? Should someone be woken up? If they do get woken up, what should they do about the alert?
- What metrics will tell us it’s successful?
- How will we handle the failures when they happen?
- Does my design assume too readily that my resources (cloud instances, RAID, disks, CPUs, etc.) won’t fail?
- Is hot/cold failover appropriate for infrastructure meant to support this new feature? Manual or automatic?
- Am I favoring consistency, availability, or partition tolerance (the C, A, and P of the CAP theorem) in storing or serving data?
and, of course, there’s the best one:
How can I have all these concerns appropriately covered and still get the feature out in a reasonable amount of time so that it has value for the business?
All of these questions are opportunities for good communication and collaboration between development and operations. There are lots of places where the code hits the metal, where the behavior meets the alarms, and where the “normal” meets the “contingencies”.
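One place where “the behavior meets the alarms” is the alert threshold itself. As a minimal sketch, assuming a request error-rate metric and made-up thresholds (`warn_at`, `page_at`, and `min_requests` are hypothetical values, not recommendations), the “should someone be woken up?” decision might be encoded like this:

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    OK = "ok"
    WARN = "warn"  # e.g. notify the team; look at it during business hours
    PAGE = "page"  # wake someone up


@dataclass
class ErrorRateAlert:
    # Hypothetical thresholds; choosing them is the dev/ops conversation.
    warn_at: float = 0.01    # 1% of requests failing
    page_at: float = 0.05    # 5% of requests failing
    min_requests: int = 100  # ignore tiny samples to avoid noisy pages

    def evaluate(self, errors: int, requests: int) -> Action:
        """Map an error count over a window of requests to an alert action."""
        if requests < self.min_requests:
            return Action.OK
        rate = errors / requests
        if rate >= self.page_at:
            return Action.PAGE
        if rate >= self.warn_at:
            return Action.WARN
        return Action.OK


alert = ErrorRateAlert()
```

Picking those numbers, and deciding what the person who gets paged should actually do, is exactly the kind of question developers and operations answer better together.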
But who’s going to have those above conversations? Ideally, both developers and operations, each coming to the problem with their own domain expertise and perspectives. And ideally, all perspectives will have equal weight in coming up with a solution that has availability, performance, scalability, reliability, and maintainability all in mind.
This is the collaboration we’re talking about.
The best ops guys I know understand the value of getting features out the door in a reasonable timeframe, and want to assist that process in any way they can.
The best developers I know understand what it means to build a feature that is both operable and maintainable in production, and are also open to suggestions on how to do it appropriately.
And the best teams I’ve worked with know the limits of their traditional perspectives, because they’re nicely talking to (and learning from) each other all the time.
Advice for DevOps-Minded Management: Ways With Words
Just because groups communicate doesn’t mean that they’re good at it. Solutions delivered with condescension, handed down from a high horse, or couched in CYA (“cover-your-ass”) language are not the mark of a mature, senior engineer. An engineer who is satisfied with a solution primarily because it isolates him from blame in times of failure isn’t someone you want on your team.
Throughout all of this evolved/revolutionary/awesome DevOps collaboration lovefest, management needs to take responsibility for cultivating cooperation between the two groups. Allowing traditional stereotypes (devs break things without care, ops are grumpy police, etc.) to continue and fester amounts to outright encouraging them.
This means squashing the poor communication habits of the BOFHs in the organization as they happen. It also means making post-launch and post-mortem meetings blameless, and keeping a close eye on how people talk about each other’s work. Notice a lot of “us” and “they” and “we” in conversations along the dev and ops boundary? You just might have a problem, and it’s your job as manager to look into it and get those pronouns out of the vocabulary.
Conversely, it’s also good practice as a manager to look around your organization. Find that engineer (dev or ops) whom everyone wants to work with. Everyone wants to work with him because:
- He’s likeable and approachable.
- He has good ideas, so you can bounce yours off him with no fear.
- He’s a guy who gets things done.
We all know that guy. Everyone should be that guy.
Because that guy gets what DevOps is about.
About the author
John is currently VP of Technical Operations at Etsy and the author of two books: “Web Operations” and “The Art of Capacity Planning”.
He’s worked at Flickr, Friendster, InfoWorld, Salon, Genentech, and the Volpe National Transportation Systems Center, and from time to time as a consultant at a bunch of other places.