DevOps: State of the Nation

This is a guest post by Chris Read (@cread)

DevOps is gaining credibility. Industry analysts such as Gartner are talking about it, and the big IT vendors are already trying to sell DevOps “solutions.” At its heart it is the integration of Agile principles into Operations practices. With its roots in dysfunctional Enterprises, it has enabled the rise of the Cloud and Web 2.0 giants. As DevOps matures it will help everyone in the industry focus on their core business, and it will continue to drive the realization of utility computing.

It’s been about a year since we had the first DevOpsDays conference in Belgium. A lot has changed in the past year, especially where DevOps is concerned, so I thought I’d take a moment to reflect on how we got here, where we are, and where I see us going.

How We Got Here

Many people have been pointing to Web 2.0 and Cloud Computing as the roots of DevOps, but the culture and philosophy have roots that go a lot further back. Gartner even titled their report “DevOps: Born in the Cloud and Coming to the Enterprise.” Having a DevOps culture has certainly helped the top web and cloud companies achieve what they have, but it didn’t start there.

It all began in the heady days of the mid ’90s. I was working in the first of several dot-com startups where I would find myself writing code in addition to looking after the infrastructure of the company we were trying to create. In these places there were plenty of people who had some form of the word “developer” in their job title, but none with the word “operations”. There were no full-time System Administrators, so I would often end up doing Sys Admin-type jobs myself. I was not alone – other developers I worked with also thought our success depended on understanding the lower-level technologies we were building upon. We were also keen to learn as much as possible about the ecosystems our creations had to live in: people were interested in learning more, and people were interested in teaching what they knew.

It was only when I ventured away from the world of startups and first tried my hand as a consultant with IBM that I encountered the world of organizational silos. As much as I enjoyed the people I got to work with, it still made no sense to me that if, for example, I needed a file restored from tape I had to raise a ticket for the team that sat behind me to do the work, when I technically had all the access and skills required to do the job. I also discovered an attitude of “I’m only working in the computer industry because there’s money in computers”. This was an alien concept to me! Ever since I wrote my first BASIC program or dismantled a radio to see how it worked, I’ve dreamed code and circuits. How could people get up in the morning and do a job they were not passionate about?

Over the years I found I was alternating between being a Developer and being a Sys Admin because nowhere could I find a job being both, and I didn’t want to give either up. I’d still find time to talk to those “on the other side,” but I’d have very little influence. I had resigned myself to the idea that I’d spend the rest of my career spending a few years at a time in each role and then switching.

In the meantime, some development folks had been looking at software development as an area where they could significantly improve throughput. Their Agile and Extreme Programming ideas focused on improving communication and feedback between “The Business” and “The Developers” at the front line of implementing those ideas – so much so that developers were becoming perceived as part of the business. The focus on fast feedback and closer cooperation between stakeholders resulted in better-quality software being produced faster and more reliably.

Traditionally, IT projects have been viewed as big, scary beasts. For years, most organisations found that the journey that started with a business unit having an idea and ended with a deployed software solution was long, painful and fraught with risk. From a business perspective, the IT organisation as a whole was the limiting factor, but they accepted this as simply “the way things are.” The Agile folks had managed to alleviate the first part of the problem – writing software based on feedback from the Business – and in doing so had exposed the next one: getting the software into production.

Dan North, a developer at ThoughtWorks, had experienced this bottleneck time and again on different projects. He was convinced the solution lay in finding people with skills that spanned the development and operations worlds, and managed to convince ThoughtWorks to hire a couple of people who understood both sides of the divide to help the development teams improve their deployment process. Julian Simpson (aka “The Build Doctor”) and I were the first of these hires, followed not long after by Jez Humble (co-author of Continuous Delivery). Finally I’d found a company where I could be both a Developer and a Sys Admin at the same time. Awesome! The year was 2005, and at the time most people were still trying to get their heads around unit testing and continuous integration.

The results of this project – one of the first enterprise-scale dev/ops projects – are still being felt and expanded on today. It was on this project that we:

  • Built our first “one click” deployment system by getting the client System Administrators to come and pair with us.
  • Built our first “deployment production line” and named it.
  • Learned many of the lessons that formed the foundations of the Continuous Delivery book.
  • Realised that the existing tools for Continuous Integration made setting up new projects hard and so developed Buildix, a custom Linux distribution aimed at simplifying the setup of a new software project. (It’s now defunct as all the leading CI tools are now really easy to set up.)

When we joined the project it took over two days to get a working build into a suitable test environment. By pairing with the client operations people we produced a set of shell scripts that reduced the build and deployment down to a couple of minutes, with a sub-second cutover between versions. Production releases went from being high-risk, high-visibility endurance sessions with loads of downtime to something they did at lunchtime on Fridays. The business loved it, and this feedback reinforced our opinion that we were on to something Big and Important.
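The original scripts are not reproduced here, but a sub-second cutover like that almost always comes from some variant of the symlink-swap pattern: do all the slow work out of band, then switch with one atomic operation. Below is a minimal sketch of that pattern in Python; the paths, directory layout and artifact format are hypothetical, not the client’s actual setup.

    #!/usr/bin/env python3
    """Sketch of a symlink-swap deployment (hypothetical paths).

    Each build is unpacked into its own release directory, and a single
    atomic rename re-points the 'current' symlink that the app server
    runs from. That rename is the sub-second cutover; rollback is just
    pointing the link back at the previous release.
    """
    import os
    import subprocess
    import sys
    import time

    RELEASES = "/opt/app/releases"   # one directory per build (assumed layout)
    CURRENT = "/opt/app/current"     # symlink the app server actually serves from

    def deploy(artifact: str) -> None:
        # Unpack the new build into a timestamped release directory.
        release = os.path.join(RELEASES, time.strftime("%Y%m%d%H%M%S"))
        os.makedirs(release)
        subprocess.run(["tar", "-xzf", artifact, "-C", release], check=True)

        # Create the new symlink beside the old one, then rename over it.
        # os.replace is atomic on POSIX, so clients never see a half-state.
        tmp = CURRENT + ".new"
        if os.path.lexists(tmp):
            os.remove(tmp)
        os.symlink(release, tmp)
        os.replace(tmp, CURRENT)

    if __name__ == "__main__":
        deploy(sys.argv[1])

The shape of the idea matters more than the tooling: the expensive work (copying, unpacking) happens before the switch, and the switch itself is one cheap, reversible operation.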

Four years later in early 2009, having spent much of the intervening time developing these ideas and speaking at conferences on Continuous Integration and Cloud Computing, I was approached by Patrick Debois, the man who coined the term “DevOps,” to speak at a DevOps-specific event he was putting together. I knew him from CITCON, the Continuous Integration and Testing conference. We were doing similar things, and he’d decided the time was right to have a little conference in Belgium to try and bring like-minded people together. He’d even come up with a name for it – DevOps Days.

The videos of the talks from that conference are available in the DevOpsDays Archives. There were a number of us there who are based in London, so we decided to create a London DevOps group to share ideas and discuss challenges. Since then there have been DevOpsDays conferences in the USA, Germany and Australia, and one in Brazil is already scheduled. Local groups have sprung up all over the place. The Internet has been abuzz with the term. DevOps seems to have taken off.

Where Are We Now?

DevOps is now at the stage Continuous Integration was at about three years ago: people have heard of it, some are using it, but few are really aware of its potential. However, things are moving quickly and picking up momentum. Product companies are trying to hop on the bandwagon. There are adverts for “frameworks” and other allegedly magic bullets that will bring the DevOps joy to your organisation. As mentioned earlier, it’s even on the radar of industry analysts such as Gartner.

However, unlike Continuous Integration, which is a clearly-defined activity, DevOps is at best vaguely-defined and at worst simply a sales placeholder for pitching operations-related products and services. The reason for this is that DevOps is primarily a cultural and organizational shift, rather than a set of practices, tools or techniques.

My working definition is:

DevOps is the integration of Agile principles with Operations practices.

In order to realise the benefits of Agile software delivery, development groups typically go through a period of intense, difficult cultural change. This involves breaking down deeply-embedded functional silos to create cross-functional teams of developers, testers and business analysts, and investing heavily in process automation. In the same way, operations and development teams today should anticipate a similarly disruptive change to their culture if they want to embrace DevOps. DevOps is not a role or a job description. It is especially not yet another silo in a farm of silos! It is a change of culture and mindset.

As you read through this series you will notice a strong emphasis on cultural change, but you’ll find many different views on how to do it. This is because we’re all still figuring it out for ourselves, and because no two companies are the same.

In most organisations dev and ops are optimized for different things – delivering new software and maintaining operational stability respectively. On the face of it these appear to be in conflict with one another, which results in the kinds of tensions we’ve all seen between the two groups. Sys Admins sometimes joke that developers “throw software over the wall,” and there’s a lot of painful truth to that. The developers lob their software releases over the proverbial wall like a grenade to the sys admins, where it promptly explodes in their faces and threatens their hard-won reputation for operational stability.

However, it turns out that (small, frequent, controlled) change is actually very well-aligned with operational stability! If you never touch your servers they rapidly fall out of date, to the extent that you can end up on a platform that is no longer supported by your vendor. The most common approach to core systems maintenance is to regularly apply vendor-supplied patches. In other words, continual small changes serve to improve stability, and help manage operational risk.
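To make “continual small changes” concrete, here is one way a regular patch run might be automated on a Debian or Ubuntu host. This is a sketch under stated assumptions, not a prescription: it assumes apt-based package management, a hypothetical log path, and a cron entry that runs it as root.

    #!/usr/bin/env python3
    """Sketch of a small, regular patch run (assumes Debian/Ubuntu apt).

    Run daily or weekly from cron as root, it keeps every change small
    and auditable instead of saving up months of patches for one risky
    big-bang upgrade.
    """
    import datetime
    import subprocess

    LOG = "/var/log/patch-run.log"  # hypothetical log location

    def run(cmd: list[str]) -> str:
        return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

    with open(LOG, "a") as log:
        log.write(f"patch run {datetime.datetime.now().isoformat()}\n")
        run(["apt-get", "update"])
        # Dry run first, so the log records exactly what is about to change.
        log.write(run(["apt-get", "--simulate", "upgrade"]))
        run(["apt-get", "-y", "upgrade"])

The dry run in the middle is the point: every change is small, scheduled and recorded, which is exactly the property that keeps operations boring.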

By embracing Agile principles, the world of development is no longer the bottleneck to delivery it once was. In pre-Agile times (and in many command-and-control organisations today) it was not uncommon to encounter a two- or three-year project rumbling along with no visible outputs, only to fail to make it into production at the final hurdle. In other words, the development “black hole” was the biggest impediment to throughput. Now that Agile and XP have become mainstream in the development world, the journey into production has been exposed as the next big delivery bottleneck. It’s the next thing that needs to be improved.

So how can we apply these Agile principles to the operations world to help clear this blockage? Let’s see what we can learn from the development team.

The old Waterfall process of writing software would start with The Business describing their requirements to Analysts, who then worked with Architects to create a specification document, which was passed down to The Developers, who in turn would offer up their application to The Testers, who would keep returning it as defects were found but would occasionally let a version through to The Operations team, who would then try to get the code running in production.

Co-locating the various people involved in delivery demonstrates the Agile principle of collaboration over contract negotiation (where the contracts in this case are SLAs between departments, or detailed specifications that can be used in evidence at the post-mortem). We embed the analysts and testers into the team with the developers, and thus improve access and communication within the team and with the business stakeholders. The result is better throughput and communication, with everyone involved in the process understanding the value of their own contribution.

The obvious next step in this process would be to integrate the operations folks into the mix. Just as there has been a historic disconnect between Business and Development, there has been a similar one between Development and Operations.

Now consider how we might integrate this principle from an Operations point of view by examining something as commonplace as provisioning a new server. The team that needs it creates a New Server Request which then goes to Purchasing to get the hardware, which then goes to the Unix team to install the OS, who send it to the Data Center team to rack, who raise another ticket with the Network team to get the switch ports enabled, (or resubmit the ticket because no-one thought to check whether the rack could power the server), who then pass the ticket back to the Unix team to test it, who finally (phew!) close the ticket.

Seeing the parallel yet? How much faster could that server be brought up in production if there was just one person who owned the flow from request to handover? The need for this level of speed improvement has been one of the major drivers behind the adoption of cloud services. Not everyone needs a new server in minutes, but it’s likely that anything more than a day or two will start having an impact somewhere. Not everyone needs to be an expert in all areas of purchasing, power management and networking either, but one person on the team needs to own it. Pairing helps a lot with this. Where the person who is responsible for the outcome lacks either confidence or experience, pairing with someone who has the relevant skills has the dual outcome of helping get the job done and sharing knowledge within the team. Coupled with more automation, this can dramatically improve the accuracy of what’s deployed.
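For comparison, here is roughly what the whole request-to-handover flow can collapse to when one person owns it against a cloud API. This sketch uses the AWS SDK for Python (boto3), which postdates this article; the region, image ID and key pair name are placeholders, not real values.

    #!/usr/bin/env python3
    """Sketch: the entire provisioning flow, owned end to end by one person.
    Uses boto3 (the AWS SDK for Python); all identifiers are placeholders."""
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # One call replaces the purchasing, racking and cabling tickets.
    resp = ec2.run_instances(
        ImageId="ami-12345678",    # placeholder machine image
        InstanceType="t3.micro",
        KeyName="ops-key",         # placeholder key pair
        MinCount=1,
        MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # Block until the instance is running, then hand it over.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    print(f"server ready for handover: {instance_id}")

Whether the API belongs to a public cloud or an internal provisioning service matters less than the shape of the flow: one owner, minutes instead of weeks, and no tickets bouncing between silos.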

To encourage shared responsibility for the entire product delivery chain, we need to re-think how we organise the IT department. Exactly how you do it depends on what provides value to your organization, combined with the relative skills of your teams. The hardest part will probably be socialising this change within your teams. Both development and operations groups need to learn from each other as they become a cohesive unit. The KPIs used to measure these teams need to promote cooperation and understanding of the common goals. It’s a cultural problem, not a tool problem. (For instance, where historically you may have rewarded a networking specialist for how well they can diagnose packet loss due to an incorrectly-configured switch, now you also reward them for how effectively they share their knowledge with the generalists embedded in the development teams.) There is still scope for specialists – they act as consultants and advisers, help define guidelines and of course get their hands dirty along with everyone else.

These specialists also help create or roll out tools to support the group in safely making their changes. Automation is key to the success of your DevOps adoption, and you’ll find yourself assembling a toolchain that is specific to your environment. It is important to resist the allure of “silver bullet” solutions, because they inevitably aren’t. Instead, be prepared to kiss a lot of frogs, and don’t be scared to start with a very small shell script. Your toolchain should help with automating deployment, configuration management and monitoring. The most important requirement of each tool in your chain should be that it is intuitive and easy to use for both developers and operations folk. Before choosing a tool, speak to the users!
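As an illustration of the “very small script” end of such a toolchain, here is a monitoring check that both developers and sys admins can read and extend, sketched in Python rather than shell. The URL and timeout are assumptions, and the exit codes borrow the common Nagios-plugin convention (0 OK, 1 warning, 2 critical) so the script could later be dropped into most monitoring systems unchanged.

    #!/usr/bin/env python3
    """Tiny health-check sketch; URL and threshold are assumptions."""
    import sys
    import time
    import urllib.error
    import urllib.request

    URL = "http://localhost:8080/health"  # assumed application health endpoint
    TIMEOUT = 2.0                         # seconds before we call the app down

    try:
        start = time.monotonic()
        with urllib.request.urlopen(URL, timeout=TIMEOUT) as resp:
            elapsed = time.monotonic() - start
            print(f"OK: {URL} answered {resp.status} in {elapsed:.2f}s")
            sys.exit(0)
    except urllib.error.HTTPError as exc:  # server answered, but with an error status
        print(f"WARNING: {URL} returned HTTP {exc.code}")
        sys.exit(1)
    except OSError as exc:                 # connection refused, DNS failure, timeout
        print(f"CRITICAL: {URL} unreachable ({exc})")
        sys.exit(2)

A script this small is also a collaboration artifact: an ops person can tune the threshold, a developer can add a deeper check, and both can see exactly what “healthy” means.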

Developing your toolchain is an obvious place to start your DevOps journey, as it provides immediate value and creates common ground for the teams to collaborate on.

Where Are We Going?

If you think DevOps is just a passing phase, you are wrong! So much of what we are doing is rooted in plain old common sense. It’s simply the next logical step in the journey to improving our industry.

I think DevOps is the real power behind true utility computing. Leaders in the cloud provisioning space such as SliceHost and Amazon Web Services are well known for how they apply DevOps principles to lead their fields, and for how their services improve the throughput of the provisioning example I used earlier. The principles they use to run their thousands of servers apply just as well to any company with an internal farm of 50 or 500 servers.

This momentum will continue, and become more evident in the companies that continue to make the best use of this new model. Many companies who switched over to utility power in the previous century discovered the true cost of inefficient processes and machinery when it came to their bottom line. This principle still applies in this century to companies who try to use utility computing. If their processes are lean and efficient, they have very little problem tapping into the cloud provider(s) of their choice. If they don’t have the DevOps culture though, they’re in for a big shock when the first bill arrives.

The Web 2.0 and Cloud vendors who are the current poster children of what can be accomplished are simply the frontrunners. They have accomplished what they have because of the pace dictated by the markets they participate in and a lack of legacy drag. Organisations in other industries are already starting to change their ways. The speed with which companies need to adapt to demand is increasing all the time. As those that adapt quickly pull away from their competitors, those left behind will need to find ways to adapt or die.

About the author

Chris Read currently works at DRW, where he helps developers and operations deliver more value faster: he helps developers understand the environments they are developing for, and helps infrastructure people build better systems. He used to work for ThoughtWorks as a Principal Technical Consultant and Infrastructure Specialist. His specialties are Unix (any flavor), scripting and networking. He has done time as a Developer and as a Sys Admin, but now he is both…
