The landscape of the Systems Administration industry is shifting. As the trend continues towards Infrastructure as a Service (IaaS), Software as a Service (SaaS) and other *aaS offerings, the traditional role of systems administration is being challenged. While this traditional model will remain (well beyond its expiration date) in large enterprises that have invested heavily in their hardware, processes, and infrastructure, more and more companies will be looking to maximize their investment and get the most business value out of their resources.
The industry is increasingly being called on to reinvent systems administration to meet the needs and demands of a modern business and technology stack.
The traditional way we have been doing systems administration is in decline. The desire to break down the old silos between operations and development, in order to maximize business value and improve communication and collaboration between the groups, is forcing both sides to develop new skills and, at the core, to adopt new philosophies.
One such philosophy is Site Reliability Engineering, or SRE for short.
Generally accepted to have started at Google, the SRE movement has since spread well beyond it to companies such as Dropbox, Netflix, Twitter, Facebook, LinkedIn, and others.
As my colleague Bill Lincoln will explain in an upcoming post, although SRE started in internet-scale organizations, it is a philosophy put into action that companies of all sizes can benefit from.
At its core, the prime directive of SRE is the reliability of a service as a whole. This subtle yet important paradigm shift is driving change within the Systems Administration and Software Development industries, moving them towards a place where both groups share a unified objective of reliability and the differences between SRE and SWE (Software Engineering) become subtle and fluid.
I have been a strong advocate to Pythian leadership for the SRE philosophy as a major emerging trend in the Systems Administration space, so I was thrilled to attend the USENIX Site Reliability Engineering Conference (SRECon14), held on Friday, May 30, 2014 in Santa Clara, California, USA, along with two of my colleagues from the Pythian Enterprise Infrastructure Services Group.
It was a single day, but from the first keynote, delivered by Ben Treynor, Vice President, Engineering, and Google Site Reliability Tsar, to the final talk by Michael “Mikey” Dickerson on how Silicon Valley’s SREs saved Healthcare.gov, the information delivered was packed full of value and a good amount of inspiration.
With a prime directive of “reliability”, the talks covered the entire lifecycle of an IT service: Designing for Reliability, Deploying at Scale, Metrics and Monitoring for Reliability, Cascading Failure of a Service, and Disaster Preparedness.
The call to action was also clear: you are absolutely within your rights to not like that it is happening, but there is no denying that change is coming. We (SysAdmins and Software Engineers) can choose to evolve, we can choose to challenge ourselves and “up our game”, or we can run the very real risk of being left behind.
SRECon14 was a great success and I look forward to attending the event again. I would enthusiastically recommend it to friends and colleagues in systems administration or software engineering roles.
Finally, I will end with a quote from Ben Treynor: “Let the packets flow, and the pager remain silent.”