Video recording and production done by DevOpsDays.
Current monitoring tools are clearly reaching the limit of their capabilities. That's because these tools are based on fundamental assumptions that are no longer true such as assuming that the underlying system being monitored is relatively static or that the behavioral limits of these systems can be defined by static rules and thresholds. Interest in applying analytics and machine learning to detect anomalies in dynamic web environments is gaining steam. However, understanding which algorithms should be used to identify and predict anomalies accurately within all that data we generate is not so easy.
This talk builds on an Open Space discussion that was started at DevOps Days Austin. We will begin with a brief definition of the types of anomalies commonly found in dynamic data center environments and then discuss some of the key elements to consider when thinking about anomaly detection such as:
Understanding your data and the two main approaches for analyzing operations data: parametric and non-parametric methods
The importance of context
Simple data transformations that can give you powerful results
By the end of this talk, attendees will understand the pros and cons of the key statistical analysis techniques and walk away with examples as well as practical rules of thumb and usage patterns.
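As a rough illustration of the parametric versus non-parametric distinction mentioned above, the sketch below compares a mean/standard-deviation check with a median-based one. It is not taken from the talk: the function names, thresholds, and sample latencies are all assumptions made for the example, and the median-based check uses the conventional "modified z-score" scaling constant 0.6745.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Parametric check: assumes roughly Gaussian data and flags points
    that lie more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


def mad_anomalies(values, threshold=3.5):
    """Median-based check: uses the median and the median absolute
    deviation (MAD), which a single extreme point barely moves."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []
    # 0.6745 is the conventional constant for the "modified z-score".
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Hypothetical response times in milliseconds with one obvious spike.
latencies = [100, 102, 98, 101, 99, 103, 97, 100, 500, 101]
print("z-score flags:", zscore_anomalies(latencies))
print("median/MAD flags:", mad_anomalies(latencies))
```

On this small sample the spike inflates the standard deviation enough to hide itself, so the z-score check flags nothing, while the median-based check flags index 8. That is one reason it pays to understand your data before choosing a method.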
Do these tales of woe sound familiar?
"I can't get my managers to support my DevOps efforts"
"Company politics prevent me from getting help from other teams"
"People say they are too busy getting real work done to worry about a fad called DevOps"
"Our CIO wants 30 deployments per day and told me I have until next quarter to buy and implement the DevOps tool to make it happen"
These are all actual examples from companies lacking organizational alignment. A lack of organizational alignment is the #1 killer of DevOps dreams and good intentions. Left unchecked, a lack of organizational alignment will undermine any improvement efforts you attempt and continuously generate more dysfunction throughout your organization.
So what can you do? Organizational alignment sounds like a complicated and daunting culture issue, doesn't it? Where do you start and how can you make progress?
In this presentation, I'll be walking you through a proven set of design patterns for building organizational alignment. You'll walk away from this presentation with the outline of a plan that you can adapt and apply to your own organization. No special people skills or management kung-fu required.
Your path to organizational alignment will contain three "acts":
Act 1: Making the case -- getting buy-in from your colleagues, support from your managers, and the budget to bring a plan to life
Act 2: Learning to see -- getting the entire organization (up and down the management chain as well as across the value stream) to see the actual problems in the same way and talk about those problems using the same language.
Act 3: Solution design and execution -- getting the entire organization to embark on continuous improvement (and to stay on course!)
We thought we were doing all the right things with our build and deployment process. Everything that could be automated was automated. But we still found that our build and deployment process took much longer than expected. In short, even with lots of automation, there was still lots of waste. This talk focuses on analysis techniques used to identify waste in our deployment processes and the changes we made (both in process and technically) to reduce waste and speed up our feedback cycles.
Techniques covered:
Value stream mapping
Fishbone root cause analysis
5 Whys root cause analysis
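To make the value stream mapping idea concrete, here is a minimal sketch that totals working time and waiting time for each pipeline step and reports how much of the lead time is actual work. The step names and minute values are invented for illustration and do not come from the talk; `Step`, `lead_time`, `process_efficiency`, and `biggest_wait` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_minutes: float  # time spent actively doing the step
    wait_minutes: float  # time the change sat in a queue before the step


def lead_time(steps):
    """Total elapsed time from first step to last, including waiting."""
    return sum(s.work_minutes + s.wait_minutes for s in steps)


def process_efficiency(steps):
    """Fraction of the lead time that is active work rather than waiting."""
    total = lead_time(steps)
    if total == 0:
        return 0.0
    return sum(s.work_minutes for s in steps) / total


def biggest_wait(steps):
    """The step with the longest queue time: a first candidate for waste."""
    return max(steps, key=lambda s: s.wait_minutes)


# Hypothetical, fully automated pipeline that still spends most of its time waiting.
pipeline = [
    Step("compile", work_minutes=5, wait_minutes=0),
    Step("unit tests", work_minutes=10, wait_minutes=5),
    Step("deploy to staging", work_minutes=5, wait_minutes=60),
    Step("manual approval", work_minutes=5, wait_minutes=240),
]

print(f"lead time: {lead_time(pipeline)} minutes")
print(f"efficiency: {process_efficiency(pipeline):.1%}")
print(f"largest wait: {biggest_wait(pipeline).name}")
```

A step singled out this way is a natural starting point for a fishbone diagram or a round of 5 Whys.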
Security testing is often done at the cadence of auditors and not at the pace of the dev team. Gauntlt is an open source framework that helps you “be mean to your code” through development and into release--facilitating ruggedized software and better communication between dev, ops and security teams. This talk will help you implement Gauntlt into your projects with plenty of real world examples.
DevOps and Continuous Delivery drive dramatic shifts in the roles, perspectives, and skill sets of QA Engineers. They must become software engineers in their own right. At the same time, they need the ability to understand and validate operational concerns such as security, scalability, resilience, etc. In a software-as-a-service world, QA's ultimate job is to help DevOps organizations understand the customer's job-to-be-done and ensure they satisfy customers' needs for deliverability and serviceability.
This talk will provide a detailed introduction to the new role of QA within DevOps. It will describe testing tools and techniques applicable to Continuous Delivery. It will also explore ways that QA can help connect Dev and Ops more directly to the business and to the customer.
There are many sources of complexity in any infrastructure automation project. Some are inherent to the infrastructure itself: Cloud vs. hosted vs. VMs, operating systems, software configuration, etc., but others are inherent to the problem users are trying to solve. As DevOps and infrastructure automation adoption keeps increasing outside of the datacenter, the scenarios get more complex and abstract. I would like to explore the sources of this complexity and talk about a programmatic approach.
In general, when talking about infrastructure automation, we tend to consider a single production scenario, or maybe a few of them if you consider QA, staging, etc., or even different zones, but these scenarios are relatively static in number and in form. As automation goes upstream, from the Ops world into the Dev world, the number and dynamic nature of the scenarios increases dramatically. Developers see infrastructure as a fluid concept, just like the software they are accustomed to writing. For example, a developer needs to test new code on an environment built to match the production one, but with some very specific variations, or sometimes to test the same code with many of those variations automatically. Often, the configuration and state of the services the developer requires might need a specific network layout, or the database loaded with a particular dataset. Furthermore, there usually are many developers on the same product team, and they might work concurrently on different code branches at any time, each branch possibly requiring a different variation of the infrastructure. In these scenarios we see a combinatorial explosion of systems to build, and a big bottleneck for a DevOps team.
Similar complexity growth can be seen with cluster-centric infrastructures, as is the case in Big Data projects. Building a big data setup is a relatively complex but mostly solved task, but operating these clusters continues to be mostly an art form. The decision making required for operating these clusters varies from cluster to cluster, from user to user, and the decisions to be made involve OS, software, cluster, and application-wide knowledge. This is again a combinatorial explosion of factors and decisions.
One way to tackle such complexity is by taking a programmatic approach to automation. From this perspective, you first abstract over some of the inherent sources of complexity (infra, OS, software, etc.), then over the composite parts of your infrastructure (nodes, clusters, groups), and finally write the code to operate all of these abstractions in a coordinated manner. With this approach, the actual programming is no different than regular software development, and hence you can leverage the tried and true Software Engineering practices used for all other complex software development.
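A minimal sketch of what such a programmatic approach might look like follows. Everything in it is hypothetical: `NodeSpec`, `Cluster`, `build_variants`, and `provision_plan` are illustrative names, not the API of any particular tool, and the "provisioning" step only produces a description string. The point is that once providers, operating systems, and roles are plain data, the combinatorial set of environments can be generated and operated on with ordinary code.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class NodeSpec:
    """Abstracts over the inherent sources of complexity for one node."""
    provider: str  # e.g. a cloud or a local VM host (hypothetical values)
    os: str
    role: str


@dataclass
class Cluster:
    """A composite part of the infrastructure: a named group of nodes."""
    name: str
    nodes: list = field(default_factory=list)

    def apply(self, action):
        """Run one operation over every node and collect the results in order."""
        return [action(node) for node in self.nodes]


def build_variants(providers, oses, roles):
    """Enumerate every provider/OS/role combination as a NodeSpec."""
    return [NodeSpec(p, o, r) for p, o, r in product(providers, oses, roles)]


def provision_plan(node):
    """Stand-in for real provisioning: only describes what would be done."""
    return f"provision {node.role} on {node.os} via {node.provider}"


variants = build_variants(
    providers=["ec2", "vmware"],
    oses=["ubuntu", "centos"],
    roles=["web", "db", "hadoop-worker"],
)
print(len(variants), "possible node variants")

branch_env = Cluster("feature-branch-env", variants[:2])
for line in branch_env.apply(provision_plan):
    print(line)
```

Because the variants are ordinary values, they can be unit tested, code reviewed, and version controlled like any other software.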
The new world order of business is increasingly centered on the speed at which an organization can leverage digital commerce to rapidly deliver goods and services to consumers. This movement was ignited by the success of companies such as Amazon and Google, who are disrupting traditional businesses by serving consumers better and faster with innovative technology.
To move at this kind of speed, businesses must transform. This means IT needs to be a front office strategic imperative to business and not the back office support system for internal operations. The reality today is that separation between Dev and Ops in large organizations still reigns supreme. We all know this has to change.
In the Ops world many of us see DevOps as the future. Our developer brethren think about Agile development practices and moving from multi-month (or year!) development cycles to short sprints. We are all really talking about the same thing: breaking down organizational boundaries and integrating teams to move faster, then integrating tool chains so that we have common tooling and don’t need to hand off artifacts (see CODE!), and continuously delivering innovation for the business. There is a process for making this transition that we here at Opscode have seen in practice. Adam will talk about the 5 key drivers of success in making that transition.
Warning - business transformation is not for the faint of heart!
We have a small team that would like to propose a panel discussion for the Devops Days event in Mountain View. The format would be a quick introduction of the topic with a few talking points followed by a back and forth panel discussion with the audience. While we are of course proposing this panel, we would welcome and encourage another team in similar circumstances to join us in the panel discussion to broaden the viewpoints.
This would be a panel discussion to discuss onboarding a new engineer into an existing devops team. Discussion includes how to integrate a new engineer into a team practicing continuous delivery, establishing a mentoring relationship and the importance of devops as a foundational engineering culture.
Initial opening discussion would include a few items to seed the discussion. These would reflect topics for thought from the perspective of the hiring manager, the mentor and the protege and would not only include discussions around the devops mindset and discipline but would also extend out to how we grow and maintain a sustainable talent pool.
We have recently hired a new junior engineer from Hackbright Academy (hackbrightacademy.com) and the panel would consist of her (Mercedes Coyle), her mentor (Gary Foster) and the hiring manager (Erik Sowa) involved. Special emphasis would be given to our internal devops-centric culture, how we approach problems in an experimental fashion and how we encourage our engineers to grow within that mindset.
We would also relish the opportunity to do this panel with another team. If there is another company with a similar team that would like to take part we would love to share the panel with them and would encourage them to participate.
As DevOps matures from craft, through trade, to a science, we are starting to work on distilling out how we can make DevOps' implementation and the organizational transformation repeatable and predictable, across all kinds of environments. As part of that search, it is time to start looking at humanity's other "operational" endeavors and see what is applicable to DevOps.
This talk examines one of the largest operational systems built to date: the national airspace system. We will look at specific aspects of how controllers (operations teams) work with pilots (developers) to safely move millions of passengers (customers) every year, with an incident rate that would make any development shop jealous.
In aviation, harsher, more crowded, and inclement conditions all require additional training: an instrument rating. Similarly, developers and operations teams buying in to "DevOps culture" is a great start, but it's often hard to nail down what that actually means. We'll examine the specific behavioral and operational elements of this other complex system that has been tamed and look at what's applicable to implementing a DevOps culture within our own industry.
Finally, we'll examine some of aviation's hard-learned lessons, and look at ways we can leverage this knowledge, and avoid those classes of pitfalls.
The values and ideas espoused by the DevOps movement are great, but by themselves they aren’t enough. That should be obvious because it is just Dev & Ops, right? Frankly, DevOps is just a piece of a bigger puzzle that needs to be put together to form a whole picture. The same goes for Agile development: not enough. We now have the Lean Startup and LeanUX. We have technologies and practices like the Cloud and Continuous Delivery. By themselves none of these are enough! They address only a part of the required transformation. To really be successful, companies and organizations need to adopt a holistic approach to create an organization that can sense, adapt, learn, and respond like a living, sentient entity.
The key to a holistic approach is creating a focus on customer purpose. All activities and organizational units must contribute to the creation of customer value and success. Without the proper orientation towards the customer each part of the organization will perform sub-optimally and could even unwittingly hurt overall efforts.
This talk will discuss how to align to our customer’s purpose (the REAL customer) and how to pull all of these related “agile ideas” into a complete, working, value-creating organization.
The idea behind DevOps, operating effectively across an organization, is not limited to the world of software. The concepts of culture and sharing information and data also occur in other realms, as shown by the restructuring of the Central Intelligence Agency (CIA) to do a better job of capturing Osama Bin Laden.
In this lightning talk, I will elaborate on how the CIA in the 1990s was a siloed organization that had analysts (who studied threats/motives) and operations (field agents) who worked towards keeping America safe from terrorist threats. After 9/11, however, the agency restructured the counterterrorism unit so that analysts and operations agents worked together on the hunt for Bin Laden; sometimes analysts even became operations agents in order to operate more effectively and solve the task at hand.
To conclude, the pillars of devops (culture and sharing information) aren't limited to just us technologists, but transcend to other fields.
I'm basing my talk on interviews (example: pbs.org/newshour/bb/terrorism/jan-june13/manhunt2_05-01.html) and the documentary Manhunt (imdb.com/title/tt2475544/).
Salesforce has been undergoing a (sometimes painful) DevOps transformation for almost a year now. Along the way we have had a number of successes, and a number of failures. I'm proposing an ignite talk in the style of Todd Parr's "Do's and Don'ts" and "Underwear Do's and Don'ts" (amazon.com/dp/0316908061/ref=rdr_ext_tmb) books for 3-year-olds. With the 20 slides, we can cover up to 20 best/worst suggestions for practicing DevOps.
DO: Adopt the CAMS model for explaining to your executives and others why we're doing what we're doing and what are the components of a successful transformation
DON'T: Worry about the tooling when you could be worrying about the culture. Cultural change is what it takes to be successful in your transformation and at a big company like salesforce.com, cultural change is HARD. (plus, you might make John Willis take his C from the model and go home)
DO: Get the Ops team involved early in the process. DevOps needs to be driven by and participated in by both organizations. If one side is preaching to the other, you won't have DevOps and you will probably be in worse shape than when you started.
DON'T: Get Ops involved earlier by allowing them to create a "Front Door Process" where you need to fill out a form when you start on any development project and submit that to Ops to get their approval (yes, this actually happened)!
Application monitoring and troubleshooting is a developer and operations concern that causes friction. The current model in which DevOps runs release automation until it hits a performance wall, and Operations swoops in to save the day is not working. Since testers cannot reproduce the complex application operational environment, traditional testing tends to be reactive. We need to change the mindset to “if it is not broken, it may be breaking” and shift to tracking close calls instead of relying excessively on actionable alerts (which often create false positives). According to the Process Improvement Institute, a risk analysis firm, across many industries there are between 50 and 100 near misses recorded per serious accident, and about 10,000 smaller errors that occur during that time. The speaker will discuss practical approaches for implementing tracking of close calls, the different groups of data sets available, a new analytical approach and changes in DevOps to allow this.
This idea was inspired by Jeff Hackert's talk about Conway's Law at DevOpsDays Austin 2013. Communication sucks; it's the biggest problem between any two people on the planet. Bad communication leads to bad discussion. Bad discussion leads to low productivity, low morale, and bad products. In order to communicate effectively, it is necessary to shut down some of the more primal parts of our brain; as Hackert talked at length about, the amygdala is the part of our brain that kicks into action and gets defensive when someone disagrees with one of our ideas. The amygdala is primal and irrational, while we need the calm, rational powers of the neocortex.
The gist of the talk I'm proposing is to highlight some of the very negative aspects of programming (see "Be Nice To Programmers" edu.mkrecny.com/thoughts/be-nice-to-programmers) as well as some very negative aspects of our existence, such as our place and significance in the grand scale of the universe. Then, take these negative realities such as "I am insignificant in the scope of the universe", "I know nothing", and use them to facilitate better discussion and communication by cutting out the amygdala. Keeping the amygdala out of the office can open up your mind to learning new ideas instead of clinging to the defense of your own ideas. The talk then focuses on the difference between being wrong and failing, and highlights that being wrong is how you learn and learning should be the goal of every interaction you have at work. Not being right.
Based on my article published on CM Crossroads cmcrossroads.com/article/eight-mistakes-prevent-devops-success
I've seen several problems in companies over the years that I believe are significant. Besides cultural challenges, there are technical challenges that are not specific to any tool chain that need to be addressed in order for a DevOps initiative to scale beyond the scope of a small team. I will also cover that while a technology first approach to DevOps with a coalition of the willing may make a successful proof of concept project, cultural issues need to be addressed too.
IT is actually a multi-cultural society comprised of different factions, each advocating for their favorite framework, methodology, technology or philosophy. Culture clashes can range from denial and silo isolation to the most unfortunate “them vs us” mentalities. But not surprisingly, each faction has the same end-game: high quality services that meet business needs at the right time and the right cost. So if everyone is celebrating at the same party, why then is there still such a cultural divide?
This presentation examines each culture and discusses ways to recognize and leverage cultural similarities, embrace the differences and cultivate the best of each in order to build a positive, productive and proud culture that is uniquely your own.
While this will start as a presentation, I think it would be awesome to get some audience participation, first by polling for the types of frameworks/cultures in place in their organization and then perhaps inviting some people to the stage “party panel” to see if we can get them to share some thoughts and even acknowledge some alignment.
In the past year, Stackdriver has gone from, well, not existing to supporting over 130 beta customers. We receive thousands of measurements per second from our customers and perform time series aggregations in real time. The infrastructure for our hot data pipeline resides on AWS and is designed for scale, availability, consistency and performance using principles that originate in a book written in 1978.
Wait, what? 1978?
In this talk, we will discuss how the design choices we made for our stateless cell architecture were influenced by a book called Systemantics, first published 35 years ago. We’ll also demonstrate our cell architecture, which is a concept that you could reuse in your company.
For example, one of the quotes adorning the walls of our office is “What could possibly go wrong?” This kind of defensive thinking permeates our consideration of software architecture, system architecture and cloud architecture. Not only do we presume that things can go wrong, we usually presume that something is always going wrong, or at least partially wrong.
Our hot data pipeline is one of our key assets and was developed by pressure testing our design considerations with Systemantics quotes like:
Systems in general work poorly or not at all.
Complicated systems produce unexpected outcomes.
The real world is what is reported to the system.
A complex system that works is invariably found to have evolved from a simple system that works.
The Fail-Safe Theorem: When a Fail-Safe system fails, it fails by failing to fail safe.
More from Devopsdays 2013 - Silicon Valley
Systems theory for successful enjoyment of AWS - Philip Jacob (ignite)
Devopsdays Silicon Valley - Intro Video - Breakpoint
Devops+Agile = Business Transformation - Jesse Robins
Continuous Quality - Jeff Sussna
Leading the Horses to Drink … Support and Initiate a devops transformation - Damon Edwards
Beyond Pretty Charts … Analytics for the rest of us - Toufic Boubez
Leveling Up a New Engineer in a Devops Culture; Healthy Sustainability - Gary Foster, Mercedes Coyle
Clusters, developers, and the complexity in Infrastructure Automation - Antoni Batchelli
Analysis techniques for identifying waste in your build pipeline - Scott Turnquest
Bottleneck analysis, using the provisioning of drinks to conference attendees as an example to illustrate how to figure out what is going on when your systems don't scale.