In the prior post, "What is a Success Disaster," I spoke about this being a pivotal moment for a company (and the people who make it all work). A success disaster can be defined as "Massive problems created when a person or company is unable to handle an overwhelming success." To start, let's be clear: a success disaster shares a lot of characteristics with any technology disaster. These include high pressure, high expectations, usually a healthy dose of FUD (fear, uncertainty and doubt) and a whole range of emotions based on who is actually involved. I think the big difference is in the mindset and business environment:
If we can figure this out we'll get to the next level of growth / success in our business.
So, how do we handle our initial response to this "opportunity?"
Step 1 - Take a moment to breathe, and assess the situation
One of the most important things to do is to gain a level of situational awareness. There will be demands to "do something!" But I'd suggest step one is to get an understanding of what the actual situation is. This is harder to do than it sounds. Whether you think about it in terms of the OODA loop (observe, orient, decide, act) or some other methodology, it doesn't make sense to randomly start doing things, as you might make the situation worse.
Also Step 1 - Communicate!
You'll create more space for yourself and your team to assess the situation if you tell your constituents what you are doing (and set expectations). A simple message is enough: "We are assessing the situation and we don't have a plan or estimate yet. This is a top priority, and we will update you with more information and how you can help at <specific time>." Early in the process, you need to stay out of two traps: (1) going silent and (2) agreeing to a plan that you don't understand.
Going silent may seem like it's part of "breathe" or taking time to assess, but it's not. You need to tell people what you are doing! You also need to express empathy for how this is impacting others. I've seen teams think that since they don't know the answer they should wait for more info and not communicate until then. There is also the natural tendency for diagnostics to take longer than you think. While you are making progress, the rest of the organization (and your customers) will be getting very restless if you don't talk to them. Remember, in the absence of information, people will manufacture their own - and potentially start taking action on their own.
Agreeing to a plan you don't understand is also something to be careful of. Often in a success disaster there will be a key "player" who has identified the problem early and has an opinion on what the "get well plan" needs to be. Part of gaining your own situational awareness does include understanding this plan and other potential options - but you need to make sure the path you take (the plan) is likely to have the outcomes you want. You also need to understand if there are any unintended consequences.
Communication can be hard, and it's a lot harder if you start in the middle of a disaster.
While this is stating the obvious, it is often the case that even the mechanics of communication are being worked out in the middle of an emergency. Some things to think about ahead of time include:
Who are the discrete groups, and what do they need? Let's remember that we are talking about people - whether they are in other departments, on your team, or outside your company. Communication should start with thinking about what they need, and trying to get it to them.
For these internal groups, how do you reach them? This may be as simple as knowing what email groups or Slack channels to use. If these are not known or currently in use, this is something that is a lot easier to figure out when NOT in the middle of a disaster.
For external constituents, how do you reach them, and what will you share? This is an area where a success disaster may have different approaches depending on what the actual disaster is. Imagine you have a major problem impacting a single client (your largest client). You probably don't want to put that on your status page. But if it impacts all of your users, a status page and social media engagement are often a good way to handle this. What you share depends on your company's approach. My suggestion is always to be transparent, and use this as an opportunity to build trust - not to hide things. Keep in mind that honesty with your clients and teammates doesn't mean throwing gasoline on the fire. I've had situations where the current status was "we have no idea." That's not what I communicated, as it's NOT the whole story. Instead, I'd suggest something like "we are still diagnosing the situation. Our team is actively engaged in the troubleshooting process, and we'll update you as soon as we have more information to share."
You don't need to do it all yourself. Whether you are in a large or small organization, there are likely people who can help with this. Do you have a marketing person, or a customer care person, or a PR person (or team)? If this is a company priority, you should be able to get help with communication. Again, this is something to talk through before the disaster strikes. If you haven't - it isn't too late. Likely one (or all) of these folks will be hunting you down looking for status. Ask them for help AND their advice on some of the prior bullets if you didn't work them out in advance. When a teammate lets you know that they need more on the communication front, "what would you suggest?", followed by "can you help make that happen?" is a great way to make it to the other side of this particular success disaster as a team.
Think about your team and the incident responders
Let's not sugarcoat it: responding to a success disaster requires immediate priority shifts, changes in work, and, more often than not, extra effort. This creates pressure on the team, and each team member may handle it differently. You must think about this.
I'll talk about the dangers of a hero culture in a follow-up post (here's a talk titled "From Hero to Zero" from DevOps Days Boston 2014 that you might like in the meantime), but early on in an incident you need to think about the team. Some questions to ask:
How long is this likely to last? (it's OK to guess)
Am I thinking about shift coverage for an extended event? Does that coverage include taking care of myself too?
Have I created an environment where teammates look out for each other?
Am I approachable if someone has a personal issue - even if the timing doesn't seem great from a business standpoint? (people don't get to schedule personal issues - they happen)
Does the team doing the work understand why it matters?
When was the last "big push" for the team - did they just come off another emergency, or off-hours work, etc.? This is important context.
How are other teams impacted? Maybe you understand what is going on with the technical team - but what about Customer Care, or Sales?
This isn't a comprehensive list, but it's a good place to start. Some of the answers to these questions might be troubling. Does your answer go something like "Our tech lead is the only one who can solve this, and she just worked all weekend on the last emergency"? Unfortunately, many people have had that answer. But that also means that this is a path others have navigated, and you can too!
Final Thoughts
Here are a few other things to keep in mind as you work through the situation:
Urgency: If you are someone (like me) who appears calm during emergencies, you may need to let people know you are acting with urgency.
Empathy: Always try to understand how this event impacts your teams, customers, and the business - it will help in ways that may surprise you.
Capability: Know your team and what you are capable of. While your team may not have all the answers, they are the right people to navigate through this situation (and will learn from it).
Honesty: It's important to be a truth teller - wishful thinking doesn't solve problems. "I don't know / we don't know that yet" may be an uncomfortable response to a question, but is certainly better than a made up answer.
Perspective: The middle of any incident (including a success disaster) is not the right time to do a post mortem or make mandates like "this can never happen again". Sure, "never again" is a reasonable feeling (remember EMPATHY?) but the common ground is probably something like "I agree, what we are all going through right now is awful and none of us thinks this is sustainable."
Enjoy the ride: Part of maintaining your perspective is to remember that this is brought on by success. It may be hard, and there will be a lot to learn - but if you like growth, buckle up and enjoy the ride...