For some, the term "success disaster" immediately resonates. If you have lived through one (or are in the middle of one right now!), that is probably the case. A quick web search turns up over 7,000 references to the term, including the definition on wordspy.com: "Massive problems created when a person or company is unable to handle an overwhelming success". I'm uncertain of the term's origin, but I've personally been using it since around 2002.
This can be a pivotal moment for a company. Your first thought is usually something like
"What we wanted is actually happening!"
Soon after you may start to have thoughts like
"Oh no, it's actually happening - can we really handle this?"
And sometimes that is followed by
"Everything was working fine before all these users showed up all of a sudden..."
I've been lucky enough to be part of several success disasters, and I mean it when I say lucky. The alternative is to NOT be part of something where the success is so massive that it outpaces what you think you are ready for. A success disaster requires you to take action in much the same way as you would in a disaster scenario - the difference is that this is all about upside and growth.
Do any of these success disasters sound familiar?
1. Our site "crashed" on Cyber Monday - we had more <orders/traffic/pageviews> than we ever imagined before it tipped over, but we couldn't keep up and just got overwhelmed.
2. We just signed a single client that by itself is orders of magnitude larger than any other client - at first we were a little nervous about getting them live and happy enough to be a reference for other clients, but now they are threatening to cancel!
3. We just increased our customer base by orders of magnitude - all in our target market and mostly with the same needs - but we just can't keep up. We have so many workarounds and manual processes that seemed fine before, but now customer satisfaction is down, burnout risk on the team is up, and we are running out of rabbits to pull out of the hat.
For #1, "crashed" may be a bit more nuanced. It could be that page load time got so high that your abandon rate went through the roof, or perhaps it was a "brownout" where only some servers were unresponsive. Maybe it wasn't Cyber Monday, but some other "spike" event. Maybe it wasn't your site that crashed, but a data issue that you didn't even identify at the time. But it's still the same theme: super high load that drives a failure condition. The good news is that growth is the reason for the problem. No one would say "if it weren't for all these customers everything would be working great." OK, I might have said this once or twice - but if you knew me you'd expect some kind of sarcastic humor when things aren't going well (to help lighten the mood).
For #2, this can take many forms. I recall an incident years ago where our data onboarding for a "whale" customer was estimated to take 2 years to complete. That led to a heated discussion with the account team, in which we let them know that yes, the estimate was 2 years; no, we currently had no better plan; but that we would solve it. We did, and we ended up with a capability to handle these types of customers in a way that differentiated us from our competitors.
For #3 - this can be a lot of things, and often falls into the typical "people, process, technology" range of options. This may be something that sneaks up on you. 20/20 hindsight and confirmation bias may make the problems appear obvious once recognized, but that's usually not helpful for fixing things. This is also not about "ordinary" improvement related to people, process, and technology - what is different is that things have gotten to the breaking point. Maybe you can make it one more day, but for you and your customers things just can't continue as is, and minor tweaks aren't going to cut it.
All of these situations are often a "target rich" environment when you think about improvements that you could make. This is when choosing what to focus on is incredibly important (more on that later); it will often be a long journey, and the first few steps matter a great deal. When you are in the middle of a success disaster, you need to think about it just like incident response: what are the things that can help quickly, so we can "stabilize the patient" right away (and create some time and space for longer-term fixes)?
A success disaster should become a wake-up call to think about how you work in an entirely new way: where resilient teams and infrastructure, continual improvement, feedback loops, and learning are part of daily work. This is similar to John Allspaw's perspective that "Incidents are unplanned investments, and they are also opportunities. Your challenge is to maximize the ROI on the sunk cost..." I mention this here because you may get some surprising responses to this thinking - especially when everyone is emotionally involved in the current disaster. I'll write more on this in a separate post, but I wanted to introduce it early because part of what defines a success disaster is that it is a key change moment for an organization.
In follow-on posts I'll talk about how to handle these key moments in a company's history, and how they can drive resilience in the organization. I'll also identify some anti-patterns to long-term success to watch out for (things like burnout, hero worship, and "human blaming" for failures along the way).