How to Handle an IT Shitstorm

As in life, it is not possible to anticipate or prevent every problem that may come up in IT. Nor is it necessarily desirable to do so. ((Though many a systems engineer will die trying.)) Business is about accepting certain risks, making trade-offs, and allocating varied but never limitless resources. Tough decisions must be made every day in business. Not all decisions that end up leading to undesirable results were flawed decisions. ((And not all decisions that lead to attractive results were unflawed decisions; business – like life – can be confusing.)) And there’s always insurance for some things.

A few years ago I sat in with a team that was dealing with what can best be described as a shitstorm of epic proportions. They were experiencing a denial-of-service attack against their Internet-facing server farm infrastructure. For a global SaaS business, this was devastating. ((Any resemblance to any particular organization is merely a coincidence; this is an amalgam of several situations, though all pertinent details, including the whiteboard quote, are from real situations I’ve observed or been a party to firsthand.))

This team had set up a “war room” with the various involved parties all represented. When I visited, this included eight individuals, not including myself. The team’s leader had directed most of the immediate efforts towards the active attack. This would surprise no outsider observing the situation. The attack had been partially mitigated at one point already – only to pop up in another spot. This, too, was not surprising. ((It’s par for the course when it comes to DDoS attacks.))

As the team leader looked across the room at each of the individuals assembled, however, what weighed most heavily on his mind was the team’s next step after getting through this immediate fire. He realized that even though getting through the attack was the immediate problem, it was really just a symptom of the real underlying problem.

There were additional things they could do – and, he knew, arguably should have done already – to prevent, mitigate the impact of, and be better prepared to respond to these types of malicious attacks. The deeper insight, though, was this thought: “Chances are, if there is room for improvement in this area, there probably is in other areas as well.”

Most of the guys on this team had slept very little, if at all. They remained focused, inquisitive, forward-thinking, and full of humor (sometimes the only thing you can do to stay sane is laugh, after all). They were also tired, pissed off, and hungry.

During my visit with the team in their war room, the following reminder was written on the whiteboard, with a nod to Dr. Strangelove:

“No fighting in the war room.”

Pointing fingers doesn’t move you forward. Nor does hiding the dirty laundry from your team, management, and any stakeholders.

I’ve been inside IT departments that are afraid to discuss flaws outside the department. Management is uninformed. Everyone has blinders on. The same mistakes are made again and again. No improvement occurs. ((Typically the culture can best be described as a cover-your-ass mentality combined with rampant cynicism. Not good for anybody.))

IT departments that stay together during a crisis succeed. Together, though, does not mean keeping the dirty, ugly truth within the department.

Whenever a major incident occurs, a post-mortem is the most valuable tool in the IT arsenal because anything can be improved and every situation can be learned from.

Organizations that reflect constructively and adjust accordingly, even after the immediate crisis has passed, achieve growth. And organizations that manage to establish this mindset beyond the IT department achieve excellence.

But this is only true if transparency and continuous improvement are embraced from the top down and finger-pointing is ruthlessly tempered. This doesn’t mean that there is no accountability. Good management can tell the difference between a reasonable mistake or oversight and gross incompetence. And either way, there’s an opportunity to learn and to improve.
