Establishing A Crisis Engineering Center

6 astronauts & 2 flight controllers monitor the console activity in the Mission Operations Control Room (MOCR) of the Mission Control Center (MCC) during the Apollo 13 lunar mission.
14 April 1970, a group of experts, in a control center, taking actions to rescue a damaged spacecraft.

Our new book, Crisis Engineering, is about what actually happens when systems break under pressure, and how to fix them.

It comes out TODAY, April 7th, everywhere books are sold, as a paperback and audiobook (read by Cassandra Campbell).

For bulk orders below 500 copies, purchase on BulkBooks. For bulk orders over 500 copies, please contact marina@layeraleph.com.



Perhaps you need a crisis engineering center for an approaching deadline, launch, or predicted event (selling Taylor Swift tickets, broadcasting the Super Bowl, or the scheduled rebalancing of a multi-trillion-dollar index fund).

Or, let's say you've found yourself in a useful crisis, one with three or more of the indicators we've covered:

This is good news! You can turn these circumstances into rapid, directed change of any complex system with a crisis engineering effort.

The primary tool of crisis engineering is what we call a "crisis engineering center." We made up a new name because we suggest a set of practices that expand and complement what is already known as "incident command."[1] We also want to highlight a shift from mere communications & coordination to an action-driven transformation of organizational behavior by creating an environment for sensemaking[2].

You're going to need a few things, each of which is essential:

  1. Convening authority
  2. Designated venue
  3. Decision-making authority & access permissions
  4. Single, well-known means of low-latency communication
  5. Incident lead & some number of experts
  6. Single shared journal of events & decisions
  7. Broad, prominent announcement & kick-off ritual

Convening authority

A successful crisis engineering center will, on purpose, lead to redirecting staff, relocating personnel, reassigning roles, and expanding or contracting responsibilities. It will temporarily disrupt the normal working order of an organization.

A sufficiently powerful, influential, or highly placed convening authority has to declare the crisis engineering center and assign staff to the effort. This means someone with the ability to influence operations and resources across an organization. Think: a CEO, a member of the C-suite, or staff reporting directly to the C-suite. It's possible for an organization to have a sub-division that is big enough to contain its own crisis and all its indicators. In that case, the convening authority should be a division head.

All that actually matters is how the organization perceives that authority. Is it someone who can announce a new, overriding priority and make it stick? They'll of course be reluctant to use such power, but the conditions of the crisis should be strong motivation.

A good test of whether your conditions are truly a crisis: is an executive sponsor willing to temporarily sacrifice other priorities?

Designated venue

We call it a crisis engineering center because it must have a physical location. The work has to happen somewhere, and it must be concentrated. Part of what makes our approach work is amplifying the social aspect of sensemaking: drawing members of the crisis engineering team into a shared understanding of what is happening and what action we are taking is critical to changing a complex system.[3] It's impossible to do this if the team is scattered around different locations. It's equally impossible (progressing at 1/100th the speed, or worse) in a dedicated "virtual" meeting.

The room should have Wi-Fi with the password tacked to the wall. There should be places to sit, and lots of places to plug in laptops and phones to power. Videoconferencing equipment is nice but unnecessary. Anyone in the organization should be able to find and enter the venue (it cannot be behind a special-access keycard-restricted door).

We cannot overstate how important it is to get the whole team into one physical place. Relocate people to this center, via airplanes if necessary. The more unusual this is for an organization, the more benefit can be had from the signal that something new and different is happening. If your organization has no physical location, a conference center near a major airport hub can suffice.

Decision-making authority & access permissions

The speed of sensemaking, problem solving, and therefore progression through a crisis is driven by the speed at which you take action. Everything that postpones or prevents a crisis engineering center from deciding on or taking an action impedes progress.

During normal order, members of an organization are concerned with the allocation of authority, risk, and blame. Actions are postponed pending decisions or authorization from other bodies in the organization. These delays must be minimized or eliminated, and a crisis is often the only time such a thing is possible.

A great solution is to place a sufficiently powerful person into the room, who understands that their purpose is to grant permission and make decisions on the spot. It can be someone with nothing to lose, like a senior manager about to retire. It can also be the CEO, if they know failure will end the company. Who it is and how it works can vary, as long as the crisis engineering center can make any decision or get any approval within just a few minutes.

Single, well-known means of low-latency communication

Anyone in the organization must be able to easily reach the center, so that new information can always reach the crisis engineering team. The center needs one communications channel (not multiple) that is highly publicized, and can be accessed by anyone (no special tools or permissions required). Timeliness of new information is critical, so email is not the tool for this job. Group chats or whatever messaging platform is currently in vogue are also bad, unless someone's full-time job in the center is reading messages aloud as they come in.

Our experience suggests the simplest thing works best: an old-fashioned, always-on phone conference in the center of the table. A speakerphone is low-latency and commands attention, which is good. No video is also good, so that when communication becomes too complex for talking, the natural inclination will be to ask the caller to come in person.

Incident lead & some number of experts

The convening authority must appoint the incident lead first. You can call this role what you like; we've seen "pit boss," "response chair," and "incident lead." The lead doesn't need to come from management. What matters is that they are perceived as a person who cares about solving the problem more than they care about anything else. People contribute their best if they feel they are helping solve a problem. They don't if they think they are helping some loudmouth get promoted.

The lead must understand they are primarily responsible for keeping the sensemaking loop running inside the center, by making sure actions are taken and their results observed and documented. Some people know how to do this intuitively. Anyone will be better at it if they read our book. If your crisis engineering center runs around the clock, you need at least two leads to trade off in shifts.

We have lots of concrete suggestions for finding the other team members in our book. They should be people who know facts about the system, or where to find facts about the system right now. Start a list now, because you will need a source of truth about who is (and is not) officially invited sooner than you think. Keep the list someplace central and accessible, like a whiteboard or at the front of the journal we're about to describe. The list includes names, areas of expertise, and means of contact.
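
If the list lives in a shared document rather than on a whiteboard, its shape is simple enough to sketch in a few lines of Python. This is only an illustration of the three fields named above; the class name, the helper function, and the sample people and phone numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Member:
    """One officially invited person: name, areas of expertise, means of contact."""
    name: str
    expertise: Tuple[str, ...]
    contact: str


def who_knows(roster: List[Member], topic: str) -> List[Member]:
    """Return every member whose listed expertise mentions the topic."""
    topic = topic.lower()
    return [m for m in roster if any(topic in area.lower() for area in m.expertise)]


# Placeholder entries, not real people or systems.
roster = [
    Member("A. Example", ("billing database", "payment queue"), "desk phone 0100"),
    Member("B. Example", ("network", "load balancers"), "desk phone 0101"),
]

for member in who_knows(roster, "network"):
    print(member.name, member.contact)
```

The point of keeping it this plain is that anyone in the room can read it, and anyone can tell at a glance who is (and is not) on the list.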

Single shared journal of events & decisions

Sensemaking is retrospective: it happens as we look backward in time and come to an understanding of what has happened. If there is no easily accessed record of what things have happened, confusion and chaos will reign. Rather than make progress, you'll find yourself locked into a cycle of repeating circular arguments. The incident lead is responsible for making sure the center discusses what the team knows, and decides what action to take next, at least once a day.

They must also make sure that discussion and decision are written down. When a hypothesis is made or an action tried, write it down, with a time stamp, and what happened as a result. Continue this log for the duration of the crisis engineering effort. Things the team might try next or try later go somewhere else; call that a "sandbox" or "backlog."

A large whiteboard or a single shared collaborative document works for the journal. There must be only one, and it must be accessible to every person on the team at all times. If your team is emailing around CE_Center Journal_Monday_final_FINAL.docx, a blood-dimmed tide is loosed and your center will not hold[4].
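
For teams that keep the journal in a shared document or a small script, the structure described in this section might look something like the sketch below. It is a minimal illustration, not a tool from the book: the class names, the "hypothesis" / "action" / "decision" labels, and the sample entries are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class JournalEntry:
    """One timestamped record: what was supposed, tried, or decided."""
    timestamp: datetime
    kind: str                      # "hypothesis", "action", or "decision" (illustrative labels)
    text: str
    result: Optional[str] = None   # filled in once the outcome has been observed


@dataclass
class Journal:
    """The single shared record. Ideas for later go in the backlog, not the log."""
    entries: List[JournalEntry] = field(default_factory=list)
    backlog: List[str] = field(default_factory=list)

    def record(self, kind: str, text: str, when: Optional[datetime] = None) -> JournalEntry:
        """Append an entry to the log, stamped with the current UTC time by default."""
        entry = JournalEntry(when or datetime.now(timezone.utc), kind, text)
        self.entries.append(entry)
        return entry

    def defer(self, idea: str) -> None:
        """Park something the team might try next or later."""
        self.backlog.append(idea)

    def open_items(self) -> List[JournalEntry]:
        """Hypotheses and actions whose results have not been written down yet."""
        return [e for e in self.entries if e.kind != "decision" and e.result is None]


# Sample entries are invented for illustration.
journal = Journal()
tried = journal.record("action", "Restarted the queue workers")
journal.record("decision", "Hold the release until the queue drains")
journal.defer("Try doubling the worker pool")
tried.result = "Queue kept growing; the restart did not help"
```

A helper like `open_items()` gives the incident lead a quick prompt for the daily discussion: anything tried but not yet resolved still needs its result observed and recorded.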

Broad, prominent announcement & kick-off ritual

The announcement

The announcement of the crisis engineering center is an important piece of organizational magic. It must come from a sufficiently powerful or highly placed leader as discussed above.

The announcement must do two things. First, it must declare that there is a new thing, a crisis engineering effort, in a particular place. (You can call it whatever you want, as long as it's recognizable.) Second, it must convene staff into that new thing, by mandate: if the team needs you, you drop everything else and come. The crisis engineering effort being declared as the temporary new top priority signals to everyone that something new and different is happening.

The announcement can take whatever format is appropriate to the organization, as long as it accomplishes those things. Maybe there's already a formal process for this sort of declaration; use that. A company-wide email from the right authority usually works, too. The announcement should contain all of the following:

  1. Declaration that there is a crisis engineering effort that temporarily overrides all other priorities
  2. The name of the center/effort
  3. The location of the crisis engineering center
  4. The primary problem(s) the effort seeks to solve
  5. The name of the incident lead
  6. Any authorities delegated to, or adjudicating decisions for, the effort
  7. Specific contact details for reaching the crisis engineering center
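
The seven items above work as a checklist, and a draft announcement can be checked against it before it goes out. The sketch below is one hedged way to do that in Python; the field names are hypothetical stand-ins for the seven items, and the sample values are placeholders.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Announcement:
    """Hypothetical field names, one per item in the list above, in the same order."""
    overrides_other_priorities: bool = False
    effort_name: str = ""
    location: str = ""
    primary_problems: str = ""
    incident_lead: str = ""
    delegated_authorities: str = ""
    contact_details: str = ""


def missing_items(draft: Announcement) -> List[str]:
    """Return the names of any required items that are still empty or false."""
    return [f.name for f in fields(draft) if not getattr(draft, f.name)]


# A partially written draft with placeholder values.
draft = Announcement(
    overrides_other_priorities=True,
    effort_name="Example Response Center",
    location="Example HQ, 3rd floor conference room",
    incident_lead="A. Example",
)

print(missing_items(draft))
# prints ['primary_problems', 'delegated_authorities', 'contact_details']
```

An empty list from `missing_items` only means every item is present; whether the announcement actually lands depends on who sends it.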

The kick-off ritual

The announcement should be accompanied, if at all possible, by a kick-off ritual. Starting the center is part of how you start framing, or bracketing, the story of the crisis. Our book has an entire chapter on this storytelling strategy. The crisis affects everyone in the organization and destabilizes people's sense of identity; this is what makes it possible to make or break the habits that make up your organization's normal order of business. Nearly everyone's natural instinct is to get involved, contribute in some way, and reestablish their sense of how they relate to what is happening. If you provide a plausible story for what caused the crisis and what is likely to fix it, you can get the best out of people for a time.

The ritual must include a call to action that includes expectations for everyone in the organization. Many managers fail to do this, and everyone rolls their eyes as they shuffle back to their desks with a sense of "yet another meaningless announcement." You need to impress upon as many people as possible that:

  • There is a new threat; and
  • The organization is doing something new to address it; and
  • In time, everyone will be part of that “something new.”

The call to action doesn't have to be specific yet; communicating uncertainty is good[5].

With a center in place, an announcement made, and the crisis engineering effort kicked off by ritual, you're on your way to steering the inevitable change a crisis will cause to the organization in the direction you want to head. Read our book to understand how.


  1. Best practices in incident command can be helpful if you are in the position of planning a crisis center for your organization. We suggest U.S. FEMA's Incident Command System (ICS), and its forebear, California FIRESCOPE's ICS.

  2. Sensemaking is a continuous, social, retrospective process of building a plausible story to understand what is happening and our relationship to it. In our book, we have a lot to say about this process. It has been studied directly by Karl Weick, Nancy Pennington, and others. Daniel Kahneman calls it "System 1" thinking in his book Thinking, Fast and Slow.

  3. A term of art, with a specific meaning, drawn from the cyberneticists: a system with both human and machine parts.

  4. William Butler Yeats, The Second Coming; https://www.poetryfoundation.org/poems/43290/the-second-coming

  5. The inestimable Peter Sandman has plenty to say about this. Start here: https://www.psandman.com/col/uncertin.htm