My experience this week at Heathrow is a small but telling example of the global risk many businesses face: multiple systems failing simultaneously. Solving discrete problems is what we do every day, at home and at work. Resolving the failure of a system is much more challenging. But the failure or near failure of interwoven and codependent systems can lead to catastrophic gridlock.
Heathrow reminded me of airports I used regularly running my business in the Soviet Union – minus the stray dogs and birds inside the terminals. In both settings people were working mightily to succeed despite multiple systems failures.
A common feature of systems failure is a disconnect between people with and people needing information. Communications collapses often go unseen until they spill over either into another business function or into the public domain. The most disturbing and compelling explanation of this may be Edward Tufte’s iconic analysis of how the danger posed by foam debris striking the wing was buried in a jumbled PowerPoint slide before the Columbia shuttle burned on re-entry. (PowerPoint Does Rocket Science--and Better Techniques for Technical Reports)
Although inconsequential by comparison, massive crowding, low ceilings creating a cloud of noise, and awful or completely missing signage at Heathrow meant that hundreds of people in line for dozens of flights could not hear agents shouting. The agents were trying, in vain, to pull people who were at risk of missing flights from lines that started well outside the terminal building. While waiting, we asked ourselves numerous times: “What did she say, was that about Dulles or Dallas?” It should have been obvious to on-site managers that shouting indecipherable instructions in an overcrowded international airport was going to fail, especially while competing with booming public address system broadcasts of clear but useless instructions about unattended bags.
We received repeated false assurances not to worry because there was a back-up plan -- to supplement the verbal communications with written signage. “We will come around with boards.” But just like the inaudible audio, when agents did finally mingle with the crowd, they were hugging the signs rather than raising them overhead. For different reasons, in both instances, people with important information could not get it to people who needed it. The information was known, the users were known; they never connected.
Working around these communications failures, based on experience (always the genesis of workarounds), we managed to get into a short line to a check-in. But when it was our turn, the agent announced that she was closed. I flashed to a surreal scene decades ago in the Soviet visa office in Washington, DC. I would talk to a consular officer behind a glass partition, sliding papers and passports through a little metal tray. If a discussion was not to the consular officer’s liking, s/he cut it off. Abruptly, by pulling a curtain down over the window. Conversation done. In this instance at Heathrow, again, information blackout.
Despite these obstacles, we finally did make communications work. We told an agent, she found a supervisor, the supervisor phoned the gate, reached someone, received the correct information; they had just closed the gate. By definition, workarounds always come back at some point to rejoin the system that they were avoiding. They are detours, not replacements.
Next stop, customer service for rebooking and a place to stay overnight. Another system, equally brittle, equally on the edge of collapse, and also intertwined and codependent. As required by policy (a system of words), we were told that rebooking is done through a consortium of hotels that service the airlines. Booking a place myself was not an option – well, not an authorized option. She logged into the booking system. She emailed the system operators. No result from either. She emailed again.
While doing all this, she was also regularly reaching behind her monitor into a tangle of cables. Since we had more than enough time together (way more!), I learned that this was to switch screens on her single monitor. Again, memories of what in Russia, in the 1990s, we called ‘sneakernet.’ Before networking software enabled multiple computers to share printers, we put together a manual mechanism. If you needed to print, you walked over to the printer and manually flipped a switch that enabled the printer to receive signals from a designated computer. In Russia. In the 1990s. And at Heathrow in 2022.
Finally, she called the consortium operator. After a brief conversation, the agent at the airline consortium for hotel booking hung up on the airline customer service agent. A digital version of the shade being pulled down in the visa office. Sensing that I was inside a system on the edge of collapse (as I was in the USSR when it did officially collapse), I mapped a workaround. Calling various airport hotels, I found one with an opening and booked a room. While the consortium booking system was still failing to generate the needed output (a hotel reservation), I let the agent know I had a room. Long story short, we then bartered my reservation for her authorization to rebook – which she shared with me by – wait for it – using my phone camera to take a picture of her computer screen.
In a sense, who cares about my experience at Heathrow? We all know airports are a mess. But the mess is much more important and bigger than my experience. Our systems today are so complex that human/machine interactions are sticky rather than smooth, and workarounds are fast becoming the norm. Fragile systems on the edge of collapse are incredibly risky. This is unsustainable for business because, especially in tight-margin sectors, you can’t keep sending flights out with seats empty because passengers could not get through the airport, then give those passengers a second seat on an additional flight, pay for overnight accommodations, and absorb the snowballing transaction costs of all these adjustments – in addition to reputational costs.
Transforming systems that are both fragile and essential, and that are regularly right on the edge of gridlock, is one of the most critical and daunting challenges of the Post-Pandemic Era (PPE, again). Complex systems fail in complex ways. We cannot fine-tune or tweak our way out of this challenge. We must act.
First, systems thinking cannot remain the relatively obscure discipline that it is today. We need to pull systems training out of engineering schools and build it into our primary, secondary, and executive education institutions.
Second, companies need to come clean about the various systems used to shift costs to customers. We need to understand externalities. By way of a tiny example, the time has come to pull back the curtain on the façade of “your call is very important to us.” Companies have cut staff, reduced training, and invested in technology rather than people to reduce costs – maximizing benefits to owners and imposing costs on customers. As consumers, we have all become unwitting accomplices in shifting the cost of selling ourselves the services being offered. It’s a neat trick. But especially for companies with ethical pledges, commitments to socially responsible behavior, and a dedication to mission beyond profit, the externalities charade needs to end.
Third, we need to study joints. We need to recognize that connection points between interlocking systems are extremely fragile. And when they fail, risks multiply. And fail they will. It is these in-between spaces, the joints that hold systems together, where failure is most likely. That is as true for O-rings on the Space Shuttle as it is for tiles in your bathroom, or the white space on organizational charts.
The handwriting is scrawled across the wall tiles. We have been warned.