The recent system failures at Southwest Airlines and the Federal Aviation Administration caused major disruptions for travelers, pilots, and cabin crew across the country. They also underscored the importance of prioritizing technology modernization initiatives, data integration, and the management of technical debt as the aviation industry races to make updates that many consider long overdue. This article gives a brief overview of both incidents and shares lessons technology leaders can take to their own organizations.

Southwest Airlines

In late December, a winter storm and frigid temperatures impacted airlines across the country. While many airlines bounced back relatively quickly, Southwest did not. Cancellations mounted and the company was unable to address them in a timely, automated way. 

Southwest’s flight and crew scheduling is managed by mainframe-based software that was built decades ago and is nearing the end of its life, according to the airline. When the system is overwhelmed, employees have to resort to manual processes. As the backlogs grew in December, “there just was not enough time in the day to work through the manual solutions,” Southwest COO Andrew Watterson said. By December 25, the Southwest team decided “the only way to pull the airline’s operations back from the brink would be to cancel even more flights: around two-thirds of its schedule for several days.” Nearly 17,000 flights were canceled, disrupting the lives of about two million customers.

The company is working with GE Digital to add new functions to its mainframe-based software, Crew Optimization (formerly known as SkySolver), to improve the flight and crew scheduling process. GE Digital owns the Crew Optimization technology. Bob Jordan, Southwest Airlines’ CEO, said the technology and processes worked as designed but “they just were all hit by overwhelming volume.” A GE spokesperson told the Wall Street Journal that its software isn’t an end-to-end solution, but rather a backend algorithm that airlines can supplement with other software to manage disruptions. Southwest and GE Digital are developing a new release of the software that is intended to address past problems and reduce the need for manual workarounds.

Unions at Southwest have been urging the company to modernize the antiquated scheduling technology. “We’ve been harping on them since 2015-ish every year,” Southwest pilots union vice president Mike Santoro told CNN. In 2022, the Southwest flight attendants union wrote a letter to management prioritizing “modernization of the antiquated reserve system” and “improved communication tools to alleviate long scheduling hold times” over pay increases. Watterson said that Southwest was working through multi-year system upgrades, and had focused on maintenance and ground operations ahead of crew-scheduling updates.

As a result of the disruptions, regulators and lawmakers have called for investigations and penalties against the airline. Additionally, the company’s board has created an operations review committee and the company has committed more than $1 billion of its annual operating budget to maintaining and upgrading IT systems as part of a five-year strategic plan. The events have cost Southwest Airlines an estimated $725 million to $825 million, and the ripple effects continue to be felt.

Federal Aviation Administration

Just weeks after the Southwest meltdown, an FAA system outage caused flight delays and cancellations for thousands more travelers. As at Southwest, the outage originated in systems scheduled for upgrades. The affected system, Notice to Air Missions (NOTAM), is a critical tool for alerting pilots to conditions that could affect flight safety and for providing real-time information on flight hazards and restrictions. Pilots are required to consult NOTAMs before every flight.

Due to safety concerns and to address the outage, the FAA grounded departures nationwide for the first time since 9/11. “Today’s FAA catastrophic system failure is a clear sign that America’s transportation network desperately needs significant upgrades,” said Geoff Freeman, president and CEO of the U.S. Travel Association. “Americans deserve an end-to-end travel experience that is seamless and secure. And our nation’s economy depends on a best-in-class air travel system.”

The FAA has identified a damaged database file on systems scheduled for upgrades as the cause behind its system outage, and found no evidence of a cyberattack. Investigations are ongoing to prevent any similar disruptions to travelers in the future. 

New tools are being developed, some originating from startups, to modernize and automate processes and systems in the airline industry that are manual, siloed, and outdated. Executives at a number of major airlines have reaffirmed their commitments to investing in technology modernization and operational infrastructure during their January quarterly earnings calls.

Lessons for technology leaders

The examples above serve as cautionary tales about the dangers of delaying needed system upgrades. Technology leaders can keep the following points in mind as they build organizational resilience amid a fast-changing technology landscape:

Don’t put modernization efforts on the back burner. CIOs and their organizations are constantly balancing a shifting portfolio of initiatives. Challenges at Southwest and the FAA illustrate the heightened risk and serious consequences of waiting to make critical upgrades or letting technical debt pile up. If your team doesn’t have a strategy for chipping away at that technical debt, it’s time to address it.  

Connect the dots between internal systems, employees, and customers. Long-reliable legacy systems may be an afterthought for many organizations (until they stop working, that is). Incorporate maintenance and upgrades into short-term and long-term business and technology strategies with an emphasis on how these systems ultimately affect employee and customer outcomes.  

Have a backup plan and prepare for the worst. Develop and regularly test response plans with teams to reduce risk and ensure the organization is prepared to navigate potential mishaps.  

Leverage cloud-based systems where appropriate. Partnerships with cloud providers are expected to help airlines improve their technologies. A shift to cloud solutions is no simple or risk-free task, and successful implementations go well beyond simply installing the technology, but strategically scaling these technologies can create greater operational agility, help automate processes, and make data integration more seamless and secure. 

Continue to enable real-time data sharing and communication across teams and organizations. Where possible, eliminate silos that slow data sharing, reduce its quality, and, at worst, bring operations to a halt. Ensuring accurate and accessible data enterprise-wide is not a small task and often requires a robust data strategy to execute effectively, but its benefits can extend well beyond helping with crisis response.

Continue to gather and listen to feedback. For executives especially, careful listening and communication are key to empowering teams, creating an effective digital experience, and ensuring they have the tools needed to do their jobs.