Four vendors. One outage. Nobody responsible. A detailed account of how fragmented carrier accountability creates real operational risk, and what governance looks like instead.
The setup
A multi-site regional healthcare organization. 80 locations across four states. Connectivity managed through a combination of a primary MPLS provider, a secondary broadband provider for failover, a wireless backup solution, and a separate provider handling voice services. Each relationship managed independently. Each with its own contract, its own support contacts, its own escalation process.
On a Tuesday morning, 23 locations lost connectivity. The primary MPLS circuits went down across a regional segment. Failover didn't activate at most sites. Clinicians couldn't access EHR systems. Scheduling went down. Two sites rerouted to paper processes. The outage lasted six hours.
The question that consumed the next two weeks wasn't "how do we fix this." It was "whose fault is this."
What fragmented accountability looks like in a crisis
The MPLS provider acknowledged the circuit outage and began restoration. Their SLA had been breached. A credit was coming.
The broadband failover provider reported that their circuits had been functioning normally throughout the event. They were correct. Their circuits were up.
The reason failover hadn't activated was a configuration issue in the routing policy, implemented during a network change three months earlier. That change had been managed by the organization's internal IT team, in coordination with a network consulting firm they'd engaged separately for the project.
The wireless backup solution had activated at some sites but not others. The provider cited equipment configuration at the site level. The equipment had been installed by a third-party integrator. The integrator's contract had ended six months prior.
Four vendors. A consulting firm. An integrator whose contract had expired. An internal IT team. All with partial ownership of the system that failed to protect 23 locations from a six-hour outage.
Nobody was responsible for how it all worked together.
The real cost wasn't the outage
The SLA credit from the primary provider covered a month of circuit costs. That was the easy part. The harder costs were the operational disruption, the clinical workflow impact, and the post-incident work, which took eight weeks of IT time to fully diagnose, remediate, and document.
More importantly, the organization emerged from the incident with a clearer picture of a problem that had existed for years: they had built a complex, multi-vendor connectivity environment without building the governance structure to manage it as a system. Each piece had an owner. The system had none.
What governance over carrier environments actually requires
Managing carrier relationships is not the same as managing a carrier environment. Relationship management is transactional: contracts, invoices, support tickets, renewal negotiations. Governing the environment is systemic: how do all of the components work together, who is responsible for the system performing as designed, and what is the process for ensuring that changes to any component don't create unintended failures elsewhere?
The distinction has several practical implications:
Change management must be cross-vendor. When a routing policy changes, someone needs to assess whether that change affects how failover, redundancy, and backup systems behave, across all providers, not just the one whose equipment is being touched. In the case study above, no one played that role.
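One way to operationalize that role is a maintained cross-vendor dependency map that every change request is checked against. The sketch below is hypothetical; the component and system names are illustrative placeholders, not drawn from the case study's actual environment.

```python
# Hypothetical cross-vendor dependency map: for each component that can be
# changed, list the resilience mechanisms that change could silently affect.
DEPENDENCIES = {
    "mpls_routing_policy": ["broadband_failover", "wireless_backup"],
    "broadband_circuit": ["broadband_failover"],
    "site_router_firmware": ["wireless_backup", "broadband_failover"],
}

def impacted_systems(changed_component: str) -> list[str]:
    """Return every resilience mechanism a proposed change could affect."""
    return DEPENDENCIES.get(changed_component, [])

# A change to the MPLS routing policy should trigger review of both backup
# paths -- the review that never happened in the case study.
print(impacted_systems("mpls_routing_policy"))
```

The value isn't the code; it's that the map exists, is kept current, and is consulted before any vendor's change window, not after.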
Failover testing must be regular and documented. Failover configurations degrade silently. A configuration that worked at implementation may not work six months later, after a network change or a firmware update. Regular testing, with documentation of what was tested, what worked, and what was remediated, is the only way to know your resilience is real.
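A documented test program can be as simple as a structured record per site per test cycle. This is a minimal sketch under assumed field names (site identifiers and path names are invented for illustration):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class FailoverTest:
    """One documented failover test: what was tested, what happened, what was fixed."""
    site: str
    path_tested: str        # e.g. "broadband_failover" or "wireless_backup"
    passed: bool
    remediation: str = ""   # filled in when a test fails
    tested_on: date = field(default_factory=date.today)

def sites_needing_remediation(results: list[FailoverTest]) -> list[str]:
    """Sites whose failover did not engage and still need follow-up."""
    return [r.site for r in results if not r.passed]

results = [
    FailoverTest("clinic-01", "broadband_failover", passed=True),
    FailoverTest("clinic-02", "broadband_failover", passed=False,
                 remediation="routing policy missing failover route"),
]
print(sites_needing_remediation(results))
```

Run on a regular cadence, a record like this would have surfaced the three-month-old routing misconfiguration long before a regional circuit outage did.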
Escalation paths must be pre-defined and practiced. In a multi-vendor outage, who calls whom, in what order, with what authority to make decisions? If this is figured out during the incident, you're already behind. The organizations that recover fastest are the ones that have done this work in advance.
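Pre-defining that path can be as lightweight as an ordered runbook that names each role and the decisions it is authorized to make. The roles below are hypothetical examples, not the case study organization's actual structure:

```python
# Hypothetical escalation runbook: who is called, in what order, with what
# decision authority. All role names are illustrative placeholders.
ESCALATION_PATH = [
    {"order": 1, "role": "network_ops_lead",
     "authority": "declare the incident, open all vendor bridges"},
    {"order": 2, "role": "mpls_provider_noc",
     "authority": "circuit restoration status and ETA"},
    {"order": 3, "role": "failover_owner",
     "authority": "force manual failover at affected sites"},
    {"order": 4, "role": "clinical_ops_director",
     "authority": "authorize paper-process fallback"},
]

def call_order() -> list[str]:
    """Return the roles in the order they are contacted."""
    return [step["role"] for step in
            sorted(ESCALATION_PATH, key=lambda s: s["order"])]

print(call_order())
```

The point of writing it down is less the artifact than the exercise: deciding in advance who has authority to force failover or declare a site on paper is exactly the work that can't be done mid-incident.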
What accountability without a single owner creates
The healthcare organization in this case study had good vendors. Their MPLS provider was responsive. Their failover connectivity was technically sound. Their wireless backup solution was capable. None of that mattered when the system failed, because no one was accountable for how the system performed as a whole.
Single-owner accountability doesn't mean replacing vendors. It means having one party responsible for ensuring that all the components work together, that changes are coordinated, that resilience is tested, that when something goes wrong, there is one place to call that has the authority and the knowledge to own the resolution, regardless of where the failure originated.
That's the difference between vendor management and system governance. And it's the difference between a six-hour outage and a thirty-minute one.