Network, Heal Thyself: On Self-Healing Networks

How automation, AI and ML are delivering more intelligent networks that can predict, repair and maintain themselves

Marco Zacchello

What if networks could heal themselves? Network outages and quality of service issues can have substantial costs and negative effects on a business: lost revenue, lost productivity, poor customer and employee experiences and more. As the tech industry evolves toward more intelligent infrastructure capabilities, we’re seeing the integration of AI, machine learning (ML), data analytics and automation into many areas of the technology stack—and networks are no exception.

The promise of self-healing networks is beginning to come to fruition, as more advanced self-healing properties make it possible for networks to predict, identify and remediate problems without human intervention. The goal of self-healing networks, as with all intelligent infrastructure solutions, is to simplify the operation of the network, both for those who deploy it and for end users and customers. This, in turn, frees employees to focus on higher-value tasks.

In this blog post, we’ll define self-healing networks, explore their advantages and consider some examples of where self-healing networks show the most promise.

What is a self-healing network? Is it something new?

Self-healing networks are networks with built-in properties to protect against failures by predicting problems, providing remediation or workarounds, supporting recovery and preventing future incidents. Network failures can range from complete loss of connectivity to smaller quality issues that can nonetheless have a big impact on business applications and outcomes.

Self-healing networks require visibility into what’s going on across the network—including connectivity, quality of service, application performance and even business performance. And the self-healing properties apply throughout the network lifecycle, from deployment to maintaining the network, and even to decommissioning it.

The idea of self-healing networks isn’t entirely new, but the paradigm shift to software-defined infrastructure has transformed what’s possible with network automation. In the past, we mostly had to identify and address network problems manually. Basic redundancy was the primary backup approach in the event of network downtime.

A few decades ago, there were some early attempts to automate networks; however, ML and automation technologies weren’t yet advanced enough. As cloud providers introduced more as-a-service offerings, the whole industry shifted toward software-defined infrastructure and Infrastructure as Code (IaC) models. We began to see more integration of automation, AI, ML and software-defined everything to provide predictive maintenance and test network changes in advance. Today, we have the software and capacity to deliver on the promise of self-healing networks—and more and more network services vendors are offering these capabilities on their gear.

How do self-healing networks work?

To put it simply, a self-healing network collects real-time and historical data on the health and performance of the network, then uses that data to gain visibility into the network’s operations and automate the remediation of problems. Once data is collected, AI and ML techniques are used to analyze it. Once there’s a baseline for normal network traffic, we can set performance thresholds for continuous network monitoring. We can also create a model of the network, known as a digital twin, where we can test changes before applying them to the actual physical network. Finally, self-healing networks can automatically remediate issues by rerouting traffic, changing configurations or applying other forms of traffic engineering.
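The baseline, threshold and rerouting steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular vendor or platform implements self-healing: the path names, the latency samples, the three-standard-deviation threshold and the helper functions are all assumptions made for the example.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize historical latency (ms) as a mean and standard deviation."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, k=3.0):
    """Flag a measurement more than k standard deviations above the baseline mean.

    The factor k=3.0 is an assumed threshold, chosen only for illustration.
    """
    mu, sigma = baseline
    return value > mu + k * sigma

def pick_healthy_path(latest, baselines, preferred):
    """Return the first path, in preference order, whose latest latency is within its threshold."""
    for path in preferred:
        if not is_anomalous(latest[path], baselines[path]):
            return path
    return None  # no healthy path found: escalate to a human operator

# Hypothetical historical latency per path, in milliseconds.
history = {
    "primary": [10.1, 9.8, 10.3, 10.0, 9.9, 10.2],
    "backup": [14.9, 15.2, 15.0, 15.1, 14.8, 15.0],
}
baselines = {path: build_baseline(samples) for path, samples in history.items()}

# Latest measurements: the primary path has degraded, so traffic moves to the backup.
latest = {"primary": 42.0, "backup": 15.1}
active = pick_healthy_path(latest, baselines, preferred=["primary", "backup"])
print(active)
```

In this sketch, "rerouting" is reduced to choosing a different path name. A real network would push a configuration or routing change at that point, ideally after testing it against a digital twin.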

Today, we have the computing power and AI algorithms advanced enough to enable prediction and transform network operations in ways unimaginable a few decades ago. We can anticipate failures and correct for them before they happen, and know with confidence what the results of new configurations will be.

What’s so remarkable about self-healing networks?

The advantages of self-healing networks are clear:

  • Better network availability and optimized performance
  • Prevention of network outages
  • Faster remediation of network failures or quality issues
  • Easier operation of networks—for both networking providers and their customers
  • More automation and less human intervention
  • More opportunities for workers to focus on high-value tasks and innovation instead of manual interventions
  • Better customer experiences
  • Better application performance support

And perhaps one of the most important benefits of self-healing networks is the way they empower organizations to focus on their strategic business objectives instead of being in the weeds on network operations. With self-healing networks, service providers can match companies’ business service-level agreements (SLAs) with the right technical services and configurations. In other words, in a more intelligent infrastructure, network operation is coupled directly with the intentions and priorities of the business.

Low latency is critical across many industries, and perhaps most of all in cases where human health and safety are at risk. For example, consider autonomous vehicles. In order to track vulnerable road users (VRUs), the industry term for cyclists and pedestrians, and to avoid collisions and deaths, self-driving cars need to be able to react quickly even if VRUs or other vehicles do something unexpected. They rely on interaction between vehicles and external entities such as road infrastructure sensors called roadside units (RSUs), multi-access edge compute (MEC) devices and clouds. To make safe decisions in real time and respond very quickly to changing road conditions, driverless cars require low-latency communication on the underlying network connecting the car with RSUs, MEC devices and the VRUs themselves (when they have a device or app that’s capable of communicating with the vehicle). Through predictive quality of service or self-healing, where possible, the network should be able to predict whether it can meet the latency service levels the autonomous car needs to keep track of VRUs, and communicate that prediction to the car. This way, the car knows from the status of the network whether to maintain autonomous driving or give control back to the driver because the right conditions are not met.
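The prediction-and-handover decision in the vehicle example can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the straight-line trend, the 20 ms latency budget, the three-step look-ahead and the function names are hypothetical, and a real predictive quality of service system would use far richer models and inputs.

```python
def predict_latency(samples, steps_ahead):
    """Extrapolate latency (ms) with a least-squares line over equally spaced samples.

    Requires at least two samples. A linear trend is an assumed stand-in for
    whatever prediction model the network actually uses.
    """
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    slope_num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    slope_den = sum((i - x_mean) ** 2 for i in range(n))
    slope = slope_num / slope_den
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)

def driving_mode(samples, budget_ms, steps_ahead=3):
    """Keep autonomous driving only if predicted latency stays within the budget."""
    predicted = predict_latency(samples, steps_ahead)
    return "autonomous" if predicted <= budget_ms else "hand_over_to_driver"

# Hypothetical recent latency measurements between the car and a roadside unit.
stable = [8.0, 8.0, 8.0, 8.0, 8.0]
rising = [8.0, 10.0, 12.0, 14.0, 16.0]

print(driving_mode(stable, budget_ms=20.0))
print(driving_mode(rising, budget_ms=20.0))
```

The point of the sketch is that the handover decision is driven by where latency is heading rather than by its current value, so the car can react before the service level is actually breached.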

Another good example is in healthcare emergency services like 9-1-1, where the network must be reliable and always stay online. In these use cases, self-healing networks can deliver essential reliability for business-critical applications.

Network innovation on Platform Equinix

At Equinix, we’re continuously working to innovate and provide a more efficient, robust network infrastructure for our customers and partners. That means we’re enabling more automation across both digital and physical infrastructure and services. We’re exposing more of our services as APIs, providing Infrastructure as Code to meet the evolving needs of our customers and offering greater visibility into the full infrastructure stack.

Self-healing networks are one among many actions we’re taking at Equinix to leverage state-of-the-art technologies to deliver smarter infrastructure solutions. As we speed toward the future, we’re moving along the continuum from automated to insight-driven to truly intelligent infrastructure. As self-healing networks become a reality, they have the potential to automatically translate your business objectives into the right network configurations.

To learn more about infrastructure transformation and the future of on-demand networking, download the Platform Equinix vision paper.

 
