In today’s always-on digital economy, customers are unforgiving of downtime, and networking issues cause a large share of IT outages. According to Uptime Institute’s 2022 Data Center Resiliency Survey, networking-related problems have been the single biggest cause of all IT service downtime incidents, regardless of severity, over the past three years.[1] Establishing redundancy, both locally and across geographic regions, is critical for achieving maximum resiliency.
Businesses incur significant costs from outages, and those costs go beyond lost revenue. For example, the mean time to recovery (MTTR) is rising. According to a 2022 Opengear study, it now takes organizations an average of 11.2 hours to find and resolve a network outage, an increase of nearly two hours from a similar study in 2020.[2] And that’s just one of several categories of downtime losses. Others include lost revenue, reduced staff productivity and potential reputational damage.
In a previous blog post, my colleague Jim Poole explained why geo-redundancy is essential and needs to be a business priority in the digital era. This post will focus on how to create a geo-redundant infrastructure.
While networks are inherently at risk of system outages, you can take steps to mitigate that risk. With the increased reliance on cloud services, ensuring reliable and efficient access to these services is critical. We’ve found that local redundancy models such as high availability may be sufficient for specific use cases, but distributed digital infrastructure requires a broader approach: geo-redundancy across regions. For example, a fire or natural disaster could take out devices in a single location for an extended period; this is when geo-redundancy would provide a higher level of resiliency.
Whether your company needs local redundancy, geo-redundancy or both depends on your risk tolerance and business continuity requirements. Ask yourself this question: “If the entire data center goes down, will we be able to continue to run all our critical business functions?” If the answer is no, then balancing local redundancy with geo-redundant deployments may be the best approach to mitigating risk and optimizing performance for maximum protection. Local redundancy covers hardware failures at a particular location within a metro; adding geo-redundancy also covers issues that affect all devices in a metro.
Three geo-redundancy use cases
There is no one-size-fits-all solution for geo-redundancy. Virtual network function (VNF) devices provide a flexible and scalable way to connect to multiple cloud service providers (CSPs) and route traffic between them. There are two common use cases for geo-redundancy: simple cloud-to-cloud routing and the more robust approach of cloud-to-cloud routing plus cloud on-ramps. We’ll also discuss a third approach that combines local and geo-redundancy to ensure resiliency and availability for critical services. Enterprises can implement these local and geo-redundancy options by using various tools available on Platform Equinix®.
Cloud-to-cloud routing
Enterprises that use Network Edge VNF devices from Equinix can route traffic between two CSPs to establish cloud-to-cloud routing. Then, they can add local redundancy for enhanced resiliency by deploying an additional virtual router and redundant connections to the CSPs in the same metro. This approach protects against a single hardware failure since the redundant devices are placed in different compute planes. Enterprises also have the flexibility to create redundant virtual circuits from the VNF devices to the CSPs, depending on their level of risk tolerance for network downtime.
There are limits to this approach. Locally redundant VNFs and connections cannot protect against an outage that affects all local devices. Businesses can use Equinix Fabric® to connect their Network Edge devices to additional devices in a different metro, creating a geo-redundant architecture. Doing so spreads the risk of a single metro outage across multiple locations, so that downtime at any one location has less impact. This approach is becoming more popular as a growing number of businesses cannot tolerate even minimal downtime. Network Edge on Platform Equinix provides multiple options for connecting remote metros using Equinix Fabric, which allows businesses to evolve their infrastructure over time and in alignment with their business priorities.
Additional Network Edge device provides local redundancy in the metro
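To make the difference between local and geo-redundancy concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that device failures are independent of one another, that a metro-wide event takes out every device in that metro, and that the two metros fail independently. The availability figures and function names are illustrative placeholders of our own, not Equinix or vendor numbers.

```python
def parallel_availability(unit_availability: float, copies: int) -> float:
    """Availability of `copies` redundant units, assuming independent failures."""
    return 1.0 - (1.0 - unit_availability) ** copies


def metro_availability(device_availability: float, devices: int,
                       metro_wide_availability: float) -> float:
    """A metro is usable only if no metro-wide event is in progress
    AND at least one of its local devices is up."""
    return metro_wide_availability * parallel_availability(device_availability, devices)


# Illustrative assumptions only.
DEVICE = 0.999        # availability of a single virtual device
METRO_WIDE = 0.9995   # fraction of time the metro is free of site-wide events

single = metro_availability(DEVICE, 1, METRO_WIDE)
local_redundant = metro_availability(DEVICE, 2, METRO_WIDE)
# Two metros, each with a locally redundant pair of devices.
geo_redundant = parallel_availability(local_redundant, 2)

HOURS_PER_YEAR = 8760
for label, value in [
    ("single device, one metro", single),
    ("redundant pair, one metro", local_redundant),
    ("redundant pairs, two metros", geo_redundant),
]:
    downtime_hours = (1.0 - value) * HOURS_PER_YEAR
    print(f"{label:<28} {value:.6f}  ~{downtime_hours:.2f} h/year")
```

Under these assumptions, a second local device can never lift a metro above the availability of the metro itself, while a second metro can.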
Cloud on-ramp with cloud-to-cloud routing
In this use case, SD-WAN traffic from branch offices is aggregated and moved into multiple clouds. Like the previous use case, it also provides for cloud-to-cloud routing. To enhance the user experience, customers typically deploy SD-WAN aggregation on-ramps in multiple locations. This reduces latency and avoids tromboning, the inefficient routing that results when traffic has to hairpin through a distant location and back.
When an outage affects an entire deployment, this architecture provides geo-redundancy by rerouting SD-WAN traffic to another metro while continuing to route multicloud traffic through the available metro. Businesses that only have one operational metro or want to add a new metro can mirror the existing metro(s) to achieve the desired level of resiliency.
Extends redundancy to another metro (DC)
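The rerouting behavior described above can be sketched as a simple preference list per branch. This is a minimal illustration, not how any particular SD-WAN product implements failover: the metro codes, branch names and the `pick_metro` and `route_branches` helpers are all hypothetical, and a real deployment would rely on the SD-WAN vendor's own health checks and routing policy.

```python
from typing import Optional

# Hypothetical branches, each with on-ramp metros in order of preference
# (closest, lowest-latency metro first).
BRANCH_PREFERENCES = {
    "branch-east": ["NY", "DC"],
    "branch-south": ["DC", "NY"],
}


def pick_metro(preference: list[str], healthy: set[str]) -> Optional[str]:
    """Return the first healthy metro in preference order, or None if all are down."""
    for metro in preference:
        if metro in healthy:
            return metro
    return None


def route_branches(healthy: set[str]) -> dict[str, Optional[str]]:
    """Map every branch to the on-ramp metro it should currently use."""
    return {
        branch: pick_metro(preference, healthy)
        for branch, preference in BRANCH_PREFERENCES.items()
    }


normal = route_branches({"NY", "DC"})   # both metros available
ny_down = route_branches({"DC"})        # metro-wide outage in NY

print("normal: ", normal)
print("NY down:", ny_down)
```

In normal operation each branch uses its nearest on-ramp. During a metro-wide outage, the affected branch falls back to the surviving metro, while branches already using that metro are unaffected.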
Extend geo-redundancy with colocation
Many businesses already have physical equipment located in Equinix IBX® colocation data centers. They can incorporate these sites into a geo-redundant architecture by building connections to CSPs, which provides business continuity as part of a broader network resiliency initiative. By treating colocation as another geo-redundant point of service, a business is, in effect, establishing another branch in its network architecture. This is an extension of the first two use cases.
Diagram depicts colocation as part of a potential geo-redundancy architecture
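One way to test a design against the question “if an entire location goes down, can we still reach our clouds?” is to model the topology as a graph and remove one metro at a time. The sketch below is a simplified illustration. The node names, metros and links are hypothetical, the colocation router is treated as just another site, and connectivity is modeled as plain undirected reachability rather than real routing policy.

```python
from collections import deque

# Hypothetical topology. Each node maps to the metro that hosts it;
# None means the node is outside the metros being modeled.
NODE_METRO = {
    "branch": None,
    "vnf-a": "metro-1",
    "vnf-b": "metro-2",
    "colo-router": "metro-3",
    "cloud-1": None,
    "cloud-2": None,
}
LINKS = [
    ("branch", "vnf-a"), ("branch", "vnf-b"), ("branch", "colo-router"),
    ("vnf-a", "cloud-1"), ("vnf-a", "cloud-2"),
    ("vnf-b", "cloud-1"), ("vnf-b", "cloud-2"),
    ("colo-router", "cloud-1"), ("colo-router", "cloud-2"),
]
CLOUDS = {"cloud-1", "cloud-2"}


def reachable(start: str, failed_metros: set[str]) -> set[str]:
    """Breadth-first search over the nodes that survive the failed metros."""
    alive = {n for n, metro in NODE_METRO.items() if metro not in failed_metros}
    if start not in alive:
        return set()
    neighbors = {n: set() for n in alive}
    for a, b in LINKS:
        if a in alive and b in alive:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


def survives(failed_metros: set[str]) -> bool:
    """True if the branch can still reach every cloud."""
    return CLOUDS <= reachable("branch", failed_metros)


metros = sorted({m for m in NODE_METRO.values() if m is not None})
for metro in metros:
    print(f"{metro} down -> clouds reachable: {survives({metro})}")
```

In this toy topology, losing any single metro leaves both clouds reachable, and the colocation site keeps them reachable even when both VNF metros are down.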
Businesses that deploy digital infrastructure on Platform Equinix can use the strategies from these use cases to meet their risk tolerance and business continuity requirements. Network Edge provides a flexible and scalable solution for customers to connect to multiple cloud service providers and route traffic between them.
To learn more about using Network Edge to reduce network downtime risk, read the white paper Building a resilient infrastructure – Design considerations and deployment scenarios.
[1] “Uptime Institute’s 2022 Outage Analysis Finds Downtime Costs and Consequences Worsening as Industry Efforts to Curb Outage Frequency Fall Short,” Uptime Institute Press Release, June 8, 2022
[2] “The Many Costs of Downtime,” Opengear, September 2022