The Latency Tax: How Centralized Processing Is Costing Your AI Initiatives

To maximize real-time outcomes with AI, companies must be strategic about putting inference at the edge

Marco Zacchello

As emerging AI use cases like autonomous vehicles, continuous health monitoring and real-time fraud analysis enter the mainstream, businesses increasingly need ultra-fast data processing close to where data is generated and used. Meanwhile, data privacy concerns, data sovereignty laws and other regulations are motivating them to keep data as close as possible to the source.

With the rapid growth of AI as an essential enterprise technology, companies have begun recognizing the importance of edge deployments in their overall IT architecture. Those that have relied on the cloud or an on-premises data center are experiencing the limitations of centralized processing models for latency-sensitive AI inference workloads.

If you do AI processing in the cloud:

  • You have less control over data, which can lead to compliance issues.
  • You face higher costs to transfer data in and out of the cloud, both because of egress fees and the cost of the transport network.
  • Transferring data takes too much time for latency-sensitive use cases, which can make real-time AI inference impractical.
  • Even if local cloud regions are available, you still have to pay egress fees to move data to a central location for training.

If you do AI processing in a centralized data center:

  • You have more data control than with cloud, but the location may not meet data residency requirements.
  • You still have higher latency and network congestion for transferring data.
  • You may have limited capacity to scale as data volumes grow.

In both cases, centralized processing models can lead to bottlenecks for AI workloads because of the distance data must travel and can drain an AI budget. Some organizations in the early stages of AI think their current cloud or data center setup is good enough, but when it’s time to move AI projects out of testing and into production, that infrastructure can undermine AI capabilities.

The best way forward is a distributed approach to AI, where some key aspects of an AI workflow happen in edge locations and others take place in a centralized data center. AI inference, fine-tuning and some domain-specific training need to happen at the edge for the lowest latency, whereas general AI model training can be done in centralized infrastructure or in the cloud, where it’s easier to scale and aggregate multiple data sources. This approach requires an interconnected hybrid infrastructure that incorporates digital hubs in edge locations. Organizations that don’t currently have an edge strategy will need one, because putting compute power closer to the data sources enables the low-latency processing that AI applications demand.

Why latency matters for AI

There are three aspects of latency to consider for AI applications:

  1. The time it takes to move data from the object generating that data to the inference node
  2. The time it takes the inference node to process the data with the trained model
  3. The time it takes the inference node to respond, whether by triggering an action on a device or providing a report

These data transfers might take only a few milliseconds if you’re in an edge location, but if you must backhaul data to a central location, the latency could be detrimental.

Figure 1: Using a centralized data center can increase costs and time of data transfers
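
To make the difference concrete, the short sketch below adds up the three latency components listed above for two hypothetical deployments. The millisecond figures are illustrative assumptions, not measurements from any particular network or model.

```python
# Illustrative latency budget for the three components above.
# All numbers are hypothetical assumptions, not measurements.

def end_to_end_latency_ms(transfer_ms: float, inference_ms: float, response_ms: float) -> float:
    """Sum of (1) moving data to the inference node, (2) running the model,
    and (3) sending the response back to the device."""
    return transfer_ms + inference_ms + response_ms

# Assumed network times: a few milliseconds to a nearby edge node,
# tens of milliseconds to backhaul to a distant centralized data center.
edge = end_to_end_latency_ms(transfer_ms=3, inference_ms=15, response_ms=3)
central = end_to_end_latency_ms(transfer_ms=45, inference_ms=15, response_ms=45)

print(f"Edge deployment:        ~{edge:.0f} ms")     # ~21 ms
print(f"Centralized deployment: ~{central:.0f} ms")  # ~105 ms
```

With the same model and the same inference time, the deployment location alone changes the end-to-end response by several multiples in this example.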

Low latency is especially important for AI inference because excess latency can degrade user experiences, drive up business costs, and even endanger human life and safety. Consider the following examples:

  • Autonomous vehicles need to respond instantly when sensor data indicates there’s a pedestrian ahead; any delay in that response risks an accident.
  • Connected ambulance systems use network edge nodes to process patient vitals and diagnostic data in real time during transport, which helps hospitals prepare for incoming emergencies.
  • Utilities and energy production companies need to respond quickly to changing weather and emergencies for safety reasons.
  • In industrial automation, AI models at the edge analyze sensor and camera data instantly, enabling immediate responses to critical events such as equipment anomalies, product defects, or sudden environmental changes like temperature spikes.

In all these cases, organizations can’t afford the latency that centralized or cloud-based processing entails.

AI inference needs to happen at the edge

Because AI training involves large volumes of data and is largely not latency-constrained, centralized processing makes sense for it. Centralized infrastructure is better equipped to handle the scale needed, and training can pause while waiting to receive fresh data.

AI inference, on the other hand, is triggered by fresh data arriving from devices. An inference node needs to react to that data quickly, so there’s no time for it to travel back to a central location for processing. If the inference node is close to the data, it can trigger a real-time action. With inference nodes in various edge locations across enterprise infrastructure, companies can improve service availability and deploy domain-specific AI models that work with narrower datasets, which can mean faster training and lower compute requirements. Examples include medical imaging analytics or disease diagnosis assistance. Edge infrastructure also enables location-specific services like real-time video analytics for airport security.

Edge AI isn’t just about latency, though; there are also cost and privacy benefits. For the connected ambulance example above, processing data locally at the network edge can significantly reduce latency compared to cloud-only solutions, allowing for real-time alerts such as stroke detection that can save vital minutes in emergency care. Additionally, local data processing minimizes bandwidth usage and enhances data privacy by transmitting only essential summaries, safeguarding sensitive patient information while maintaining operational efficiency. Likewise, in industrial automation, edge AI can eliminate delays from cloud-based processing and ensure that decisions are made in real time. This can help enhance operational efficiency, reduce downtime and improve overall safety and product quality in manufacturing and industrial environments.
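
As a rough illustration of this pattern, the sketch below shows an edge inference loop that acts on readings locally and ships only an aggregated summary upstream. The sensor, the threshold “model” and the upstream call are hypothetical stubs, not any particular product’s API.

```python
# Minimal sketch of an edge inference loop, assuming a simple threshold
# "model" and stubbed sensor/actuator/upstream calls. In production these
# would be replaced by a real model runtime and real network clients.
import random
import statistics

def read_sensor() -> float:
    # Stand-in for a real sensor read (e.g., vibration or temperature).
    return random.gauss(mu=50.0, sigma=5.0)

def is_anomaly(reading: float, threshold: float = 65.0) -> bool:
    # Placeholder for the domain-specific model running at the edge.
    return reading > threshold

def trigger_local_action(reading: float) -> None:
    print(f"ALERT: anomaly detected locally ({reading:.1f}); acting immediately")

def send_summary_upstream(readings: list[float]) -> None:
    # Only an aggregate leaves the site, saving bandwidth and limiting
    # exposure of raw, potentially sensitive data.
    summary = {
        "count": len(readings),
        "mean": round(statistics.mean(readings), 2),
        "max": round(max(readings), 2),
    }
    print(f"Sending summary to central infrastructure: {summary}")

batch: list[float] = []
for _ in range(100):                 # stand-in for a continuous loop
    reading = read_sensor()
    if is_anomaly(reading):          # decision happens at the edge, in real time
        trigger_local_action(reading)
    batch.append(reading)
    if len(batch) == 50:             # periodic, low-bandwidth upload
        send_summary_upstream(batch)
        batch.clear()
```

The time-critical decision never waits on a round trip to a central location, while the centralized side still receives the aggregated data it needs.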

The role of network technology in edge computing for AI

Network technologies like remote direct memory access (RDMA) are emerging as game-changers by addressing the challenges of “long and fat” networks, that is, those that combine high bandwidth with high latency over extended distances. RDMA lets one system read or write another system’s memory directly, bypassing the remote CPU and the kernel networking stack, which significantly reduces latency and increases throughput. This is crucial for AI workloads that require rapid, large-scale data transfers between edge inference nodes and centralized training clusters.

RDMA and edge computing can work together to minimize latency for AI: RDMA speeds up data transfer and aggregation, while edge infrastructure shortens the distance data needs to travel for processing. RDMA thus supports a distributed approach to AI by enabling strategic distribution of data between edge and core infrastructure. From a business perspective, adopting such solutions not only accelerates AI deployment cycles but also enhances operational efficiency, enabling real-time insights and faster innovation at scale.
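
To see why “long and fat” links are hard to keep full, the short sketch below computes the bandwidth-delay product: the amount of data that must be in flight at once to saturate the link. The link speed and round-trip time are illustrative assumptions.

```python
# Bandwidth-delay product (BDP): how much data must be "in flight" to keep
# a long, fat link busy. The figures below are illustrative assumptions.

def bdp_megabytes(bandwidth_gbps: float, rtt_ms: float) -> float:
    bits_in_flight = bandwidth_gbps * 1e9 * (rtt_ms / 1000.0)
    return bits_in_flight / 8 / 1e6  # convert bits to megabytes

# A 100 Gbps link with a 40 ms round trip must keep roughly 500 MB in flight.
print(f"{bdp_megabytes(100, 40):.0f} MB in flight to fill a 100 Gbps, 40 ms RTT link")
```

Pushing that much data per round trip through a conventional kernel TCP stack consumes significant CPU and buffering; RDMA’s kernel-bypass, zero-copy transfers are designed to keep such links full.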

An interconnected edge infrastructure

In the era of data-driven intelligence, edge computing is a must. AI success is ultimately about collaboration between centralized training and local inference. To achieve this, you need an interconnected edge infrastructure in the right locations, near your data sources and end users, and connected to clouds, SaaS providers and other partners in your AI ecosystem.

 

Figure 2: Interconnected digital hubs at the edge

With 270+ data centers in 76 markets around the world, Equinix has the global reach to support your edge deployments for AI. In our high-performance, AI-ready data centers, you can deploy flexible infrastructure where you need it while optimizing costs and maintaining regulatory compliance.

To learn more about the importance of edge computing to reduce network latency for AI, download our white paper Where edge meets AI opportunity.

 
