IT leaders spend a lot of time these days thinking about how to get the right infrastructure to support their organizations’ AI goals. Not all of them think about where to deploy that infrastructure, but getting the location right is just as important.
This is especially true when it comes to AI inference workloads. Not all organizations have the resources or desire to train their own AI models, which means that not all organizations have to worry about deploying their own training infrastructure. They have other options for acquiring models, including using an AI model marketplace or collaborating with partners via a federated AI approach. In contrast, inference is a required part of any enterprise AI strategy, and it’s essential to do it in the right places.
For simplicity’s sake, many organizations prefer to do inference in their existing cloud or on-premises environments. But, as we’ll explore in this blog post, both cloud and traditional on-premises data centers have their limitations. The metro edge represents the sweet spot to avoid these limitations. It offers the ideal combination of three main benefits:
- Optimized performance
- Data privacy and sovereignty controls
- Greater resource efficiency
Where is the metro edge?
It’s a given that inference workloads are sensitive to latency. Deploying infrastructure at the digital edge keeps data sources close to processing locations, and thus keeps latency low. But the challenge enterprises face when deploying at the edge is that “the edge” isn’t any one specific place. There’s a hierarchy of different edges in different locations, and each one handles inference differently:
- Device edge: Inference happens directly on a connected device such as a smartphone. This could mean performing inference on the same device where the data originated, eliminating network latency altogether.
- Far edge: A business performs inference on private infrastructure that they operate within their own facility—the proverbial “server in the closet.”
- Metro edge: A business moves their inference data into a different facility in the same metro area. This facility is typically a colocation data center.
In addition to these edges, there are also cloud and wholesale data centers. These can be part of an enterprise’s wider edge-to-core-to-cloud AI data strategy, so it’s helpful to consider them in the same context as the edges listed above.
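To make the hierarchy concrete, here’s a minimal Python sketch of how an enterprise might screen these tiers against a workload’s requirements. The latency figures, the `Tier` structure and the `viable_tiers` helper are all illustrative assumptions for this post, not measurements or a real API.

```python
# A minimal sketch of screening edge tiers against a workload's needs.
# The latency figures below are illustrative assumptions, not measured values.

from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    typical_latency_ms: float   # assumed latency to the data source
    managed_facility: bool      # True if hosted in a dedicated data center

# Rough, assumed characteristics for each tier in the edge hierarchy
TIERS = [
    Tier("device edge", 0.0, False),    # inference on the originating device
    Tier("far edge", 1.0, False),       # the "server in the closet"
    Tier("metro edge", 10.0, True),     # colocation facility in the same metro
    Tier("cloud/wholesale", 50.0, True),
]

def viable_tiers(latency_budget_ms: float, needs_managed_facility: bool) -> list[str]:
    """Return the tiers that satisfy a latency budget and facility requirement."""
    return [
        t.name
        for t in TIERS
        if t.typical_latency_ms <= latency_budget_ms
        and (t.managed_facility or not needs_managed_facility)
    ]

# Example: a 20 ms budget plus a requirement for data center physical security
print(viable_tiers(20.0, needs_managed_facility=True))  # ['metro edge']
```

In practice the screening criteria would also include cost, sovereignty constraints and operational overhead, which the sections below walk through.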
Why inference at the metro edge?
As the diagram below shows, the metro edge is the logical place for enterprises to host their inference workloads, whether that means pulling in workloads from the cloud or pushing out workloads from the far edge.
Optimized performance
The metro edge is located within the same metro area as the data sources it aggregates from, typically enabling latency of less than 10 milliseconds. Deploying metro edge infrastructure with the right colocation partner also provides access to dedicated interconnection services for further performance benefits.
In contrast, enterprises aren’t always able to control where their cloud workloads are physically hosted. This can lead to much higher latency, which in turn means less effective inference.
It’s true that workloads at the metro edge experience slightly higher latency than they would at the far edge, but that latency is still well within acceptable ranges for inference.
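One way to ground these latency claims is to measure them. The sketch below samples round-trip times to an inference endpoint and checks the median against a budget; the endpoint URL and the 10 ms threshold are hypothetical, and interpreting the figure as round-trip time is our assumption.

```python
# A minimal sketch for verifying that an inference endpoint meets a latency
# budget. The endpoint URL below is a hypothetical placeholder.

import statistics
import time
import urllib.request

ENDPOINT = "http://inference.example.internal/health"  # hypothetical endpoint
BUDGET_MS = 10.0  # assumed target round-trip latency for a metro edge deployment

def measure_latency_ms(url: str, samples: int = 20) -> float:
    """Return the median round-trip time, in milliseconds, for simple GETs."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=1) as resp:
            resp.read()
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)

median_ms = measure_latency_ms(ENDPOINT)
verdict = "OK" if median_ms <= BUDGET_MS else "over budget"
print(f"median RTT: {median_ms:.2f} ms -> {verdict}")
```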
Data privacy and sovereignty controls
When businesses perform inference at the far edge, they’re responsible for providing their own physical security, and this can be complex and costly. They’re running servers inside facilities that were never intended to function as data centers, so it stands to reason that they don’t have access to the same physical security controls found at a dedicated colocation data center.
Consider the example of a retailer running servers within their stores. Their entire business model revolves around shoppers coming and going as they please, so it would simply be impossible for them to control access to their servers in the same way that colocation data centers do at the metro edge. And if they can’t protect those servers, how can they possibly feel confident about the security of any sensitive datasets hosted on those servers?
On the other hand, placing inference workloads and data in the cloud could lead to loss of control. Enterprises can’t expect their cloud providers to treat their data privacy and sovereignty requirements with the same importance that they would themselves.
Compare this to the metro edge, where enterprises can deploy infrastructure that they control, in the locations that they select. If they choose to incorporate cloud services into their AI strategy, they can do so via a sovereign cloud approach by accessing cloud on-ramps that are colocated at the metro edge.
Resource efficiency
When organizations perform inference at the far edge, they bear the costs of deploying and managing their own servers. Those costs add up quickly for any business that operates multiple facilities in the same metro area.
Moving inference to the metro edge helps businesses cut costs by operating more efficiently. For instance, a fast-food chain with many locations in the same city could aggregate all their inference workloads in one place, instead of paying for hardware in each individual restaurant.
In addition, placing too much AI data in the cloud could lead to high egress fees. Not only do these fees drive up overall costs, but they also limit data mobility. AI workloads are inherently distributed, and businesses must be able to move their AI data between different environments—cloud or otherwise—whenever the need arises. Egress fees make this more difficult because they force businesses to calculate whether a particular move is worth the cost.
When businesses host their datasets at the metro edge, they keep tighter control over their data. This could include operating a hybrid multicloud environment that limits egress fees by moving only certain datasets into the cloud, and only when those datasets truly need to move.
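To see how these tradeoffs might be weighed, here’s a minimal back-of-the-envelope sketch covering both cost questions above: whether aggregating far-edge servers into one metro edge site saves money, and whether moving a dataset out of the cloud is worth the egress fee. Every dollar figure is an illustrative assumption, not real pricing.

```python
# A back-of-the-envelope sketch of the two cost questions discussed above.
# All dollar figures are illustrative assumptions, not real pricing.

def aggregation_savings(sites: int, per_site_monthly: float,
                        metro_edge_monthly: float) -> float:
    """Monthly savings from replacing per-site servers with one metro deployment."""
    return sites * per_site_monthly - metro_edge_monthly

def egress_cost(dataset_gb: float, fee_per_gb: float) -> float:
    """Fee to move a dataset out of a cloud at a given per-GB egress rate."""
    return dataset_gb * fee_per_gb

# Assumed figures: 40 restaurants at $500/month each vs. one $12,000/month
# metro edge deployment; a 5 TB dataset at an assumed $0.09/GB egress rate.
print(f"aggregation saves ${aggregation_savings(40, 500, 12_000):,.0f}/month")
print(f"moving 5 TB out of the cloud costs ${egress_cost(5_000, 0.09):,.2f}")
```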
Equinix helps customers capitalize on inference at the metro edge
Equinix IBX® colocation data centers are available in 74 metros across six continents, so customers looking to deploy inference at the metro edge will have no shortage of choices for where to do it. They can also take advantage of private, dedicated networking solutions such as Equinix Fabric® to move their data quickly between processing locations, and to keep that data protected in transit. Organizations that want to incorporate cloud services alongside their metro edge infrastructure can tap into our industry-leading portfolio of cloud on-ramps in many locations worldwide.
AI requires access to a vibrant ecosystem of partners and service providers, and the industry’s largest global ecosystem is at Equinix. It includes thousands of enterprises, clouds, networks, and other service providers, all of which are available to connect with directly, privately and in real time.
One recent addition to the Equinix ecosystem is Groq, a pioneer in the field of AI inference. Groq offers language processing units (LPUs), a new category of specialized AI processor intended to support faster, more resource-efficient inference for text, audio and vision models. The company will use Equinix infrastructure as a gateway to help users connect to the GroqCloud™ platform, further amplifying the speed and cost efficiency built into the platform.
Equinix can help customers deploy Groq solutions at the metro edge to capture all of the performance, privacy and resource-efficiency benefits described above. To learn more about the partnership between Equinix and Groq, watch our interview with theCUBE, recorded at last week’s Dell Tech World event.
To learn more about how businesses are using edge computing and hybrid multicloud networking to optimize infrastructure for emerging technologies like AI, read our white paper The proximity paradox.
