The Infrastructure Behind AI

What Is Retrieval-Augmented Generation (RAG), and Where Should You Do It?

Enterprises need some help getting LLMs to tap into their private data, and RAG delivers

Glenn Dekhayser

For years now, we’ve been hearing that “data is the new oil.” It’s practically become a cliché. But even oil needs to be refined and shipped before it can be used as fuel. So, what will it take to turn data into fuel that drives business value?

New large language models (LLMs) promise to completely redefine how enterprises draw value from their data. These LLMs were trained on much larger datasets than organizations would have been able to handle in the past; therefore, they contain a much wider array of insights than older models. But while the potential of LLMs is undeniable, there’s still the small matter of how to apply them to real enterprise use cases, and this is often easier said than done.

LLMs are built to master a general knowledge base in order to simulate the way humans communicate. Because they focus on general knowledge, specific knowledge, like the insights and context needed for enterprise AI use cases, often falls through the cracks. They're also trained on static datasets, which means their knowledge is only accurate up to a certain cutoff date. LLMs cannot access real-time data, at least not without some help.

This is where retrieval-augmented generation (RAG) comes into play. RAG is a technique for optimizing AI inference to help LLMs generate more accurate results. RAG systems serve as the bridge that connects two different types of data, to optimize the value of both:

  • Public LLM training datasets
  • Private enterprise datasets

Enterprise AI relies on RAG

As the name suggests, a RAG system augments models by retrieving the relevant information needed to generate an accurate response to a prompt. Instead of retraining a model, RAG helps point the model toward important data that wasn’t included in the original training dataset, either because the data is private or because it didn’t exist yet.
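To make that retrieve-then-augment loop concrete, here is a minimal, self-contained sketch. The bag-of-words "embedding," the stopword list, and the sample documents are all illustrative stand-ins; a production RAG system would use a real embedding model and a vector database instead.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "is", "of", "a", "what", "for", "in", "its", "after"}

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector over lowercased tokens."""
    tokens = re.findall(r"[a-z0-9$']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank private documents against the query and return the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Augment the prompt with retrieved context before calling the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Use the context below to answer.\nContext:\n{context}\nQuestion: {query}"

docs = [
    "Acme Corp renewed its contract in March for $2M.",
    "The cafeteria menu changes weekly.",
    "Acme Corp's support tickets spiked after the v2 rollout.",
]
print(build_prompt("What is the status of the Acme Corp account?", docs))
```

The key point is that the model itself is never retrained: the private documents are only stitched into the prompt at inference time, so the data can stay under the enterprise's control.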

RAG systems can use APIs or live data queries to track down the real-time (or near real-time) information that’s relevant to a particular prompt. Essentially, any time an end user asks the model a question, RAG provides the hints and supporting data that the model needs to work out an accurate answer for itself. RAG can play a key role in an enterprise’s private AI strategy, as it allows for secure inference on proprietary datasets running on GPU-enabled compute under the enterprise’s control. This increases the accuracy of the insights that LLMs provide, without putting sensitive internal data at risk of unauthorized exposure.

RAG is one example of how enterprises can customize and build upon the pretrained models they acquire from AI model marketplaces. Another method is fine-tuning: performing additional training on a model using private data. While fine-tuning is certainly helpful, it can also be complex and resource-intensive, so it may not be practical in every instance. Further, if the data is subject to a retention policy, embedding that data into an LLM via fine-tuning could be problematic, since data baked into model weights cannot easily be deleted later.

Enterprises can also use agentic AI workflows that pull real-time data from various sources, helping LLMs make informed decisions and perform actions automatically. Agentic AI will undoubtedly be a major step forward in the development of enterprise AI, but there are issues that must be addressed first. For instance, enterprise leaders need to ensure that AI agents operating without human oversight are still able to meet data privacy and sovereignty requirements. Also, agentic AI requires a constant stream of data that’s accurate, timely and relevant. This means that the emergence of agentic AI in the enterprise will further underscore the importance of RAG.

RAG will inevitably become the foundation of most enterprise AI strategies, along with agentic AI. A RAG-ready data pipeline is one of the most important prerequisites that an enterprise must meet in order to enable AI success, as data must go through a robust set of processes to ensure accuracy, relevance, and proper formatting prior to being tokenized and embedded into RAG databases.
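One small piece of such a pipeline, the chunking step that prepares cleaned text for embedding, can be sketched with nothing but the standard library. The window size, overlap, and cleaning rules below are illustrative defaults, not recommendations:

```python
def clean(text: str) -> str:
    """Normalize whitespace so chunk boundaries aren't distorted by formatting."""
    return " ".join(text.split())

def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split cleaned text into overlapping word windows ready for embedding.

    Overlap preserves context that would otherwise be cut at chunk borders.
    """
    words = clean(text).split()
    step = size - overlap
    # stop before a final window that would be pure overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]
```

Each resulting chunk would then be embedded and written to the RAG database, typically alongside metadata (source system, timestamp, access controls) captured earlier in the pipeline.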

How does RAG fit into the future of enterprise AI?

In an ideal approach to enterprise AI, an employee would be able to ask a direct question about any aspect of the business and get the best answer, pulled together from every piece of corporate data, static or streamed, that the employee is entitled to see under their granted permissions and other governance controls. For instance, a salesperson should be able to ask for a summary of their biggest account and quickly get an accurate, holistic picture of all the opportunities and insights captured for that customer globally, across every possible system and data store, but not for accounts they don't manage.

Achieving this outcome would require the RAG infrastructure to query the organization’s entire knowledge base across all its various applications, including both static and dynamic datasets. Then, it would need to proactively apply data privacy and sovereignty controls. This means that it would have to filter out any information in real time that a particular employee isn’t entitled to, based on their job role and location. Most of the tools needed to realize this dream already exist, but success will depend on a flexible, interconnected architecture, located within an infrastructure platform that optimizes optionality, performance, cost, and proximity to all the points of the enterprise IT ecosystem.
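A simplified sketch of that entitlement filtering: governance metadata travels with each document, and the policy check runs before similarity ranking ever sees a candidate. The field names and policy rules here are hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    allowed_roles: set = field(default_factory=set)  # empty = visible to all roles
    region: str = "global"  # data-sovereignty tag (illustrative)

def authorized(doc: Document, role: str, region: str) -> bool:
    """Apply role- and sovereignty-based policy to a single document."""
    role_ok = not doc.allowed_roles or role in doc.allowed_roles
    region_ok = doc.region in ("global", region)
    return role_ok and region_ok

def candidates_for(corpus: list, role: str, region: str) -> list:
    """Pre-filter the corpus; only authorized documents go on to ranking."""
    return [d for d in corpus if authorized(d, role, region)]

corpus = [
    Document("Acme account summary", allowed_roles={"sales"}),
    Document("EU payroll report", allowed_roles={"hr"}, region="eu"),
    Document("Company holiday calendar"),
]
visible = candidates_for(corpus, role="sales", region="us")
```

Filtering before retrieval, rather than redacting the model's answer afterward, is what prevents unauthorized data from ever reaching the prompt in the first place.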

As enterprises work to move toward that ambitious AI future, distributed RAG infrastructure will only continue to grow in importance. It’s no surprise that thousands of new tools and applications that cover different aspects of RAG have been released in recent years, including both commercial and open-source solutions. It’s not unreasonable to believe that RAG infrastructure will one day become as critical to the enterprise as ERP tools and email are today.

What should a foundational infrastructure platform for RAG look like?

We’ve established why enterprises need RAG. Now, let’s consider how and where they should do it. Enterprises will need to build a foundational infrastructure platform for distributed AI inference, and this will drive the need for RAG to access, tokenize and embed data from across that platform. On top of this RAG infrastructure platform will sit a robust data platform that’s designed for classification, governance, protection, security and performance, at scale and distributed across the entire enterprise.

The infrastructure platform should also include each of the following characteristics:

  • Low-latency connectivity: RAG systems need to be able to access the right data at the right time. This means the infrastructure platform needs to be tightly coupled to the data platform. RAG systems should be able to access enterprise data via low-latency connectivity, no matter where the data is located. This connectivity should extend all the way from the cloud to the edge, and everywhere in between.
  • Infrastructure flexibility: One key aspect of RAG is that it pulls from dynamic data sources. Therefore, the infrastructure that supports RAG needs to scale up or down with agility to keep up with the changing nature of the data itself. To do this, enterprises need to tap into flexible resources such as cloud infrastructure and virtual connections.
  • Secure data access: Since RAG systems often work with sensitive datasets, it’s essential that enterprises build their infrastructure platform with data privacy and control in mind. These systems should be able to access any data they need, without the risk of the organization losing custody of that data or exposing it to unauthorized users. Storing RAG data on storage equipment that the enterprise controls, in locations it can access, will be at the top of the requirements lists moving forward.

Platform Equinix® provides everything enterprises need to build foundational infrastructure for RAG. Our customers can deploy a dedicated storage environment for their private data, known as an Authoritative Data Core, and surround it with all the cloud and edge infrastructure their AI strategy demands. This allows them to use their data the way they need to while always maintaining control over it.

Customers can also take advantage of our robust partner ecosystem, which includes all major cloud providers and GPU as a Service providers. Using Equinix Fabric®, they can create scalable connections with all their ecosystem partners and between their own infrastructure in different locations. Finally, they can deploy in Equinix IBX® colocation data centers in 74 metros across six continents to ensure the global reach and proximity to data sources that their AI strategy requires.

High-performance data centers from Equinix help enterprises future-proof their operations, providing all the advanced capabilities that legacy on-premises data centers don’t. Enabling AI-ready infrastructure is just one example of the benefits that high-performance data centers can provide. Learn more about how to shape your future with the right data infrastructure: read the infographic.
Glenn Dekhayser Global Principal, Global Solutions Architects