Generative AI has been making waves since the release of ChatGPT last year, and it has the potential to transform several industries. But what’s so different about it, and what does it mean for data centers?
Increasing use of AI will make new demands of IT infrastructure, so organizations need to be prepared. However, according to the Equinix 2023 Global Tech Trends Survey (GTTS), 42% of IT leaders said they’re not confident in their infrastructure’s ability to accommodate the growing use of AI.
Already, generative AI has influenced the architecture of AI clusters, requiring a larger neural network—which means more hardware—as well as better compute fabric and larger data sets. These factors lead to high power consumption and the need for more efficient cooling and better networking infrastructure. In this blog post, we’ll look at what sets generative AI apart and what it will mean for data centers as we move into the future.
Generative AI versus traditional AI: What’s changed?
First, let’s talk about what makes generative AI different from traditional AI:
- While traditional AI classifies data and identifies patterns within data sets to make predictions, generative AI goes beyond mere pattern recognition to create new content based on the patterns it has seen.
- Typically, generative AI models are represented by much larger neural networks that contain billions or even trillions of parameters.
- The original prompt you put into the AI engine is highly important in delivering good results.
- Generative AI query response times can be slower (on the order of multiple seconds) than those of traditional AI queries (which often return sub-second responses) because of the extra processing and larger data sets.
- Generative AI involves much larger AI training infrastructure and higher power consumption, thus requiring denser server racks and advanced cooling techniques.
- In many use cases, subject matter experts can interact directly with generative AI systems instead of going through data scientists. Data scientists are still required for foundational model customization.
- Because of the high computation and infrastructure requirements to create AI models from scratch, companies are starting to share AI models through Model as a Service and open-source AI model marketplaces.
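Those parameter counts translate directly into hardware requirements. As a rough, illustrative sketch (the model sizes and byte widths below are assumptions for illustration, not figures from any specific product), the memory needed just to hold a model’s weights is parameters × bytes per parameter:

```python
def model_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough GPU memory needed just to hold a model's weights.

    bytes_per_param: 4 for FP32, 2 for FP16/BF16, 1 for 8-bit quantized.
    Excludes activations, KV cache and optimizer state, which add more.
    """
    return num_params * bytes_per_param / 1e9

# Illustrative sizes (assumptions, not any specific product):
print(model_memory_gb(7e9))    # ~14 GB in FP16: fits on a single modern GPU
print(model_memory_gb(175e9))  # ~350 GB in FP16: must be sharded across many GPUs
```

This simple ratio is why larger models force larger GPU clusters: once the weights no longer fit in one device’s memory, the model must be split across interconnected GPUs, which in turn drives the compute-fabric and networking demands discussed above.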
What are the key challenges of generative AI?
For the remainder of this post, we’ll focus on the biggest areas of concern with generative AI and the implications for data centers. First, let’s examine some of the challenges, which fall into roughly three categories:
- Data challenges
There are many potential ethical and legal issues surrounding the types of data used in generative AI models. Training data is crawled and collected from all over the internet, and it can include public data, private data you own or purchase, and third-party data. Simply getting access to all that data is one challenge; protecting privacy and complying with local regulations adds another dimension of complexity. Organizations need excellent data governance to properly steward data through the generative AI lifecycle.
It takes a lot of resources to train and create generative AI models, and most organizations don’t have the capability to create their own. Increasingly, companies are using generative AI foundational models created by governments, AI vendors or hyperscalers, or models available from open-source AI model marketplaces—and then customizing them using their private data.
Some organizations are fine with uploading their private data to the cloud. However, many others aren’t, and want to bring the foundational AI models to their private infrastructure instead. When using models created by others as a starting point, it’s paramount to know the lineage of those models, both to ensure that the data used to train them isn’t biased and to comply with data privacy regulations.
Data accuracy and relevancy are also factors. If you use inaccurate data or data that’s not fit for purpose, it may reduce the accuracy of the AI model and the effectiveness of AI outcomes.
- Performance challenges

Generative AI requires high-performance computing systems designed to run large-scale, intensive workloads. Such systems demand today’s most powerful processors, memory and specialized hardware like GPUs or custom chips, as well as powerful physical networking underneath. Only with all of these in place can AI workloads reach optimal performance.
Generative AI development workloads are bandwidth sensitive, and generative AI production workloads are throughput sensitive. Let’s look at some key factors that influence the performance of these two workloads:
- Development workloads: A generative AI development, or training, workload requires a lot of compute, memory, networking and storage resources. Access to the latest GPU technology is critical to reducing overall training time. The GPU interconnection architecture, the overhead of GPU virtualization, and multi-user contention for shared resources (e.g., compute-to-storage networking) can all affect model training time. In many use cases, customers choose a foundational model created by others and customize it using their private data. In these situations, it’s critical to choose the right foundational model as the starting point, because it determines the required resources (for performance reasons, you’ll want to keep the generative AI model fully in memory) and the overall model customization time.
- Production workloads: In many generative AI use cases, millions of users may simultaneously access the generative AI model as part of AI inference. To scale and avoid bottlenecks, generative AI solution providers are deploying their AI models in multiple regions to localize traffic and scale the throughput of their overall solution. Currently, most generative AI solutions are sensitive to human-level latency, unlike many AI solutions that must operate at machine-level latency (e.g., credit card fraud detection). As generative AI solutions are used more by machines, latency to external data sources (e.g., weather data, traffic data) will become an important consideration in the AI inference process.
It’s also worth noting that because the infrastructure for both development and production is complex and expensive, many organizations use the same infrastructure for both, which can create a throughput problem.
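As a back-of-the-envelope illustration of the production-side scaling problem above (all figures below are hypothetical assumptions, not benchmarks), you can estimate how many model replicas an inference tier needs from the peak query rate and per-replica throughput:

```python
import math

def replicas_needed(peak_qps: float, tokens_per_query: int,
                    tokens_per_sec_per_replica: float) -> int:
    """Back-of-the-envelope count of model replicas for an inference tier.

    All inputs are illustrative assumptions, not measured benchmarks.
    """
    demand = peak_qps * tokens_per_query  # tokens/sec the tier must generate
    return math.ceil(demand / tokens_per_sec_per_replica)

# e.g., 500 queries/sec at peak, ~300 generated tokens per query,
# and a replica that sustains ~2,000 tokens/sec:
print(replicas_needed(500, 300, 2000))  # 75 replicas
```

Deploying in multiple regions, as described above, effectively splits `peak_qps` across locations, which both localizes traffic and shrinks the cluster each region needs.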
- Sustainability challenges

It’s well known that AI models can drive higher emissions than conventional IT workloads—and generative AI consumes even more power than traditional AI. Since power usage and cooling both have sustainability implications, enterprises and the data center industry need to think carefully about how to do AI sustainably.
What are generative AI’s implications for data centers?
Generative AI workloads have several implications for how we design data center architecture—everything from where you put your infrastructure, to how the buildings are constructed, to connectivity options and more.
Let’s look at five key areas:
Data center location
Development workloads: Generative AI development workloads are very power hungry, so it’s best to host them in locations that can provide low-cost power. Typically, development workloads aren’t latency sensitive, so they don’t have to be near highly populated areas. Furthermore, you’ll want to host them in climates where you can take advantage of “free outside air” cooling techniques.
Production workloads: On the other hand, it makes sense to place generative AI production workloads in edge locations close to where the data is being generated. This often means deploying AI inference clusters in multiple regions to reduce data backhauling to a central location. Depending on the number of users simultaneously accessing the generative AI model, you might need a large AI inferencing cluster that itself consumes a lot of power, so it’s important to deploy these production systems in data centers that can support high power requirements.
Country of origin: To satisfy data residency and compliance regulations, many organizations need to deploy their AI production systems in multiple countries. You can simplify and streamline your data center deployment processes by working with a global data center vendor with locations around the world.
Data center construction and operations
Production AI workloads require high-availability data centers, much like other IT workloads. However, data center requirements for AI development workloads are unique in the following ways:
- Power requirements: Creating generative AI models requires large, power-hungry GPU clusters. Generative AI training workloads can consume multiple megawatts of power, so data centers have to provide circuits that can carry more power to the racks. To improve the efficiency of those circuits, many providers are moving from cables to power bus bars.
- Liquid cooling: Since generative AI GPU racks can consume more than 30 kW of power per rack, traditional air cooling isn’t efficient enough. Thus, we need to build data centers that support enhanced cooling techniques such as liquid cooling.
- Availability: Since AI training workloads usually take periodic checkpoints, they can tolerate cluster or data center downtime: a training job can always restart from its most recent checkpoint. This opens the door to different redundancy models for AI training data centers, which can help reduce the overall cost of the data center.
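The rack-power figures above can be sanity-checked with simple arithmetic. The sketch below uses illustrative wattages (the GPU and server-overhead numbers are assumptions, not vendor specifications) to show how a GPU rack approaches the ~30 kW range where air cooling struggles:

```python
def rack_power_kw(gpus_per_server: int, servers_per_rack: int,
                  watts_per_gpu: float, host_overhead_w: float) -> float:
    """Estimate rack power draw in kW; all wattages are illustrative assumptions."""
    per_server_w = gpus_per_server * watts_per_gpu + host_overhead_w
    return servers_per_rack * per_server_w / 1000

# e.g., four 8-GPU servers per rack, ~700 W per GPU, and
# ~1,500 W per server for CPUs, memory, fans and NICs:
print(rack_power_kw(8, 4, 700, 1500))  # 28.4 kW per rack
```

At these densities, even modest increases in GPU wattage or server count push a rack past what conventional air cooling can remove, which is what drives the move to liquid cooling.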
Data center connectivity
To create generative AI models (AI development), you need high-speed access to many external data sources. Later, when using a generative AI model in production, you need high-bandwidth connectivity to bring in multimodal input data (e.g., video, pictures) and low-latency connectivity to external data providers (e.g., live weather, stock market or traffic data). Thus, it’s important to host generative AI workloads in a data center that provides high-speed, secure connectivity to multiple network providers for bringing in traffic from the edge (5G, Wi-Fi, low-power wireless, MPLS, etc.).
You also need high-speed, secure connectivity to data sources spread across clouds, data brokers and other enterprises, both to access external data for improving model accuracy and to support low-latency AI production. Many clouds charge much lower rates for data egress over a private connection than over the public internet. Thus, it’s important for your data center to be an approved private connectivity provider in order to reduce cloud data egress costs.
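To see why private connectivity matters for egress costs, a hypothetical comparison helps (the per-GB rates below are made-up illustrations; real cloud pricing varies by provider, region and volume):

```python
def egress_cost(gb: float, rate_per_gb: float) -> float:
    """Monthly egress bill at a flat per-GB rate (rates are hypothetical)."""
    return gb * rate_per_gb

# Hypothetical rates for 50 TB/month of egress:
# $0.09/GB over the public internet vs. $0.02/GB over a private interconnection
print(egress_cost(50_000, 0.09))  # about $4,500/month
print(egress_cost(50_000, 0.02))  # about $1,000/month
```

Even with these invented numbers, the gap compounds quickly for AI workloads that continuously pull training or inference data out of the cloud.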
Data center sustainability
Since generative AI workloads are very power hungry, data center providers need to source their power from sustainable energy sources. Increasingly, governments and industry watchdogs will pressure AI solution providers to host their solutions in eco-friendly data centers with a low power usage effectiveness (PUE) number. Data center providers will need to apply AI technology to optimize the operation of their own facilities (fans, chillers and so forth). Finally, data center providers will also need to publish periodic sustainability reports to help customers optimize the power consumption of their IT infrastructure.
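PUE itself is a simple ratio: total facility power divided by the power delivered to IT equipment, where 1.0 is the theoretical ideal and lower is better. A minimal sketch (the kW figures are illustrative assumptions):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power.

    1.0 is the theoretical ideal (all power reaches IT gear); lower is better.
    """
    return total_facility_kw / it_equipment_kw

# Illustrative: a facility drawing 1,300 kW to deliver 1,000 kW of IT load
print(pue(1300, 1000))  # 1.3
```

Everything above the IT load in the numerator is overhead (cooling, power conversion, lighting), which is exactly what AI-driven optimization of fans and chillers aims to shrink.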
Data center privacy and security
Many organizations want to keep full control over their data for privacy and competitive reasons. In addition to software- and data-level security, physical security of their infrastructure is also very important. Thus, data center vendors need to provide private cages that can be accessed only by the customer, with 24/7 video monitoring. In many cases, customers also want to know whether the data center provider satisfies government infrastructure security regulations for compliance reasons.
Work with a leading data center provider
We believe Equinix is a great place to put your infrastructure for generative AI. As the world’s digital infrastructure company®, Equinix is invested in powering innovation for today and tomorrow. With a global footprint of 240+ Equinix IBX® data centers in 70+ markets, we provide proximity to key data sources—including 3,000+ cloud and IT services, 2,100 network services and 450+ content and digital media providers. Not to mention our digital infrastructure services like Network Edge, which enables you to modernize your network within minutes by deploying virtual network functions (VNFs) across Equinix metros, and Equinix Fabric®, which delivers software-defined interconnection services.
Equinix data centers are also designed for sustainability. Our facilities are covered by 96% renewable energy, with a commitment to reach 100% renewable energy coverage by 2030. And we’re optimizing water usage and exploring advanced liquid cooling techniques. These and other aspects of our future-first sustainability strategy make IBX data centers a wise place to put your AI infrastructure.
To learn more about how organizations are thinking about infrastructure for AI, download the Equinix 2023 Global Tech Trends Survey.