Enterprises have an unprecedented amount of data at their disposal. Data has become the lifeblood and currency of the digital economy. In turn, artificial intelligence (AI) and machine learning (ML) have matured into powerful tools for processing that data to produce insights and drive business outcomes. Companies are using these tools to gain competitive advantage, reduce costs, grow their business and improve efficiency. Data, AI/ML and the high-performance computing (HPC) systems they require have become central to making strong business decisions.
Computer hardware and chip manufacturers have kept pace with the need for data-driven competitiveness by improving their hardware to meet the performance requirements of AI/ML. AI model training requires enormous processing power, and we have extraordinary processing capabilities today compared to even just a year ago. Thanks to Moore’s Law, modern processors pack more computing power and efficiency into a single chip, so we can run heavier workloads in a smaller footprint. However, Moore’s Law is approaching its limits. The power requirements of servers and storage are steadily increasing, as is the accompanying heat, driving data center operators and enterprises to explore new cooling strategies to accommodate greater power densities.
Liquid cooling is a re-emerging technology for supporting high-density data centers. While air cooling has been the dominant approach, companies are now exploring liquid cooling thanks to its ability to transfer heat more efficiently than air. In fact, as per Gartner®, “liquid conducts more than 3,000 times as much heat as air and requires less energy to do so, allowing increased data center densities.”[1] We’re using the term “liquid cooling” here rather than “water cooling” since a variety of technical cooling fluids are used, depending on the cooling approach.
Liquid cooling can mean different things to different organizations, and it may involve one or a combination of cooling technologies. Generally, our customers and partners have been testing and evaluating three common approaches that bring liquid closer to the rack to enable more efficient cooling: augmented air cooling, immersion cooling and direct-to-chip liquid cooling. In this blog post, we’ll describe each approach, discuss things to consider before adopting them and look at the changes required from servers and data centers.
Augmented air cooling
Standard air-cooling technologies in the data center already employ some chilled water to function. For example, computer room air handlers (CRAHs) have a chilled water coil inside. The augmented air approach is about bringing that existing technology closer to the rack and therefore closer to the heat source. A rear-door heat exchanger (RDHx) is an increasingly popular way to achieve this. Chilled liquid runs through a coil in the rear door of the rack; that coil captures the heat from the equipment, delivering cool air back to the data center. Technically, an RDHx isn’t true liquid cooling because the chip at the server level is still air cooled, but this approach does bring liquid closer to the rack to harness greater air-cooling effectiveness.
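To make the heat-capture idea concrete, here’s a minimal back-of-the-envelope sketch of the sensible heat a rear-door coil absorbs on its water side. The flow rate and temperatures are illustrative assumptions, not specifications for any particular RDHx product.

```python
# Back-of-the-envelope estimate of the heat a rear-door heat exchanger (RDHx)
# absorbs on its chilled-water side. All values are illustrative assumptions.

WATER_DENSITY_KG_PER_L = 1.0   # approximate density of water, kg per liter
WATER_SPECIFIC_HEAT_J = 4186   # specific heat of water, J/(kg*K)

def rdhx_heat_absorbed_kw(flow_lpm: float, supply_temp_c: float, return_temp_c: float) -> float:
    """Sensible heat picked up by the coil: Q = m_dot * c_p * deltaT, in kW."""
    mass_flow_kg_s = flow_lpm * WATER_DENSITY_KG_PER_L / 60.0
    delta_t = return_temp_c - supply_temp_c
    return mass_flow_kg_s * WATER_SPECIFIC_HEAT_J * delta_t / 1000.0

# Hypothetical example: 60 L/min of facility water warming from 18 C to 28 C
# absorbs roughly 42 kW, on the order of a dense rack's exhaust heat.
print(f"{rdhx_heat_absorbed_kw(60, 18, 28):.1f} kW")
```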
Considerations
Given the greater heat output we’re seeing at the server and chip level, many companies have sought ways to enhance existing air cooling for greater efficiency. Reducing the heat expelled by equipment into the data center enables greater power density and allows more power-hungry hardware to be packed into a smaller space. For many companies, this is a great first step toward efficiency benefits, and it involves relatively simple changes to the environment.
What changes are required?
No server-level changes are required when implementing an RDHx; typically, facility water is simply extended to the rack. It’s important to work with your facility provider to ensure compatibility.
Immersion cooling
Immersion cooling is an emerging technology with many approaches, and it’s exactly what it sounds like: servers are immersed in a large vat of technical cooling fluid. Think of it like a big bathtub. In single-phase immersion cooling, the fluid stays in a liquid state. In two-phase immersion cooling, it changes to gas when it draws heat from the computer chips and then returns to liquid within the cooling loop.
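One way to see why two-phase systems are attractive is to compare how much heat a kilogram of coolant can carry in each mode. The sketch below uses rough, hypothetical property values for a generic dielectric fluid, not data for any specific product.

```python
# Illustrative comparison of how much heat one kilogram of a generic dielectric
# coolant carries in single-phase vs. two-phase immersion cooling.
# Property values are rough assumptions, not figures for a specific fluid.

SPECIFIC_HEAT_J_PER_KG_K = 1100   # assumed c_p of a dielectric coolant
LATENT_HEAT_J_PER_KG = 90_000     # assumed heat of vaporization

def single_phase_heat_j(delta_t_k: float) -> float:
    """Sensible heat per kg when the fluid stays liquid and warms by delta_t_k."""
    return SPECIFIC_HEAT_J_PER_KG_K * delta_t_k

def two_phase_heat_j() -> float:
    """Latent heat per kg when the fluid boils at the chips and re-condenses."""
    return LATENT_HEAT_J_PER_KG

# With a 10 K allowable temperature rise, single-phase carries ~11 kJ/kg,
# while boiling the same kilogram carries ~90 kJ/kg, which is why two-phase
# systems can move more heat with less circulated fluid.
print(single_phase_heat_j(10), two_phase_heat_j())
```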
Considerations
While immersion cooling can allow organizations to achieve high power densities within the data center, it also requires the most substantial changes to server technology and data center architecture. Because it’s such a radical departure from traditional methods of deploying IT equipment, immersion cooling can carry substantial upfront costs and considerations, so we highly recommend working closely with your immersion vendor and OEMs if you’re contemplating a deployment. Immersion typically involves large tubs of liquid that take up about three cabinet spaces and are quite heavy. Depending on the approach, removing servers from the immersion container can be challenging and messy, so this cooling method may not suit applications where frequent server moves, adds and changes are required.
What changes are required?
For immersion cooling, both server and data center changes are needed:
- There are several considerations for all the moving parts and components within an immersed server. Compatibility of components, plastics and tapes with the immersion liquid is not guaranteed.
- Immersion in liquid can distort the refractive index of optical fiber, while copper connectivity options remain largely unaffected in the current generation of systems. Signal and power integrity in the copper of next-generation hardware will likely require customized designs for immersion.
- Networking hardware such as switches and routers is often kept in a separate, non-immersion environment.
- Some single-phase immersion cooling systems integrate a coolant distribution unit (CDU), essentially a pump that circulates the working fluid and controls its temperature. The CDU connects to the facility water feed, pushing heat from the tub out into the facility (see the control sketch after this list).
- Since servers are lifted vertically out of immersion tanks, it’s recommended that you implement infrastructure to assist with inserting and removing servers from immersion vessels.
- The data center also needs to manage the fluid and maintain its stability, preventing spills, evaporation and precipitation into equipment over time.
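To illustrate the CDU’s role mentioned above, here is a minimal sketch of the control idea: modulate facility-water flow through the heat exchanger to hold the tub at a setpoint. The setpoint, gain and temperature readings are hypothetical; real CDUs run vendor-specific control firmware.

```python
# Minimal sketch of a CDU-style control loop for single-phase immersion:
# open the facility-water valve further as the tub runs hotter than a setpoint.
# Purely illustrative; values are assumptions, not vendor parameters.

def facility_valve_position(tub_temp_c: float,
                            setpoint_c: float = 40.0,
                            gain: float = 0.2) -> float:
    """Proportional control: returns a valve position between 0 (closed) and 1 (open)."""
    error = tub_temp_c - setpoint_c
    return min(1.0, max(0.0, gain * error))

# Hypothetical readings: a tub at 43 C asks for ~60% valve opening,
# while a tub at or below the setpoint keeps the valve closed.
for reading in (38.0, 41.5, 43.0, 47.0):
    print(reading, facility_valve_position(reading))
```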
Direct-to-chip liquid cooling
For direct-to-chip liquid cooling, a cold plate sits on top of the chip inside the server. The cold plate is fitted with liquid supply and return channels, allowing technical cooling fluid to run through the plate and draw heat away from the chip. As with immersion cooling, direct-to-chip can be single-phase or two-phase, depending on whether the cooling fluid changes phase during heat removal.
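As a rough sense of scale, the sketch below estimates the coolant flow a cold plate needs for a given chip power and allowable coolant temperature rise. The chip power, temperature rise and water-based coolant properties are illustrative assumptions, not figures for any specific server.

```python
# Rough sizing sketch for a direct-to-chip cold plate: how much coolant flow is
# needed to carry away a chip's heat for a given allowable temperature rise.
# Chip power and temperature figures are hypothetical.

COOLANT_SPECIFIC_HEAT_J = 4186   # assume a water-based coolant, J/(kg*K)

def required_flow_lpm(chip_power_w: float, allowed_rise_k: float) -> float:
    """Mass flow from Q = m_dot * c_p * deltaT, converted to liters per minute
    (treating the coolant density as roughly 1 kg/L)."""
    mass_flow_kg_s = chip_power_w / (COOLANT_SPECIFIC_HEAT_J * allowed_rise_k)
    return mass_flow_kg_s * 60.0

# A hypothetical 700 W accelerator with a 10 K allowable coolant rise needs
# roughly 1 liter per minute through its cold plate.
print(f"{required_flow_lpm(700, 10):.2f} L/min")
```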
Considerations
Direct-to-chip liquid cooling (DLC) is a unique approach that augments the interior of the IT equipment with minimal changes to the server exterior. This allows DLC-enabled servers to be installed in a standard IT cabinet just like legacy air-cooled equipment, even while being cooled in an innovative way. Though direct-to-chip fits in a standard footprint, it still requires architectural changes and additional equipment to deliver liquid to the cabinet and distribute it to the individual servers; typically more than an RDHx but less than immersion cooling.
What changes are required?
Direct-to-chip liquid cooling requires some server and data center changes:
- On the server side, a cold plate must be retrofitted in place of the heat sink, with piping that runs through the inside of the server to ports accessible from the outside.
- A CDU is typically implemented to control liquid temperature and flow pressure to the cold plate.
- In the rack itself, you need a manifold: a liquid distribution unit that delivers cooling fluid to each rack unit, providing liquid to the servers.
- You also need additional power strips to handle the increase in power density. Selecting 415V 3-phase power delivery can ease deployment pains (see the quick arithmetic sketch after this list).
- DLC supports high-density racks but benefits from wider (800 mm) racks to accommodate the additional components.
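Here is the quick arithmetic behind the 415V 3-phase suggestion: a three-phase feed delivers several times the usable power of a single-phase circuit at the same breaker rating. The breaker size and derating below are illustrative assumptions; follow your electrical codes and facility guidance for real deployments.

```python
# Quick arithmetic for why 415 V 3-phase circuits suit liquid-cooled,
# high-density racks. Breaker size and derating are illustrative assumptions.

import math

def three_phase_kw(line_voltage_v: float, breaker_a: float,
                   derate: float = 0.8, power_factor: float = 1.0) -> float:
    """Usable power of a 3-phase circuit: sqrt(3) * V * I * PF, with a
    continuous-load derating applied to the breaker rating. Returns kW."""
    return math.sqrt(3) * line_voltage_v * breaker_a * derate * power_factor / 1000.0

# A hypothetical 415 V / 32 A feed supports ~18 kW of continuous load,
# versus ~6 kW for a single-phase 230 V / 32 A feed at the same derating.
print(f"{three_phase_kw(415, 32):.1f} kW vs {0.8 * 230 * 32 / 1000:.1f} kW")
```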
Innovating in the data center for high-compute business solutions
There are many reasons why companies might choose one or another option for liquid cooling. Often, the servers they’re using, vendors they work with and needs of their specific workloads drive the decision.
Equinix has been working with our customers and partners on cooling innovations for years. From hotel chains to healthcare companies to financial services providers, many of our customers started with augmented air cooling, saw success with it and are now advancing into direct-to-chip liquid cooling. As more enterprises gravitate to liquid cooling, Equinix is continuing to innovate, co-create solutions and invest in technologies that optimize efficiency in the data center. We’re actively exploring and testing the data center infrastructure that supports this exciting innovation, working with liquid cooling technology vendors in our Co-Innovation Facility in Ashburn, Virginia. And we’ve put liquid cooling into action on our own production servers for Equinix Metal®.
More than 10,000 customers rely on Equinix to help them enter new markets, scale their business and optimize operational efficiencies. We’re excited about the opportunities liquid cooling brings to the data center, and we’ll continue to evolve our facilities to support in-demand business solutions like AI/ML and HPC.
Want to learn more about how Equinix is evolving data center design? Download our white paper The Data Center of the Future.
[1] Gartner Glossary, Liquid Cooling.
Disclaimer:
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.