A Practical Guide to Artificial Intelligence for the Data Center

David Hall

The data center landscape is undergoing radical changes. To compete in this rapidly evolving digital economy, organizations are moving workloads to the cloud, tapping into ecosystems of partners and leveraging artificial intelligence (AI) to deliver faster and better digital experiences. These trends are reshaping the underlying infrastructure. No longer centralized, data centers are rapidly becoming distributed and intelligent to manage these increasingly complex workloads.

It’s an ideal use case for machine learning (ML), an AI approach in which a machine learns from data and improves with experience rather than being explicitly programmed for every situation. In my last blog post, “A Practical Guide to Artificial Intelligence for the Data Center: Part 1,” I covered the basics of how AI works and its similarities to the human brain. Now we’ll explore how artificial neural networks are trained and what that means for data center intelligence.

Training artificial neural networks

Visual recognition is a form of machine learning in which researchers teach computers to identify an object by feeding them as many images of that object as they can. This is done with training data: data tagged with the known result that the network needs to learn. For example, to train a network to recognize cats, we’d gather thousands of images of cats in all different shapes and sizes, including photos taken from different angles, cropped, flipped and so on (Figure 1), and label those images “contains a cat.” We’d also include many images labeled “does not contain a cat.” The artificial neural network (ANN) then processes those images and tries to work out which features they have in common, so that it can find the object in other pictures (Figure 2).

The ANN learns by adjusting the weights of the links between its perceptrons (artificial neurons). After each training run, the weights are nudged in whichever direction reduces the network’s prediction error: links that contribute to correct outcomes are strengthened, and links that lead the network astray are weakened (Figure 3). Depending on the desired outcome, training may take thousands of images and hundreds of training runs to complete, and more complex models take longer.

(Figure 1) Photos cropped and flipped to improve the training of the network. Source: Leonardo Araujo dos Santos [i]
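
Cropping and flipping photos, as in Figure 1, is a technique known as data augmentation: each labeled image is turned into several variants that keep the same label, so the network sees more examples without anyone tagging new photos. The sketch below is a minimal illustration, assuming a tiny grayscale image stored as a Python list of rows; `flip_horizontal`, `crop` and `augment` are hypothetical names, and a real pipeline would use an image-processing library instead.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale image as a list of pixel rows.
Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img: Image, label: str) -> List[Tuple[Image, str]]:
    """Expand one labeled image into several labeled variants.

    The label is copied unchanged: a flipped or cropped cat is still a cat.
    """
    h, w = len(img), len(img[0])
    variants = [img, flip_horizontal(img)]
    # Add the four corner crops, each covering 3/4 of the height and width.
    ch, cw = max(1, h * 3 // 4), max(1, w * 3 // 4)
    for top in (0, h - ch):
        for left in (0, w - cw):
            variants.append(crop(img, top, left, ch, cw))
    return [(v, label) for v in variants]


# Usage with a made-up 4 x 4 "photo".
sample = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
augmented = augment(sample, "contains a cat")
print(len(augmented))  # 6: original, mirrored copy and four corner crops
```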

 

(Figure 2) The network learns to recognize images by distinguishing feature patterns. Source: Towards Data Science [ii]

 

(Figure 3) The network is trained by adjusting the weights of the links between perceptrons to improve the target output. Source: Adam Geitgey [iii]
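
The weight-adjustment idea in Figure 3 can be illustrated with the classic perceptron learning rule, a much simpler stand-in for the backpropagation and gradient descent that deep networks typically use. This is a minimal sketch under stated assumptions: real inputs would be image pixels, but here each example is a hypothetical hand-made feature vector (has whiskers, has pointy ears, has wheels), and `train` and `predict` are illustrative names rather than a library API.

```python
from typing import List, Tuple

# (feature vector, label) where label 1 = "contains a cat", 0 = "does not contain a cat".
# The three features are hypothetical: [has_whiskers, has_pointy_ears, has_wheels].
Sample = Tuple[List[int], int]


def predict(weights: List[float], bias: float, x: List[int]) -> int:
    """Fire (return 1) when the weighted sum of inputs plus bias is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train(data: List[Sample], epochs: int = 20, lr: float = 1.0) -> Tuple[List[float], float]:
    """Perceptron learning rule: adjust weights only when a prediction is wrong."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            error = y - predict(weights, bias, x)  # +1 (missed a cat), 0 (correct), -1 (false alarm)
            if error:
                mistakes += 1
                # Strengthen links from active inputs after a missed cat,
                # weaken them after a false alarm.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


training_data = [([1, 1, 0], 1), ([1, 0, 0], 1), ([0, 0, 1], 0), ([0, 1, 1], 0)]
weights, bias = train(training_data)
print(weights, bias)  # [1.0, 1.0, -1.0] 0.0
print(predict(weights, bias, [0, 1, 0]))  # 1: an unseen "pointy ears only" input is classed as a cat
```

The perceptron rule only converges when the two classes can be separated by a straight line (a hyperplane), which is one reason modern networks stack many layers and train them with gradient-based methods instead.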

 

Once the network is trained, it should now be able to recognize new images with a high degree of accuracy (Figure 4).

 

 

(Figure 4) Source: Quanta Magazine [iv]

 

Optimizing the data center with AI

So how does this apply to the data center? Optimizing a data center is not easy. The initial design is based on a physics model that contains assumptions about IT workloads, external ambient conditions and other variables. However, it’s impossible to build a physics model that accounts for every scenario, and these variables change dynamically over time. That makes it an ideal problem for an AI network: once trained, it can be retrained as new operating data arrives, so its predictions keep improving and can deliver significant efficiency gains. As an example, Google reported saving up to 40% on its data center cooling bill through the use of AI [v].

Data centers consume a lot of energy, so training an AI network to improve power usage effectiveness (PUE) is a common goal, and it works in much the same way as training a network to recognize images. First, the network’s input layer is connected to various data points captured in the building management system (BMS), such as temperatures and pump speeds (Figure 5). The goal is for the network to learn to accurately forecast the PUE of the data center over time. That’s easy to check by comparing the network’s predictions with what actually happened (Figure 6). Once the network predicts PUE accurately, it’s tested with different optimization scenarios, such as turning a particular pump on or off. In general, including more relevant variables (features) in the model makes it more accurate, as Figure 6 illustrates.
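
PUE is defined as total facility energy divided by the energy delivered to IT equipment, so a value of 1.0 would mean zero overhead for cooling, power distribution and lighting. Checking the network’s forecasts against what actually happened can be done with an error metric such as mean absolute percentage error (MAPE). The sketch below is a minimal illustration; the function names and meter readings are made up, and MAPE is one common choice of metric rather than necessarily the one behind Figure 6.

```python
from typing import Sequence


def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def mean_abs_pct_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted - actual| / actual, expressed as a percentage."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical hourly readings (total facility kWh, IT kWh) and network forecasts.
actual = [pue(1500, 1000), pue(1440, 1200), pue(1300, 1000)]  # 1.5, 1.2, 1.3
predicted = [1.51, 1.19, 1.30]
print(f"{mean_abs_pct_error(predicted, actual):.2f}%")  # 0.50%
```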

(Figure 5) Example of a simple artificial neural network for PUE. Source: CloudAve [vi]

 

 

(Figure 6) Equinix data center charts showing the accuracy of artificial neural networks in predicting PUE. The error rate is less than one percent in both cases, but the more features included in the model, the more accurate it is. Source: Equinix

 

Once enough test runs have been completed, the network is considered trained and capable of dynamically optimizing the data center in real time. Unlike a physics model, which is commonly limited to evaluating PUE at specific design loads such as 0%, 25%, or 100%, the AI model can make predictions across a much wider and more granular range of inputs. This matters because a data center rarely operates at its design load levels in the real world. Moreover, while a physics model is generally tied to a particular location or design, the AI model can be ported to other data centers with minimal retraining on the variables that differ, such as weather or temperature. It can also help solve real problems, such as a spike in IT workloads when external ambient conditions are hot and dry. While it may seem more intuitive to run fewer air handling units (AHUs) under such conditions, the model shows that isn’t the case (Figure 7). Those predictions can help inform control strategies going forward.

(Figure 7) For this scenario of high IT load in hot, dry weather, the AI model shows that operating 21 air handling units (AHUs) is more efficient than operating 19 AHUs across many model runs, each corresponding to a particular set of operating conditions. Source: Equinix
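
The what-if testing described above, such as turning a pump on or off or choosing how many AHUs to run, amounts to querying a trained model with different settings under the same operating conditions and comparing the predicted PUE. This is a minimal sketch, assuming the trained model can be called as a function of a dictionary of inputs; `toy_model` is an invented stand-in whose coefficients have no physical meaning, so its output only demonstrates the mechanics and says nothing about real AHU behavior.

```python
from typing import Callable, Dict, Iterable

# A "model" here is any callable that maps named inputs to a predicted PUE.
PueModel = Callable[[Dict[str, float]], float]


def compare_settings(model: PueModel, conditions: Iterable[Dict[str, float]],
                     knob: str, option_a: float, option_b: float) -> float:
    """Fraction of operating conditions in which option_a predicts a lower PUE than option_b."""
    wins = total = 0
    for cond in conditions:
        pue_a = model({**cond, knob: option_a})  # same conditions, only the knob changes
        pue_b = model({**cond, knob: option_b})
        if pue_a < pue_b:
            wins += 1
        total += 1
    return wins / total if total else 0.0


def toy_model(x: Dict[str, float]) -> float:
    # Hypothetical stand-in for a trained network. The coefficients are invented
    # purely so the example runs: more AHUs cost fan energy, but spread the heat load.
    fan_penalty = 0.004 * x["ahu_count"]
    heat_penalty = 0.02 * x["it_load_mw"] * x["outside_temp_c"] / x["ahu_count"]
    return 1.1 + fan_penalty + heat_penalty


hot_dry = [{"it_load_mw": load, "outside_temp_c": temp}
           for load in (8, 9, 10) for temp in (35, 38, 40)]
mild = [{"it_load_mw": 2, "outside_temp_c": 10}]
print(compare_settings(toy_model, hot_dry, "ahu_count", 21, 19))  # 1.0
print(compare_settings(toy_model, mild, "ahu_count", 21, 19))     # 0.0
```

Counting wins across many sampled conditions, rather than testing a single design point, mirrors the idea of evaluating the model over many runs; with a real trained network the answer would depend on what it learned from BMS data.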

 

Improving maintenance with AI

AI networks can also help improve equipment maintenance. Scheduling maintenance according to manufacturer guidelines is effective but costly. Many experienced data center engineers have stories about knowing a piece of equipment was faulty just from the way it sounded or smelled [vii]. What if a trained AI network could predict those failures long before engineers could detect them?

Equinix is applying this type of deep learning network today to monitor different types of machines. Each network, called an AI persona, is optimized for a particular type of equipment (Figures 8 and 9). As we learn new things about that type of equipment, the AI persona is updated with that knowledge, which can then be leveraged across Platform Equinix. This helps improve equipment availability with fewer, faster maintenance sessions and a better lifetime ROI.

(Figure 8) Example of AI persona monitoring equipment in an Equinix data center. A sensor on a piece of equipment detects that it needs greasing and sends a mobile phone alert to the engineering team. Source: Equinix

 

(Figure 9) This piece of equipment is highly weather sensitive, so the AI persona tells us to grease it more frequently than the recommended interval of every six months. Using a data-driven approach results in a better lifetime ROI. Source: Equinix
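
Alerts like the one in Figure 8 rest on a simple idea: learn what normal looks like for a machine, then flag readings that depart from it. The sketch below swaps in a basic statistical technique, a z-score against a baseline, in place of the deep learning personas described here. The `Baseline` class, `flag_anomalies` function, the threshold of 3 standard deviations and the vibration readings are all hypothetical.

```python
import statistics
from typing import List, Sequence


class Baseline:
    """'Normal' profile for one sensor, learned from reference readings."""

    def __init__(self, readings: Sequence[float]):
        if len(readings) < 2:
            raise ValueError("need at least two reference readings")
        self.mean = statistics.mean(readings)
        self.stdev = statistics.stdev(readings)  # sample standard deviation

    def z_score(self, value: float) -> float:
        """How many standard deviations a reading sits from the baseline mean."""
        if self.stdev == 0:
            return 0.0 if value == self.mean else float("inf")
        return (value - self.mean) / self.stdev


def flag_anomalies(baseline: Baseline, readings: Sequence[float],
                   threshold: float = 3.0) -> List[int]:
    """Indexes of readings more than `threshold` standard deviations from normal."""
    return [i for i, v in enumerate(readings) if abs(baseline.z_score(v)) > threshold]


# Hypothetical vibration readings (mm/s): reference data, then live readings.
factory_vibration = [2.0, 2.1, 1.9, 2.0, 2.1, 1.9]
onsite_vibration = [2.0, 2.05, 2.6, 1.95]
baseline = Baseline(factory_vibration)
print(flag_anomalies(baseline, onsite_vibration))  # [2]
```

A fixed threshold of three standard deviations is a common rule of thumb; a learned model can instead capture how "normal" shifts with conditions such as weather, which is the kind of effect Figure 9 describes.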

The AHA moment

While it’s easy to see how deep learning networks help us optimize and gain efficiencies over time, it may be harder to picture what the impact is on a day-to-day basis. A few scenarios may help to illustrate:

  1. Learning from failures: Equinix is using machine learning to monitor a computer room air conditioning (CRAC) unit in a North American data center. The device fails because a belt breaks. The onsite team quickly identifies the fault, replaces the belt and logs the nature of the fault in the machine learning platform. Equinix can then automatically retrain all the networks that monitor similar devices by pushing out the updated AI persona, making knowledge of the fault accessible across the business. A week later, the AI network raises an alarm on a similar device in an Equinix data center in Singapore. Because the AI persona has been trained on this fault, the Singapore onsite team is able to take corrective action before a failure occurs.
  2. Working as designed: An Equinix team tests equipment in the factory before it is installed at a new Equinix International Business Exchange™ (IBX®) data center. During testing, they use the test data to train the AI persona on what “working as designed” looks like. That knowledge is stored in the Equinix system to support onsite commissioning, so Equinix can quickly determine whether the machine is functioning onsite exactly as it did in the factory.
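
The retrain-and-push step in the first scenario can be pictured as a registry that keeps one persona per equipment type and gives every monitor of that type the newest version. This is a minimal sketch, assuming a persona can be reduced to a version number and a set of known fault signatures; real AI personas are trained networks, and `Persona`, `Monitor` and `PersonaRegistry` are hypothetical names, not part of any Equinix system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Persona:
    """Hypothetical stand-in for an AI persona: the fault signatures it knows."""
    equipment_type: str
    version: int = 1
    known_faults: Set[str] = field(default_factory=set)


@dataclass
class Monitor:
    """One deployed monitor watching a specific device at a specific site."""
    site: str
    device_id: str
    persona: Persona


class PersonaRegistry:
    def __init__(self) -> None:
        self.personas: Dict[str, Persona] = {}
        self.monitors: List[Monitor] = []

    def register(self, site: str, device_id: str, equipment_type: str) -> Monitor:
        """Attach a new monitor to the current persona for its equipment type."""
        persona = self.personas.setdefault(equipment_type, Persona(equipment_type))
        monitor = Monitor(site, device_id, persona)
        self.monitors.append(monitor)
        return monitor

    def log_fault(self, equipment_type: str, fault: str) -> int:
        """Record a fault, publish a new persona version and push it to every
        monitor of that equipment type. Returns how many monitors were updated."""
        old = self.personas[equipment_type]
        new = Persona(equipment_type, old.version + 1, old.known_faults | {fault})
        self.personas[equipment_type] = new
        updated = 0
        for m in self.monitors:
            if m.persona.equipment_type == equipment_type:
                m.persona = new
                updated += 1
        return updated


registry = PersonaRegistry()
na_crac = registry.register("North America", "crac-01", "CRAC")
sg_crac = registry.register("Singapore", "crac-17", "CRAC")
sg_ups = registry.register("Singapore", "ups-03", "UPS")
print(registry.log_fault("CRAC", "belt break"))       # 2
print("belt break" in sg_crac.persona.known_faults)  # True
```

Building a new `Persona` for each update, rather than mutating the shared object in place, means a component still holding the previous version is not silently changed mid-use; that is a design choice of this sketch, not a feature described in the scenarios.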

Continuously evolving intelligent data centers

In today’s always-on society, the bar keeps rising: organizations must do more, faster, better and more efficiently than ever before. To stay in the game, organizations are re-architecting how they operate in this decentralized, interconnected economy. Delivering the right thing at the right time depends on real-time interaction between people, locations, clouds, data and things. To achieve this, enterprises are employing multi-vendor strategies, moving workloads to the cloud and tapping into AI.

In this rapidly changing landscape, the data center must also evolve to intelligently handle increasingly complex workloads. Equinix, in collaboration with key industry partners, is augmenting Platform Equinix with this kind of intelligence. Thanks to Equinix’s investment in IBX SmartView™, we are able to flexibly access the data that we need to improve our models (Figure 10). Our AI personas are already fueling data center innovations such as improved power usage effectiveness, higher availability and reduced faults. And as they continuously learn to recognize new patterns, they will only get better.

(Figure 10) IBX SmartView unifies building management system sensor data from Equinix IBX locations worldwide, and provides direct visibility into the environmental and operating status of your Equinix points of presence. IBX SmartView provides you with immediate alerts if there’s an issue, or you can automatically generate a historical report of trends across multiple Equinix IBX data centers. Source: Equinix

 

Teamwork

No man is an island, and at Equinix we understand that only great teams can serve our customers. I am lucky to work with some of the best data center engineers on the planet, and in particular, my colleagues in the Equinix Data Center of the Future group without whom none of our work in enhancing Platform Equinix with machine learning would be possible.

To learn more, check out our Platform Equinix Vision paper and IBX SmartView™ data sheet.

[i] Leonardo Araujo dos Santos, “Deep Learning Introduction,” GitBook.

[ii] Towards Data Science, “Why Deep Learning over Traditional Machine Learning?” March 2018.

[iii] Adam Geitgey, “Machine Learning is Fun Part 8: How to Intentionally Trick Neural Networks,” August 2017.

[iv] Quanta Magazine, “New AI Strategy Mimics How Brains Learn to Smell,” September 2018.

[v] DeepMind, “DeepMind AI Reduces Google Data Centre Cooling Bill by 40%,” July 2016.

[vi] CloudAve, “Optimizing Data Centers Through Machine Learning,” June 2014.

[vii] Schneider Electric Blog, “Top 2 Ways AI Will Change Your Data Center Forever,” September 2018.

 

David Hall Former Fellow focused on Technology and Architecture in the Office of the CTO