3 Key Components of High Performing Data Center Operations

What it really takes to achieve continuous maintenance and deliver five nines of uptime

John Stratton
Chris Castle
If we had to choose the single most important factor behind the successful delivery of data center capacity and uptime, it would be continuous and strategic maintenance. Data center facilities must operate within a very stringent set of parameters, 24 hours a day, 7 days a week. An operator’s approach to strategic maintenance is the key to ensuring the highest quality and continuous delivery for clients.

Maintenance encompasses a complex interaction between people, process and equipment, only one of which is easy to quantify: the equipment. Typically, a data center’s equipment list is published on the website or in a data sheet, because it is simple to illustrate. However, the importance of this information is easily misunderstood without a sound understanding of how the interplay between the equipment, the people responsible for running and maintaining it, and the processes they follow leads to the true success of data center operations.

As you consider whether to entrust your critical business infrastructure to a colocation facility, the issue of maintenance is crucial but difficult to assess. With a bit of guidance to help your evaluation, a better understanding of operating practices will pay off several times over during the lifetime of your operations.

Of course, the building operations systems and equipment matter. No matter how good modern industrial systems get–and they are very good these days–facilities like data centers have a stringent set of operating parameters they must achieve, and by no means do they operate themselves.

People, process and equipment are interdependent

Operating an efficient, effective data center involves a complex system of overlapping dependencies. People need training to follow process and manage equipment and systems. Process must be aligned to run equipment and systems seamlessly, and fully trained people must adhere to it. Equipment and systems need to be run by knowledgeable workers who follow the right process. This interdependence is crucial for high performing data centers.

People need aligned processes and equipment expertise to be effective

Understanding a data center operator’s approach to getting the best from their people is the real key to getting a sense of everything else that goes on within the data center. Engaged and motivated employees, with defined responsibilities, make a tangible difference by delivering better results which can be measured in the resilience and performance of the site itself.

You’ll want to assess the depth of the operator’s maintenance strategy and their corresponding approach to training programs. Data center operators not only need to foster and care for their employees’ well-being; they also need to ensure that staff are up to speed on the latest approaches and best practices for maintenance at their site. Ask: which aspects of maintenance do they manage in-house, and which do they outsource?

At Equinix, our employees use a comprehensive in-house learning system for training and refresher classes. This is combined with live training and collaboration sessions led by subject matter experts, and with site-level drills for various scenarios.

We believe that by educating site engineers on how to perform their own maintenance activities, our teams will be more engaged and invested in delivering a high-quality outcome. On-site employees have a front-row seat to the interplay between facility systems and are therefore able to identify, anticipate and intervene when symptoms arise. Our vendors perform complex and critical maintenance tasks but work in consultation with our in-house staff. This leads to a greater understanding of the facility, while building knowledge and expertise of the systems.

Processes are tailored to equipment, delivered by qualified people

Determining how effective a process is, and how closely it is followed, is the most difficult aspect of maintenance to assess. Everyone has a process, but is it any good?

Just like with your car, there is a prescribed maintenance schedule for data center equipment. As consumers, we tend to rely on a manufacturer’s advice (change your oil every 5,000 miles) and outsource maintenance to a mechanic. But for businesses operating fleets of vehicles, maintenance plans and schedules are an actively managed, critical part of complex operations. It’s no different with data centers.

Taking a rigorous approach to enforcing the policies and procedures is a must-do for any data center operator. But just as critical is the practice of regularly reviewing and improving the processes as systems, tools, technology and business needs change.

Ask if there is a regular review process to ensure that new requirements, business pressures and people factors are continuously brought into alignment. This demonstrates a greater investment by the company, making their people part of the solution and not just followers of the process. Taking this approach provides technicians the opportunity to influence process improvements, while maintaining the requirement to adhere to the process.

Equipment efficiency depends on qualified people and aligned processes

Every data center is built from the same basic components: everyone has UPS systems, chillers and so on, and at some level the components are all very similar. Having multiple copies, or the ‘best’ brand of equipment, isn’t enough to guarantee resilience. In-house expertise is critical to understanding how the maintenance process, scheduling and equipment interact with the full set of systems within the facility. If you are trying to understand whether a facility is good or bad at maintenance, assess whether the operators consider the needs of the facility as a whole, and whether they have adjusted and optimized the maintenance schedule accordingly.

Beyond the three key components of data center operations

Let’s take a closer look at two additional equipment-related topics that have a significant impact on data center resilience and may or may not be mentioned in the data center datasheet.

Cybersecurity beyond network traffic

Even though most colocation operators don’t interact with client network traffic, cybersecurity remains a critical consideration for each function in the data center.

A robust building management system (BMS) is crucial for a performant environment. It’s important to have a system that is designed to integrate with a variety of industrial control systems and designed for high data transfer rates. These “industrial strength” systems allow for redundancy where it makes sense yet have the most flexibility for optimizing the performance of the data center. But these systems are not completely isolated from the rest of the world and care must be taken to ensure proper security protocols are being followed.

Test systems add value to maintenance

An easily overlooked, but powerful piece of maintenance equipment is a load bank, which is used to simulate an electrical/thermal load. Not every facility has one and not every facility needs one; but where they are present, they might just be the unsung hero of a facility. Load banks enable independent, full-scale live testing of building systems; otherwise, maintenance and testing must be accomplished through other means.

Another advantage is that a load bank allows equipment testing without using the customer’s load as a test bed. This means you can test more frequently on more systems and reduce the period of risk. The load bank can also be used to validate the operation of equipment that has just gone through maintenance or repairs before you put it back into the production service rotation.

Assessing the specific strengths of data center operators

There is no single way to achieve high performance in a data center. Be open to how each company achieves its success. If an operator has settled on a design that does or does not compare favorably on paper, there must be a reason behind it. Understanding that reasoning, and how the design is intended to meet the maintenance needs of varying topologies, will ultimately lead you to important deployment design and management decisions that will affect your uptime. These factors are complicated, and everything must work seamlessly together.

Since 2017, Equinix has acquired more than 65 data center facilities, each with a different level of operational quality. Equinix actively works to bring them up to operational standards. In fact, we continue to deliver on our track record of ‘five nines’ performance as we grow. This goes to show that together, people, process and equipment, with a dedicated focus on continuous and strategic maintenance of the facility, yield resilient uptime.
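
To put ‘five nines’ in concrete terms, the short sketch below works out how much downtime per year each availability level allows. It is a generic illustration of the arithmetic, not a description of how Equinix measures or reports uptime; the function name and the use of an average 365.25-day year are assumptions made for the example.

```python
# Yearly downtime budget for an availability target of "N nines".
# Assumption: an average year of 365.25 days (leap days included).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(nines: int) -> float:
    """Return the downtime budget in minutes per year for N nines of availability."""
    unavailability = 10 ** (-nines)  # five nines -> 0.00001 (i.e. 99.999% available)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        availability_pct = 100 * (1 - 10 ** (-n))
        minutes = allowed_downtime_minutes(n)
        print(f"{n} nines ({availability_pct:.3f}%): {minutes:.2f} minutes per year")
```

At five nines the budget works out to roughly 5.26 minutes per year, which is why maintenance has to be carried out without taking the facility offline.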

To learn more about how Equinix runs its colocation facilities, read the Gartner Solution Card for Colocation Provider Equinix.

 

John Stratton Senior Director, Global Operations Engineering
Chris Castle Director, Product Marketing