Humans have been dreaming about the possibilities of automation for centuries. Today, we’re fortunate to live during an era when automation technology is finally starting to catch up to some of our wildest dreams.
From an enterprise perspective, no digital transformation strategy is complete without a comprehensive automation strategy. Enterprises can apply automation to cut costs, reduce waste and inefficiency, increase performance and reliability, free up employees to focus on high-value work, and much more. Because the potential benefits are so broad, it's important for enterprises to decide exactly what they hope to accomplish, as well as how they plan to accomplish it.
Automation enables a better approach to digital infrastructure
Cloud companies have adopted automation to scale their operations. Clouds provide infrastructure as a service, and each of these services has been automated with respect to provisioning, monitoring and problem resolution. The “as a service” model has been hugely successful and is one of the cloud's key value propositions.
Increasingly, non-cloud infrastructure vendors are also offering their hardware and software as a service. They too have embraced automation and simplified how customers consume their services, so in this respect they have partly caught up with the clouds.
The following are some concrete ways that organizations are leveraging automation to cut costs, reduce errors and avoid tasks that are monotonous for human beings:
- Automating physical fiber interconnection in data centers using robotics. This removes the potential for manual error and increases agility and flexibility by streamlining changes to customers’ interconnected ecosystems.
- Automating physical security and safety measures within data centers. Using AI for video feed analysis, data center operators can automatically detect and respond to anomalies like unauthorized penetration of the security perimeter or potential hazards like water or cardboard boxes on the data center floor.
- Intent-based networking, which represents the logical next step in the evolution of software-defined networking. Network administrators can define the desired outcome or business objective they hope to meet. The network then applies AI and machine learning capabilities to determine how best to achieve that intent, and adapts itself accordingly.
- Automating bare metal provisioning and management. One example of this is Tinkerbell, an open-source project built and maintained by the Equinix Metal® team. Tinkerbell uses microservices and an API-based approach to make it quicker and easier for businesses to stand up bare metal infrastructure when and where they need it.
- Automating compute container movement between data centers. Enterprises can automatically move their compute workloads based on energy availability, which can reduce overall power costs and improve environmental sustainability by favoring renewable energy.
- Automating cybersecurity threat detection. Human cybersecurity experts can't manually review the volume of security data that modern infrastructure generates. Automated systems can parse massive volumes of data to identify anomalies before they become incidents.
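Many of the detection use cases above (video anomalies, threat detection) boil down to flagging values that deviate sharply from a learned baseline. The sketch below is a deliberately simple rolling z-score detector, not what any particular product uses; the window size, the 3-standard-deviation threshold and the `min_samples` warm-up are all assumptions chosen for illustration.

```python
import statistics
from collections import deque


class ZScoreDetector:
    """Flags values that deviate sharply from a rolling baseline (illustrative only)."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 10) -> None:
        self.history = deque(maxlen=window)  # most recent "normal" observations
        self.threshold = threshold           # assumed cut-off, in standard deviations
        self.min_samples = min_samples       # warm-up before any flagging happens

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to the recent window."""
        is_anomaly = False
        if len(self.history) >= self.min_samples:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history)
            if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                is_anomaly = True
        # Only normal values feed the baseline, so a sustained attack burst
        # cannot quietly become the new "normal".
        if not is_anomaly:
            self.history.append(value)
        return is_anomaly
```

For example, feeding per-minute failed-login counts such as `5, 6, 5, 7, ...` builds a baseline, after which a sudden `60` is flagged. Production systems would typically replace the z-score with learned models, but the monitor-compare-flag structure is the same.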
Understanding the basics of automation
To better understand how automation supports digital infrastructure optimization, it helps to step back and consider what automation is at its most basic level. In its seminal white paper on autonomic computing, IBM defined the core components of automation using the MAPE loop: Monitor, Analyze, Plan and Execute.[1]
To truly be considered automated, a system must be able to perform each of these functions in sequence:
- Monitor: Collecting and aggregating data from a particular managed resource.
- Analyze: Using the collected data to model and predict potential future scenarios. Increasingly, enterprises use machine learning and deep learning techniques in this phase to build models that can both detect current problems and forecast future ones.
- Plan: Applying the insights gained in the Analyze stage to determine how to achieve specific objectives.
- Execute: Putting the plan into action.
The MAPE loop is a “loop” because it's not a one-time process. Automated systems continuously cycle through the four functions, which lets them adjust as new data points enter the system. The objectives that enterprises pursue through the MAPE loop are typically expressed as service-level objectives (SLOs) or service-level agreements (SLAs).
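To make the four stages concrete, here is a minimal sketch of one MAPE loop driving a simulated service toward a latency SLO. Everything in it is hypothetical: `SimulatedService`, its linear latency model and the replica-based remedy are assumptions made for illustration, and the Analyze step is a plain SLO comparison where a real system might use a predictive model.

```python
import math
from dataclasses import dataclass


@dataclass
class SimulatedService:
    """Hypothetical managed resource: latency falls as replicas are added."""
    load_rps: float
    replicas: int = 1
    ms_per_rps: float = 0.5  # assumed latency cost of 1 request/s on a single replica

    def latency_ms(self) -> float:
        return self.load_rps * self.ms_per_rps / self.replicas


def monitor(service: SimulatedService) -> dict:
    # Monitor: collect metrics from the managed resource.
    return {"latency_ms": service.latency_ms(), "replicas": service.replicas}


def analyze(metrics: dict, slo_ms: float) -> float:
    # Analyze: compare against the SLO; a positive gap means a violation.
    return metrics["latency_ms"] - slo_ms


def plan(metrics: dict, gap_ms: float, slo_ms: float) -> int:
    # Plan: latency scales as 1/replicas here, so solve for the replica count
    # that brings latency back under the SLO.
    if gap_ms <= 0:
        return metrics["replicas"]
    return math.ceil(metrics["replicas"] * metrics["latency_ms"] / slo_ms)


def execute(service: SimulatedService, target_replicas: int) -> None:
    # Execute: apply the plan to the managed resource.
    service.replicas = target_replicas


def mape_loop(service: SimulatedService, slo_ms: float, iterations: int) -> int:
    for _ in range(iterations):
        metrics = monitor(service)
        gap = analyze(metrics, slo_ms)
        execute(service, plan(metrics, gap, slo_ms))
    return service.replicas
```

With `load_rps=200` and a 30 ms SLO, the loop scales from 1 to 4 replicas (100 ms down to 25 ms); if load later doubles, the next passes scale to 7. For brevity this sketch never scales back in, which a real loop would also plan for.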
Furthermore, one usually creates a hierarchy of MAPE loops, with smaller loops corresponding to smaller components of the system. To use a car analogy, there can be separate MAPE loops for the transmission, air conditioning and braking systems. These loops interact with each other to satisfy SLOs for speed, temperature, safety and so on. In many deployments, it is prudent to keep a human in the loop at first to certify that the machine-generated automation scripts make sense. Once humans trust their models, they can fully automate the loop.
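A hierarchy with a human approval gate could be sketched as follows. The `ChildLoop` and `ParentLoop` classes and the climate-control example are hypothetical; each child runs its own Monitor/Analyze/Plan step, and the parent only executes (here, records) the actions that an approval callback certifies. Swapping the callback from a human reviewer to `lambda a: True` corresponds to moving to full automation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    subsystem: str
    description: str


@dataclass
class ChildLoop:
    """Hypothetical subsystem loop (e.g. climate control) with its own target."""
    name: str
    reading: float
    target: float
    tolerance: float

    def propose(self) -> list[Action]:
        # Monitor, Analyze and Plan for this one subsystem only.
        error = self.reading - self.target
        if abs(error) <= self.tolerance:
            return []  # within tolerance: nothing to do
        direction = "decrease" if error > 0 else "increase"
        return [Action(self.name, f"{direction} by {abs(error):.1f}")]


@dataclass
class ParentLoop:
    """Collects proposals from child loops and gates their execution."""
    children: list[ChildLoop]
    approve: Callable[[Action], bool]  # human reviewer, or `lambda a: True` once trusted

    def run_once(self) -> list[Action]:
        executed = []
        for child in self.children:
            for action in child.propose():
                # Record as executed only the actions the approval gate certifies.
                if self.approve(action):
                    executed.append(action)
        return executed
```

A cabin reading 26.0 against a 22.0 target yields a proposal to "decrease by 4.0"; a braking loop already within tolerance proposes nothing, and a rejecting reviewer blocks everything.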
In addition to describing the desired system objectives, SLAs also define the penalty that applies if the SLOs are not met. System architects set SLOs or SLAs for every aspect of their digital infrastructure, including compute, storage and networking. Some objectives, such as performance and availability, are common across all digital infrastructure; others are specific to one area, such as durability for storage and jitter for networking.
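As a worked example of an availability SLO and its penalty function, the sketch below computes the downtime budget a 99.9% SLO allows in a 30-day month and maps measured availability to a service credit. The credit tiers are invented for illustration and do not reflect any real provider's SLA.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month: 43,200 minutes

# Hypothetical credit schedule: (minimum availability, % of monthly fee credited)
CREDIT_TIERS = [(0.999, 0), (0.99, 10), (0.95, 25)]


def availability(downtime_min: float, period_min: float = MINUTES_PER_MONTH) -> float:
    return 1 - downtime_min / period_min


def downtime_budget_min(slo: float, period_min: float = MINUTES_PER_MONTH) -> float:
    # Error budget: how much downtime the SLO tolerates in the period.
    return (1 - slo) * period_min


def penalty_pct(measured_availability: float) -> int:
    # Penalty function: the first tier whose floor is met sets the credit.
    for floor, credit in CREDIT_TIERS:
        if measured_availability >= floor:
            return credit
    return 100  # below every tier: full credit
```

Under these assumptions a 99.9% SLO allows about 43.2 minutes of downtime per month: 30 minutes of downtime costs nothing, 2 hours triggers a 10% credit, and 50 hours triggers a full credit.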
Why AI and black-box models represent the future of automation via digital twins
AI and machine learning sit at the heart of automation. Understanding the different machine learning models and how they’ve changed over time can be helpful in understanding why automation works the way it does.
Machine learning models can be classified along two criteria: how accurate they are and how easy they are for human observers to understand and interpret. Based on these criteria, we can separate models into white-box models and black-box models:
- White-box models rely on rules that are manually specified by experts. As a result, it's fairly simple to look at a white-box model and understand why it returns the predictions it does. However, white-box models tend to be less accurate, because experts can't always update the rules quickly enough to keep up with new features added to the system being modeled.
- Black-box models include more modern, sophisticated machine learning models, such as neural networks. They are called “black boxes” because they deduce rules or patterns from data without human intervention, which makes it harder for observers to understand how and why they make the predictions they do. Black-box models tend to be more accurate, in part because they can be retrained continuously to keep up with changes in the observed environment and in the system being modeled.
One area where black-box models are helping drive better results in digital infrastructure is digital twins. Organizations are creating digital twins to represent cars, airplanes, factories, retail stores, data centers, storage systems and more. Black-box models are ideal for digital twin initiatives because they remove the need for human intervention in change management. This helps keep the models accurate and resilient even as the circumstances surrounding the observed environment change over time.
By creating a faithful digital representation of their infrastructure assets, enterprises can run predictive analytics to forecast how that infrastructure would perform under specific scenarios. These insights can be applied to optimize for a number of objectives, including performance, resilience and energy efficiency. Based on the predictions, automation models can intervene and make corrections whenever a system is in danger of missing its SLAs. Enterprises deploy digital twins close to where the data is generated (e.g., on factory floors and in retail stores) and query these models using real-time data.
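The calibrate-then-query pattern described above can be sketched with a toy thermal twin of a data hall. The linear model (inlet temperature rising in proportion to IT load over cooling capacity), the single calibration reading and the 27 °C limit are all assumptions for illustration; a production twin would more likely be a learned model fed continuously by telemetry.

```python
from dataclasses import dataclass


@dataclass
class ThermalTwin:
    """Toy twin: inlet temperature rises linearly with IT load / cooling capacity."""
    supply_temp_c: float
    cooling_kw: float
    k: float = 10.0  # assumed °C rise at 100% cooling utilization; recalibrated below

    def calibrate(self, it_load_kw: float, observed_inlet_c: float) -> None:
        # Fit k so the twin reproduces a live reading from the physical hall.
        utilization = it_load_kw / self.cooling_kw
        self.k = (observed_inlet_c - self.supply_temp_c) / utilization

    def predict_inlet_c(self, it_load_kw: float) -> float:
        return self.supply_temp_c + self.k * it_load_kw / self.cooling_kw

    def what_if(self, it_load_kw: float, max_inlet_c: float) -> tuple[float, bool]:
        # Query a hypothetical scenario; the bool says whether the limit holds.
        predicted = self.predict_inlet_c(it_load_kw)
        return predicted, predicted <= max_inlet_c
```

Calibrated with a reading of 24 °C at 300 kW (18 °C supply air, 500 kW of cooling), the twin predicts 26 °C at 400 kW, within the assumed limit, but 27.6 °C at 480 kW, which is the kind of forecast an automation loop could act on before the SLA is breached.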
Thanks to the growing availability of black-box machine learning models, we now find ourselves in an exciting new era of automation. Increasingly, users are able to specify their objectives, and then let the system figure out how best to meet those objectives, with no manual intervention required. This “intent-based” approach provides almost limitless potential to redefine what enterprises can accomplish with their digital infrastructure, and Equinix is proud to help our customers make the most of what automation has to offer.
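In an intent-based system, the user states the outcome and the system searches for a configuration that achieves it. A minimal sketch of that idea: the intent is a latency ceiling, and the resolver picks the cheapest candidate that satisfies it. The deployment options, their latencies and their costs are hypothetical, and real systems would weigh many more objectives and re-resolve as conditions change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    name: str
    latency_ms: float
    monthly_cost: float


@dataclass(frozen=True)
class Intent:
    max_latency_ms: float  # the outcome the user cares about, not how to get there


def resolve(intent: Intent, options: list[Option]) -> Optional[Option]:
    # Keep only configurations that satisfy the intent...
    feasible = [o for o in options if o.latency_ms <= intent.max_latency_ms]
    if not feasible:
        return None  # no configuration can meet the stated objective
    # ...then choose among them on a secondary objective (cost).
    return min(feasible, key=lambda o: o.monthly_cost)
```

With hypothetical options of a central cloud (45 ms), a metro edge site (8 ms) and on-premises hardware (2 ms), a 10 ms intent resolves to the metro edge site, a 50 ms intent to the cheaper central cloud, and a 1 ms intent to nothing at all.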
How Equinix is helping customers build their automation solutions
Digital infrastructure both enables and is enabled by automation. Many of the automation examples in this blog require large amounts of data to power machine learning models, which in turn means enterprises need significant compute resources to process that data. Moving large volumes of data over long distances can add latency and cost, particularly when the enterprise moves data from the edge location (where the data is generated) to a central, remote core location.
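A quick back-of-the-envelope calculation shows why moving data is expensive in time. The sketch computes how long a bulk transfer takes at a given link speed; the `efficiency` factor is an assumed fraction of line rate actually achieved, and decimal units (1 TB = 10¹² bytes) are used throughout.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move `size_tb` terabytes (10**12 bytes) over a `link_gbps` link.

    `efficiency` is an assumed fraction of line rate actually achieved
    (protocol overhead, contention); 1.0 means a perfectly saturated link.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600
```

Even on a fully saturated 10 Gbps link, 10 TB takes about 2.2 hours (8,000 seconds), and a single petabyte takes roughly 222 hours, or more than nine days. Processing the data where it is generated avoids that transfer entirely.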
Hybrid edge infrastructure for automation
A hybrid methodology based on cloud-adjacent infrastructure enables a better approach to automation. Instead of moving data to compute resources in a central cloud, enterprises can move compute to where the data is generated, creating “edge clouds” in every metro. Equinix data centers are ideal for hosting edge clouds because they are less than 10 ms round-trip time (RTT) from the edge devices generating the data, and within 1–2 ms of the back-end central IaaS clouds. These edge clouds are well suited to both AI model training and AI model inference.
With more than 240 Equinix IBX® metro data centers spread across six continents, it's easy for our customers to deploy the infrastructure they need to support their automation initiatives, right where they need it. In addition, many of those data centers are home to cloud on-ramps to top providers. Organizations can keep their data in a neutral Equinix data center while leveraging innovative AI services from multiple clouds, without getting locked in to a single cloud provider.
Data marketplaces
When pursuing automation use cases such as digital twins, enterprises may not be able to find all the data they need to create an AI model internally. As a result, they’ll need to tap into data marketplaces to acquire additional data and prebuilt AI models from external entities quickly, securely and transparently.
Data marketplace operators are standing up marketplaces for different industry verticals at Equinix because Equinix provides the interconnection hubs in 70+ metros globally, distributed edge infrastructure and vast partner ecosystems needed to build effective data marketplaces. We have a track record of enabling on-demand data sharing platforms across a variety of industry ecosystems, including the media and entertainment, telecom, manufacturing, healthcare, retail, autonomous vehicle and financial sectors.
Hardware provisioning
As discussed earlier, the Equinix Metal team has made its work around Tinkerbell available to the open-source community. We hope that this move will help make automated bare metal hardware deployment more prevalent across the data center industry, similar to how the Kubernetes community helped make automation more prevalent for application deployment.
In addition, our support for automation frameworks including Terraform, Ansible, Pulumi and Crossplane gives customers the flexibility to adopt the automation technologies of their choice for infrastructure deployment. With Infrastructure as Code, enterprises can store proven infrastructure configurations as code and reuse them to support automation practices, such as quickly redeploying infrastructure after an outage.
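The core idea behind Infrastructure as Code, declaring a desired state and letting a tool compute and apply the difference, can be sketched in a few lines. This loosely mirrors the plan-then-apply workflow that tools such as Terraform use, but it is not any tool's API; the resource names and attributes are hypothetical.

```python
from typing import Any, Optional

Resource = dict[str, Any]
Change = tuple[str, str, Optional[Resource]]  # (operation, resource name, desired spec)


def plan(desired: dict[str, Resource], actual: dict[str, Resource]) -> list[Change]:
    # Diff the declared configuration against what actually exists.
    changes: list[Change] = []
    for name, spec in desired.items():
        if name not in actual:
            changes.append(("create", name, spec))
        elif actual[name] != spec:
            changes.append(("update", name, spec))  # drift from the declared config
    for name in actual:
        if name not in desired:
            changes.append(("delete", name, None))
    return changes


def apply(changes: list[Change], actual: dict[str, Resource]) -> dict[str, Resource]:
    # Return the new state; the input state is left untouched.
    new_state = dict(actual)
    for op, name, spec in changes:
        if op == "delete":
            del new_state[name]
        else:
            new_state[name] = spec
    return new_state
```

After an outage wipes the environment (`actual == {}`), planning against the stored configuration yields one `create` per resource, and applying it restores the declared state. Planning again returns no changes, which is the idempotence that makes stored configurations safe to reuse.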
Resilient networks
To meet their SLAs, customers need the AI models they use for automation to be available when and where they need them. That's why Equinix offers network fault recovery automation, which helps reroute customer traffic after a network outage. This helps keep our network infrastructure resilient enough to support the automation workloads our customers have come to rely on.
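One common building block for automated rerouting is recomputing shortest paths with failed links excluded. The sketch below uses Dijkstra's algorithm over a latency-weighted graph; it is a generic illustration, not a description of how Equinix implements fault recovery, and the four-node topology and link latencies are hypothetical.

```python
import heapq


def shortest_path(graph, src, dst, failed=frozenset()):
    """Dijkstra over {node: {neighbor: latency_ms}}, skipping failed links.

    `failed` holds undirected links as frozensets, e.g. frozenset({"B", "D"}).
    Returns (path, total_latency) or (None, inf) if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nbr, weight in graph.get(node, {}).items():
            if frozenset((node, nbr)) in failed:
                continue  # treat the failed link as absent
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in visited:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]
```

In a hypothetical topology where A reaches D via B (2 ms + 2 ms) or via C (3 ms + 4 ms), traffic normally takes A-B-D; when the B-D link fails, recomputation shifts it to A-C-D at 7 ms, and if both paths fail the function reports D as unreachable.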
Put Equinix automation leadership to work for you
At Equinix, we understand that automation models are only as good as the data you feed into them. Increasingly, customers want distributed AI infrastructure where AI model training and inferencing happen at the edge. We have experience helping our customers deploy the infrastructure they need to develop and deploy automation in a manner that maximizes performance and scalability, minimizes latency and optimizes cost-efficiency.
To see one example of how distributed, interconnected infrastructure from Equinix can be paired with software solutions from our partner ecosystem to meet the requirements of modern AI and machine learning models, read the ESG white paper “Enable AI at Scale with NVIDIA and Equinix.”
[1] IBM, “An architectural blueprint for autonomic computing,” June 2005.