Data Management Best Practices for a New Era of Digital Twins

Digital twins are becoming mainstream for enterprises, so follow these best practices for digital twin creation and data management

Kaladhar Voruganti

Digital twins are nothing new. While the term was coined by NASA in 2010, we were creating models of physical artifacts long before that. In the last decade or so, technological advancements in big data, high-speed networks and supercomputing power have caused digital twins to pick up significant momentum. Today, they’re being more widely adopted as businesses create models of factories, hospitals, airplanes, cars, data centers and even human customers or patients. We’ve entered the era of digital twins for both enterprises and consumer AR/VR applications.

Companies are always looking to optimize their operations, better serve customers and predict trends in order to detect potential threats or opportunities. Digital twins are a powerful tool for statistical modeling and prediction, and now, digital twin marketplaces are being formed where organizations can share their data and digital models across an ecosystem to further accelerate innovation. These marketplaces are ushering in a new era for digital twins—fostering greater collaboration and opportunities to co-innovate.

What is a digital twin, and why do we need one?

A digital twin is a computer model that represents a corresponding physical artifact. Digital twins are being used in many industries now—from manufacturing to healthcare, factories, data centers and more. Some of the emerging applications include digital models of people in the metaverse, teaching assistants for education and individual customers in the retail sector. Today’s models are being created using AI and ML techniques, enabling businesses to predict the behavior of physical artifacts.

Digital twins allow companies to explore what-if scenarios, such as:

  • Can I fly this plane for another 10 hours?
  • How will the heat profile of a data center look if we deploy ten more 20 kW racks?
  • What will happen if I put product X and product Y next to each other on the store shelf?

This helps companies optimize operations and make more accurate predictions. Digital models can even be used to address sustainability goals by measuring and predicting energy consumption. And they’re helping data scientists and subject matter experts to converse more easily with each other. Essentially, digital twins are taking AI mainstream.

What does a digital twin workflow look like?

There are four key steps in digital twin creation and usage, as shown in figure 1:

Figure 1: Digital twin workflow

Data ingestion: In the data ingestion phase, data is collected from the physical artifacts you want to represent with a digital twin. The amount of data varies based on what the physical object is and how detailed a model you need, but the raw data used to create a digital twin can be on the order of multiple terabytes. You need to collect data from enough samples to get an accurate representation. For example, to create a digital twin of a particular airplane, you need data from multiple physical planes of that make and model. As shown in figure 2, there are many data ingestion tools and frameworks. Furthermore, various networks such as wired, low-power, 5G and Wi-Fi are used to transfer data from sensors embedded in the physical entities.
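
To make the ingestion step concrete, here is a minimal sketch that subscribes to sensor telemetry over MQTT and lands it in a file for later cleansing. The broker address, topic and payload fields are hypothetical, and the paho-mqtt 1.x client API is assumed:

```python
# Minimal ingestion sketch: subscribe to sensor telemetry over MQTT and append it
# to a landing file for later cleansing. Broker, topic and payload fields are
# hypothetical; assumes the paho-mqtt 1.x client API.
import json
import paho.mqtt.client as mqtt

RAW_DATA_FILE = "raw_telemetry.jsonl"

def on_message(client, userdata, msg):
    """Persist each telemetry reading as one JSON line."""
    reading = json.loads(msg.payload)
    with open(RAW_DATA_FILE, "a") as f:
        f.write(json.dumps({"topic": msg.topic, **reading}) + "\n")

client = mqtt.Client()
client.on_message = on_message
client.connect("edge-broker.example.internal", 1883)   # hypothetical edge-site broker
client.subscribe("fleet/+/engine/telemetry")            # e.g., one topic per airplane
client.loop_forever()
```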

Data cleansing and aggregation: Next, you need to “clean” the data collected from physical artifacts to remove noise and dirty data. Often, this stage also involves aggregating data from multiple external sources—such as data brokers, public clouds and private data centers.
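
Continuing from the hypothetical telemetry file above, a minimal cleansing-and-aggregation sketch with pandas might look like this; the column names, plausible value ranges and external weather feed are illustrative assumptions:

```python
# Minimal cleansing-and-aggregation sketch with pandas. Column names, plausible
# value ranges and the external data source are illustrative assumptions.
import pandas as pd

raw = pd.read_json("raw_telemetry.jsonl", lines=True)

# Remove exact duplicates and rows with missing readings ("dirty data").
clean = raw.drop_duplicates().dropna(subset=["engine_temp_c", "vibration_g"])

# Drop physically implausible sensor values (noise, stuck or miscalibrated sensors).
clean = clean[clean["engine_temp_c"].between(-50, 900)]

# Aggregate with an external source, e.g., weather data bought from a data broker.
weather = pd.read_parquet("broker_weather.parquet")   # hypothetical external feed
merged = clean.merge(weather, on=["airport_code", "date"], how="left")

merged.to_parquet("training_data.parquet", index=False)
```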

Digital twin creation: When it’s time to create the digital twin, typically a team of content creators, who might be spread across the globe, collaborate. The digital twin creation process can be compute-intensive, involving complex AI and ML models. The latest generation of GPU-based model training hardware can consume >30 kVA per rack, which requires support for liquid cooling. In most cases, this type of infrastructure cannot be hosted in private data centers.

Figure 2: Hybrid digital twin creation architecture
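
As one illustration of the creation step, here is a minimal sketch that trains a predictive component of an airplane digital twin with scikit-learn. The feature and target columns are hypothetical, and real twins typically combine physics-based models with ML models like this one:

```python
# Minimal model-training sketch for a predictive digital twin component.
# Feature and target names are hypothetical illustrations.
import joblib
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

data = pd.read_parquet("training_data.parquet")
features = ["engine_temp_c", "vibration_g", "flight_hours", "ambient_temp_c"]
target = "remaining_useful_hours"   # e.g., hours until maintenance is required

X_train, X_test, y_train, y_test = train_test_split(
    data[features], data[target], test_size=0.2, random_state=42)

model = GradientBoostingRegressor().fit(X_train, y_train)
print("MAE (hours):", mean_absolute_error(y_test, model.predict(X_test)))

# The serialized model is what gets shipped to the edge for inferencing.
joblib.dump(model, "engine_twin_model.joblib")
```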

Digital twin usage: As noted above, digital twins can be used to explore what-if scenarios; a minimal query sketch follows the list below. There are different types of digital twins:

  • A static representation of a physical object
  • A static representation of a physical object whose state gets updated in real time on a dashboard
  • A static representation of a physical object that’s capable of predicting a future state
  • A dynamic representation of a physical object whose state gets updated in real time and is capable of predictions and answering queries in real time
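
Here is the query sketch referenced above: it loads the hypothetical model from the creation step and asks a what-if question about one specific airplane. All scenario values are illustrative:

```python
# Minimal what-if query sketch against the previously trained (hypothetical) twin model.
import joblib
import pandas as pd

model = joblib.load("engine_twin_model.joblib")

# Scenario: "Can I fly this plane for another 10 hours?"
scenario = pd.DataFrame([{
    "engine_temp_c": 640.0,
    "vibration_g": 1.8,
    "flight_hours": 11950.0,    # current airframe hours
    "ambient_temp_c": 31.0,
}])

hours_left = float(model.predict(scenario)[0])
print(f"Predicted remaining useful hours: {hours_left:.0f}")
print("OK to fly 10 more hours" if hours_left > 10 else "Schedule maintenance first")
```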

Increasingly, digital twins are hosted close to where the data required to keep them current is being generated. As shown in figure 3, there are multiple edge locations where the digital twin can be hosted, and architects need to consider tradeoffs with respect to model accuracy and infrastructure hosting cost.

Figure 3: Edge hierarchy for hosting digital twins

What are the data management best practices for digital twins?

1. If digital twin data is generated outside the cloud, store and process it outside the cloud.

Many organizations are realizing that if the raw data used to create a digital twin is generated at the edge, it doesn't make sense from a cost, privacy and performance standpoint to move that data into a remote, central public cloud. Charges can pile up for backhauling traffic to a remote cloud. Furthermore, public clouds can add variable charges on top of storage costs, such as a per-operation data access charge and data egress charges for moving data out of the cloud. In many use cases, these variable costs can be a substantial part of the overall cloud storage bill. It can also be difficult to get access to the latest GPU compute instances in the public cloud due to high demand. Thus, many customers are placing both their compute and storage outside a central public cloud when creating digital twins.
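
To see how these variable charges add up, here is a back-of-the-envelope sketch. Every rate below is a hypothetical placeholder rather than a quote from any provider; substitute your own contracted prices:

```python
# Back-of-the-envelope comparison of variable public cloud charges vs. a flat
# cloud-adjacent storage rate. All rates are hypothetical placeholders.
GB_PER_TB = 1000  # decimal terabytes, as most providers bill

def monthly_public_cloud_cost(stored_tb, egress_tb, access_ops_millions,
                              storage_per_gb=0.02, egress_per_gb=0.09,
                              per_million_ops=0.40):
    """Storage plus the variable per-access and egress charges discussed above."""
    return (stored_tb * GB_PER_TB * storage_per_gb
            + egress_tb * GB_PER_TB * egress_per_gb
            + access_ops_millions * per_million_ops)

def monthly_cloud_adjacent_cost(stored_tb, flat_per_gb=0.03):
    """Flat-rate colocation storage; no egress or per-operation charges assumed."""
    return stored_tb * GB_PER_TB * flat_per_gb

print(monthly_public_cloud_cost(stored_tb=50, egress_tb=20, access_ops_millions=500))
print(monthly_cloud_adjacent_cost(stored_tb=50))
```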

However, if you want to keep your investments in your existing cloud services, you can adopt a hybrid architecture where you access AI and ML services in the public cloud but keep data (at lower cost points) at a cloud-adjacent location, as shown in figure 2. This approach allows you to use innovative cloud services from multiple clouds. Equinix International Business Exchange™ (IBX®) data centers are within 1–2 milliseconds (ms) of most public clouds in 70+ markets, and thus are the preferred cloud-adjacent location for hosting large data sets generated in a particular market.

2. Choose the right edge to host your digital twin.

Once the digital twin has been designed and created, it needs to be hosted at the right edge location so it can be fed the real-time data streams that keep its state current. For example, when doing predictive maintenance on airplanes and automobiles, you need to do the appropriate digital twin processing both in the AR/VR goggles and at an edge location with <20 ms round trip time (RTT) network latency. For AR/VR digital twin use cases that are sensitive to motion and real-time changes in the surroundings, you need to do processing both on the goggles and at a server location with <5 ms RTT latency. However, for use cases that can tolerate higher latencies, it makes more sense to host the digital twin at a higher node in the edge hierarchy, because this helps amortize both CAPEX and OPEX across multiple active digital twins in multiple edge locations. In most markets, Equinix IBX data centers are within 10 ms RTT of end devices and thus are a cost-optimized place to host digital twins for many use cases.
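
A minimal sketch of this placement decision is shown below, using only the latency figures cited above (<5 ms, <20 ms, roughly 10 ms to a metro data center); the tier names are illustrative:

```python
# Minimal hosting-tier decision sketch based on the latency figures in this article.
# A real placement decision would also weigh CAPEX/OPEX amortization and data gravity.
def pick_hosting_tier(max_tolerable_rtt_ms: float) -> str:
    if max_tolerable_rtt_ms < 5:
        return "on-device plus on-premises/far-edge server (<5 ms RTT)"
    if max_tolerable_rtt_ms < 20:
        return "metro edge data center (~10 ms RTT to end devices)"
    return "regional or centralized site (amortize cost across many twins)"

print(pick_hosting_tier(4))    # motion-sensitive AR/VR
print(pick_hosting_tier(15))   # predictive maintenance with AR overlay
print(pick_hosting_tier(200))  # batch what-if analysis
```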

3. Choose a digital twin marketplace that’s integrated with federated AI.

Increasingly, people aren’t building digital twins from scratch. Instead, they either customize or enhance a previously created digital twin or create a composite digital twin that aggregates smaller digital twins. Thus, we’re now entering the world of digital twin marketplaces where companies can buy or sell digital models. For example, the manufacturer of a car air conditioning system might offer its digital twin to a variety of vehicle manufacturers. These marketplaces tap into the power of ecosystems and broaden the potential of digital twins. In the IDC FutureScape 2022 Predictions, IDC predicted that “By 2025, 80% of industry ecosystem participants will leverage their own product, asset, and process digital twins to share data and insight with other participants.”[1]

Interconnected digital ecosystems are quickly becoming a strategic priority for businesses. According to the Equinix 2022 Global Tech Trends Survey, 76% of digital leaders indicated that connecting with new digital ecosystems is a top priority in their technology strategy. These thriving, dynamic ecosystems present new opportunities for digital twin development.

Figure 4: First- and second-generation digital twin exchanges/marketplaces

First-generation digital twin marketplaces provide a catalog where sellers can register their digital twins and buyers can procure them. Some of these marketplaces also allow data providers to move data to the location where the marketplace is hosted (i.e., a public cloud), and a consumer can use the AI resources available in that cloud to create digital twins from the providers’ raw data.

Second-generation digital twin marketplaces are integrated with federated AI orchestrators that allow creators of digital twins to procure data from multiple data providers. In many cases, these data providers don’t want their raw data to leave the confines of their security perimeter. Thus, second-generation marketplaces provide federated AI orchestrators where, instead of moving the data to a centralized compute location (e.g., a public cloud), you move the compute to where the data is located (private customer data centers or cages in a colocation facility) to build the digital twin model. The raw data never leaves the physical premises of the data provider; only the local model built from the data at that site is shared. Homomorphic encryption or differential privacy techniques are used to safeguard the local models, which are then aggregated at a central location to create a better global model. Thus, second-generation exchanges give digital twin creators access to confidential data that historically wasn’t easily accessible to them.
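
To make the federated pattern concrete, here is a minimal federated-averaging sketch under simplifying assumptions: provider data is simulated, the local model is a simple linear fit, and none of the secure aggregation, differential privacy or homomorphic encryption that a production orchestrator would add is shown.

```python
# Minimal federated-averaging sketch: each provider fits a model on data that never
# leaves its site; only model parameters and sample counts are shared and aggregated.
# Provider data is simulated here purely for illustration.
import numpy as np

def fit_local_model(X, y):
    """Runs inside a provider's own data center; raw X and y stay on premises."""
    design = np.c_[np.ones(len(X)), X]                  # add intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, len(y)                                  # only parameters leave the site

def federated_average(local_results):
    """Runs at the central orchestrator; sees parameters, never raw data."""
    coefs, counts = zip(*local_results)
    weights = np.array(counts) / sum(counts)
    return np.average(np.stack(coefs), axis=0, weights=weights)

rng = np.random.default_rng(0)
providers = []
for n in (200, 500, 300):                                # three hypothetical providers
    X = rng.normal(size=(n, 2))
    y = 0.5 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=n)
    providers.append((X, y))

global_coef = federated_average([fit_local_model(X, y) for X, y in providers])
print("Global model [intercept, w1, w2]:", np.round(global_coef, 2))
```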

How Equinix can help

These are just a few of the latest changes in the digital twin space, and I’m excited to see what’s ahead. Equinix and its partners can provide AI infrastructure to help you create digital twin models and execute digital twin inferencing closer to the edge, where the data is being generated, with solutions like Equinix Metal® bare metal services. Equinix Fabric® helps with connecting compute-intensive infrastructure locations (e.g., public clouds), where digital twin models are created, to edge metro locations, where digital twin models get used. In the era of distributed collaboration, where development teams can reside globally across distributed sites, Equinix Fabric helps to move massive data sets across these sites in a secure, high-speed manner. Platform Equinix® provides customers with compute, storage and a connected network Infrastructure as a Service (IaaS) across globally distributed locations for creating and using digital twins.

To learn more about evolving your infrastructure for cutting-edge technologies like digital twins, download our e-book, Evolving ecosystems for competitive advantage.


[1] Jeffrey Hojlo et al., IDC FutureScape: Worldwide Future of Industry Ecosystems 2022 Predictions, IDC, October 2021, Doc # US47771821.
