Data Gravity vs. Data Velocity

How the requirement to exchange data with more partners at higher velocity has become the driving force behind hybrid infrastructure placement

Jed Bleess

As 2024 opens, it’s clear that organizations everywhere are in a race to harness the power of AI. Companies have accumulated massive quantities of data and are now using powerful analytics tools to mine it for business insights and generate value. But working with—and moving—all that data can get complicated and costly, and this is leading many organizations to re-evaluate their core infrastructure design principles.

I believe the most strategic approach to infrastructure placement should consider several factors pertaining to your data: where it is generated; how it is staged, processed and aggregated; how its storage is distributed; and where it can interact with more partners to deliver the greatest value. That last factor is becoming especially important today, as digital ecosystems become more foundational to business success. As a result, the movement of data matters more than ever: how far and how fast it travels affects both application performance and user experience.

Concepts like data gravity and data velocity arose to help enterprises think about the characteristics and patterns of their data, and about how those patterns should shape decisions on where and how to design core infrastructure. But what really matters is how you apply these principles to scale your business, gain market share and take advantage of emerging technologies as business enablers.

Data gravity = Data drawing infrastructure to itself

When I was growing up, my grandmother often drilled key axioms into me. Her favorites were “Nature abhors a vacuum,” “Water seeks its own level” and “Compute follows the storage.” OK, that last one was actually “The cart follows the horse,” but it expresses essentially the same idea—that there’s a proper sequence to things. The idea that compute follows storage has been a foundational principle for me for more than two decades based on my work with open systems and storage area networks. But a lot of the old rules have changed, and what was once an axiom to help optimize performance has turned into a limitation. Many organizations I talk to have established public cloud-based data sets in AWS, Google Cloud Platform or Microsoft Azure that are so large they can no longer effectively move the data out of the cloud. This new pressure has evolved into a larger idea that data draws infrastructure to it—which is the principle of data gravity.

Data gravity is usually defined as the tendency of large data sets to pull in smaller applications, services or bodies of data to reduce latency and make the best use of existing bandwidth. In other words, large pools of data have a gravitational pull on other aspects of IT infrastructure, as well as services and applications, and thus infrastructure is drawn to the largest pools of data. This often leads organizations to think they must bring their infrastructure to where their data is resting or being stored.

The challenge is that when we start to look at what’s really driving hybrid infrastructure placement today, there’s more to it than just where a company’s largest data sets reside. For example, where is your data interacting? Who are you exchanging it with? How can you increase the value of that data by exchanging it with more partners? These questions are quite pertinent for companies exploring the use of AI, as AI often involves sharing data and bringing in external data, and AI ecosystems are developing where technology providers and enterprises interact to maximize value.

Data velocity = How fast data is created and moves

Thinking about where and how data is exchanged brings us to the concept of data velocity, which refers to the speed at which data is generated, processed and exchanged, as well as how quickly data moves.

While the meaning of data velocity is self-evident, it has become profoundly important. As organizations continually adapt their digital infrastructure and expand to new markets, their data pools have become increasingly distributed. Thus, it has become more important to shift the focus to the locations with the greatest amount of data exchange—that is, the highest data velocity—which means bringing the applications to the data. For many organizations, this design shift starts with their cloud architecture because cloud access is driving the greatest velocity of data exchange.

For many years, the growth in data exchange has been driving the digital economy, and data has the potential to be an organization's most valuable asset. The critical question is, can that data be processed, shared and exchanged with partners across the value chain at the velocity needed to realize that potential? With companies in a race to harness the power of all forms of AI, they're now rethinking their infrastructure design to optimize data processing and exchange.

How digital leaders approach infrastructure placement

Today’s hybrid infrastructure model is intended to support the requirements for more distributed access, expanded public and private infrastructure, increased partner interaction and greater security, control and custody of data. When we start to look at how to optimize that infrastructure, the macro view of data gravity can mask the forces that are really driving infrastructure placement.

The Global Interconnection Index (GXI), a market study published by Equinix, shows that the fastest-growing leaders are breaking the pattern of consolidating infrastructure into siloed locations and instead focusing on the locations with the largest volume of data exchange with the largest number of partners. Digital leaders have moved away from the model of designing around where data is stored and instead are focusing on where they have the highest data velocity.

When I have the opportunity to work with customers in design review, this shift in focus often has the single greatest impact on scaling their business cost-effectively. The amount of data exchanged across clouds, supply chain partners and commercial ecosystems has grown faster than most organizations' infrastructure foundations were designed to handle. This can drive up data transport costs and make it difficult for companies to adapt to a changing market. As market disruption opens up new opportunities, they may need to bring in more partners and exchange data quickly, without incurring transport costs that hold back their business.

Optimizing data velocity to create more value

If you were only focused on data gravity, you might put your largest data sets and your core infrastructure somewhere remote to save on costs. However, if you factor in data velocity and the increasing need to connect to more partners, you may want to be in the locations where the highest number of your essential partners, service providers and clouds are.

Just a few years ago, distributing infrastructure to optimize data velocity would have been possible for only a few organizations. But the leaders have established a set of patterns that others can now follow:

  1. Establish technology hubs in the locations with the largest volume of data exchange.
  2. Directly connect to the highest-value clouds, marketplaces and ecosystem partners.
  3. Use both private and public infrastructure to optimize key workflows and achieve predictable cost models.
  4. Distribute edge infrastructure in additional locations close to your supply chain, ecosystem partners and data generated at the edge.
  5. Extend hubs to top population centers to bring you closer to your customers.
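The trade-off behind these patterns can be sketched as a toy scoring model: a candidate hub's value rises with the volume of data exchanged there and the number of partners reachable on-site, and falls with the cost of transporting data to it. All names, weights and numbers below are hypothetical illustrations, not figures from the GXI or any Equinix methodology.

```python
from dataclasses import dataclass

@dataclass
class Location:
    name: str
    monthly_exchange_tb: float    # projected data exchanged with partners per month
    reachable_partners: int       # clouds, networks and ecosystem partners on-site
    transport_cost_per_tb: float  # cost to haul data to/from this location

def velocity_score(loc: Location, cost_weight: float = 0.5) -> float:
    """Rank a candidate hub: reward exchange volume multiplied by partner
    density, penalize the transport cost of moving that volume."""
    benefit = loc.monthly_exchange_tb * loc.reachable_partners
    penalty = cost_weight * loc.monthly_exchange_tb * loc.transport_cost_per_tb
    return benefit - penalty

candidates = [
    # Remote site: cheap floor space, but far from partners, so hauling data is costly
    Location("remote-lowcost", monthly_exchange_tb=500, reachable_partners=5,
             transport_cost_per_tb=20.0),
    # Dense metro hub: many partners on-site, short (cheap) paths for data exchange
    Location("metro-hub", monthly_exchange_tb=500, reachable_partners=80,
             transport_cost_per_tb=8.0),
]

best = max(candidates, key=velocity_score)
print(best.name)  # the partner-dense metro hub outscores the remote site
```

A gravity-only view would favor the remote site's lower facility costs; once partner density and transport cost enter the score, the dense interconnection hub wins, which is the design shift the list above describes.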

By following this approach, organizations can build a foundation for a scalable digital business. It enables access to multiple clouds, on-demand bare metal, cloud-adjacent storage, increased partner data exchange and lower transport costs.

Design your infrastructure for competitive advantage

We're already seeing leaders adopt these principles to capitalize on AI, optimizing their training and inference workflows by thinking carefully about data velocity. Data gravity is real, but the choices you make about where to put core infrastructure involve much more than simply where your biggest data sets live.

That’s why Equinix has 250+ data centers around the world—so that you can place infrastructure wherever you need it and connect to all the partners and services you need. There are 2,000+ network service providers and 3,000+ cloud and IT service providers on Platform Equinix®—not to mention 5,000+ other enterprises. The robust digital ecosystems available on our platform are delivering enormous value for organizations across many industries: gaming, air travel and healthcare, to name just a few.

Read more about the value of connected ecosystems in the IDC white paper Connected Ecosystems, Distributed Infrastructure for Digital-First Business.
