Turn the clock back ten years or more and you’ll find that the most common use of big network pipes was data replication, usually for disaster recovery. Businesses used those pipes for near-constant replication of terabytes of data from a company’s primary data center to a secondary one. These were the days when all of your applications ran in one place and served your users and customers from there.
Today, applications run everywhere. Digital transformers are heavily leveraging hybrid multicloud environments, and digital leaders are taking greater advantage of the edge, which brings with it the benefits of single tenancy, cloud economics and local latencies. Since applications big and small either acquire or require data, there’s an obvious need for data motion services that feed these applications and provide for their remote recoverability and governability on an enterprise-wide scale. The characteristics of an application’s specific location can pose challenges for proper data management and box an organization into a corner that may be hard to escape later.
While there are many storage and software vendors that move data, we see a few general data motion patterns that organizations use not only to give their applications the low-latency access needed for high performance, but also to optimize their ability to govern and maintain custody of their data while avoiding the costs associated with cloud repatriation. These patterns enable business agility, accelerate go-to-market and maintain vendor leverage.
The following examples are some of the most popular data motion patterns:
- Project and Delete: This motion involves a large dataset housed on customer-owned or leased equipment in the core, plus a desire to use public cloud-based PaaS offerings to do work on that data. Housing the primary copy of that data in the public cloud would create a barrier to exit, given potentially high data egress costs. Instead, the primary copy of the large dataset stays in the core, on owned storage, and is continuously replicated (“projected”) into the public cloud, giving the PaaS a low-latency, high-throughput data source upon which to create its value. Should the desire to move the data or discontinue the service arise, the primary copy already exists in the core, so the copy in the public cloud provider can simply be deleted, avoiding egress costs. This also allows some or all of these datasets to be projected to multiple cloud providers, for a best-in-breed approach. (A minimal sketch of this pattern follows the list.)
- Cache Access: Some solutions act as a storage target that fetches pieces of data upon access from a local application or user. This pattern works well for data that is accessed multiple times or is temporal in nature, as the cache target software can “fetch” data from the core based on criteria, making it available for use locally in the public cloud provider or at the edge. Should data be requested that isn’t present in the cache, it is fetched in real time, which is obviously a slower operation. If the caching solution can be optimized for a given application, this data motion can save on ongoing cloud storage costs: instead of replicating entire large datasets, the solution tries to predict what is required and when, evicting copies of data it no longer expects to need. (A toy fetch-on-miss example follows the list.)
- Remote Access in Place: One of the classic data motions that cloud adjacency created is the ability to run applications in the public cloud while housing the data outside the cloud, in a location that allows for the lowest latency, highest throughput and lowest cost possible. For applications that don’t require extreme performance, this pattern is simple to implement and deploy: the application simply uses the data on the storage in the proximate location. For applications that need high throughput or extremely low-latency access to datasets, this pattern can be challenging, as there is no way for an architect to know for certain how much latency exists between the cloud ingress and the application compute; that physical infrastructure is hidden from view. Further, there is little control over or visibility into network throughput once data enters the shared-tenant public cloud network. These constraints create the need for the data motions above.
- Replication to Aggregate: As data is created at the edge and in the cloud, these multiple disparate data streams need to be moved into data lakes for AI training and analytics, or simply archived for later extraction or compliance events. Maintaining this data in multiple locations creates many problems around governance, security and recovery, and aggregating it opens up the possibility of finding patterns and value across datasets that would be difficult to uncover if they were kept apart. Simple 1:1 replication from source to core (where it can be projected to or accessed from the cloud, or used in place) is a common pattern in use today. (A minimal aggregation sketch follows the list.)
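To make the Project and Delete motion concrete, here is a minimal Python sketch. It assumes the primary dataset lives on owned storage in the core and is projected into an S3-compatible bucket; the bucket name, paths and one-shot copy loop are illustrative only, since a real deployment would use continuous replication tooling rather than a simple copy job.

```python
# A minimal "project and delete" sketch. The primary copy stays on core-owned
# storage (CORE_ROOT); only the projected copy in the cloud bucket is deleted
# on exit. Bucket and path names are hypothetical.
from pathlib import Path

import boto3

CORE_ROOT = Path("/mnt/core-dataset")       # primary copy: owned storage in the core
PROJECTION_BUCKET = "analytics-projection"  # secondary copy: the cloud PaaS reads from here

s3 = boto3.client("s3")


def project_to_cloud() -> None:
    """Replicate ("project") the core dataset into the cloud bucket."""
    for path in CORE_ROOT.rglob("*"):
        if path.is_file():
            key = str(path.relative_to(CORE_ROOT))
            s3.upload_file(str(path), PROJECTION_BUCKET, key)


def discontinue_projection() -> None:
    """Exit the provider by deleting only the projected copy; the primary stays in the core."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=PROJECTION_BUCKET):
        for obj in page.get("Contents", []):
            s3.delete_object(Bucket=PROJECTION_BUCKET, Key=obj["Key"])
```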
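The Cache Access motion can be illustrated with a toy fetch-on-miss cache. The fetch_from_core callable, the LRU eviction policy and the cache size below are assumptions made for the example; production caching targets prefetch and evict based on application-specific criteria.

```python
# A toy fetch-on-miss cache illustrating the access pattern only: hits are
# served locally, misses are pulled from the core in real time, and least
# recently used entries are evicted once the cache is full.
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    def __init__(self, fetch_from_core: Callable[[str], bytes], max_items: int = 1024):
        self._fetch = fetch_from_core          # slow path: pull from core storage
        self._max = max_items
        self._data: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str) -> bytes:
        if key in self._data:                  # cache hit: serve locally, low latency
            self._data.move_to_end(key)
            return self._data[key]
        value = self._fetch(key)               # cache miss: fetch from the core in real time
        self._data[key] = value
        if len(self._data) > self._max:        # evict data we no longer expect to need
            self._data.popitem(last=False)
        return value
```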
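Finally, a minimal sketch of Replication to Aggregate, assuming edge outputs land on filesystem paths visible to a periodic job and that one subdirectory per source is enough to preserve provenance. The paths are hypothetical, and continuous replication products would replace the copy loop shown here.

```python
# A minimal 1:1 source-to-core aggregation sketch: each edge stream is copied
# into its own subdirectory of the core data lake, where it can later be
# projected to or accessed from the cloud, or used in place.
import shutil
from pathlib import Path

EDGE_SOURCES = {                                    # disparate streams created at the edge or in the cloud
    "factory-eu": Path("/mnt/edge/factory-eu/out"),
    "retail-us": Path("/mnt/edge/retail-us/out"),
}
DATA_LAKE_ROOT = Path("/mnt/core/data-lake/raw")    # single aggregation point in the core


def aggregate() -> None:
    for source_name, source_root in EDGE_SOURCES.items():
        target_root = DATA_LAKE_ROOT / source_name  # keep provenance per source
        for path in source_root.rglob("*"):
            if path.is_file():
                dest = target_root / path.relative_to(source_root)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, dest)            # 1:1 replication from source to core
```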
WANdisco – hybrid big data replication with enterprise-grade connectivity
WANdisco has joined forces with Equinix to offer a hybrid multicloud solution for data replication that keeps geographically dispersed data at any scale consistent between on‑premises and cloud environments. Given that many of the data motion patterns described above cross hybrid multicloud environments in different locations, they are covered by the combined Equinix and WANdisco solution. The WANdisco LiveData Platform’s sophisticated data migration and replication capabilities on Platform Equinix with Equinix Fabric™ deliver a powerful digital infrastructure that enables replication of continuously changing data to the cloud and on-premises data centers with guaranteed consistency, no downtime and no business disruption. DConE, WANdisco’s distributed high-performance coordination engine, uses consensus to keep Hadoop and object store data accessible, accurate and consistent across locations in any environment, with patent-protected active transaction replication software that moves data securely, at speed and at scale between computing environments. The solution enables rapidly changing Hadoop environments to exist in multiple locations while remaining read/writable and staying consistent in near real time without manual intervention.
WANdisco and Equinix: live data replication in the distributed environment
Together, WANdisco and Equinix enable enterprise companies to:
- Operate seamlessly in hybrid or multicloud environments, keeping on-premises and cloud data environments continuously in sync, while retaining ownership of your data securely at the edge.
- Adopt scalable cloud storage and free your systems from storage hardware CAPEX by directly and securely interconnecting with the biggest cloud storage systems and systems integrators.
- Perform cloud migration without downtime and continue operating while migration is underway, even if it takes weeks or months.
- Support disaster prevention, failing fast and recovering quickly, when you need to ensure business continuity and data SLA compliance across multiregional, hybrid on-premises and multicloud environments.
- Maintain a resilient, reliable, secure, scalable and cost-effective software-defined backbone with Equinix Fabric.
- Virtualize network management by decoupling the network control and forwarding functions, managing the physical network through software while the hardware manages the traffic.
- Reduce operational costs by removing licensing and operational support needs for VPN gateways.
Ready to dig deeper? Read the Equinix/WANdisco solution brief.