Latency: Is Slow the New Down?

Stefan Raab

“Slow is the new down” is the catchphrase that seems to be gaining momentum in this era of cloud ubiquity. Since it’s extremely rare for a cloud service to go down completely, the phrase refers to customers’ biggest cloud complaint – slow service. The good news is, we’ve learned how to build highly resilient applications with distributed cloud-native architectures. For example, with over a billion users, Google’s Gmail service has grown to be an industry leader because its strong feature set is always available. The leading cloud providers have set a standard for instant, reliable services that are delivered to any device, anytime, anywhere. Users now expect the same level of performance from all cloud-based applications.

The anatomy of slow

For distributed application architectures, especially those leveraging the cloud, performance issues can be hard to track down because it’s not easy to observe all the components in the system simultaneously. Performance issues are typically diagnosed from the application perspective and fall into two big buckets: storage and network. And because a significant share of data traffic is served from network-attached storage devices, most of the fingers end up pointing at the network team.

Slowness in the network is called latency, and some immutable properties of physics make it a common culprit in application performance problems. Even the speed of light isn’t fast enough when your data is making tens or hundreds of trips back and forth across the country. Unfortunately, today’s networks don’t even operate at the speed of light. Light moves about 30% slower in the fiber optic cables that carry much of the world’s traffic, and that’s just the beginning. Other factors that introduce latency include cable paths, intermediate network devices (routers, switches, etc.) and device design. When baseline latency is low, operating system or hypervisor tuning can also have a big impact; for example, it’s not uncommon for a Linux firewall to add 0.5 milliseconds (ms) or more.
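
To make the physics concrete, here is a minimal back-of-the-envelope sketch (not a measurement tool) of the lower bound that fiber propagation alone puts on round-trip time. The 4,000 km coast-to-coast distance and the 30% slowdown factor are illustrative assumptions; real paths add indirect cable routes, router hops and host overhead on top.

```python
# Rough estimate of the minimum round-trip time imposed by fiber propagation.
# Assumes light travels ~30% slower in fiber than in a vacuum (illustrative).

SPEED_OF_LIGHT_KM_S = 299_792                    # vacuum, km per second
FIBER_SPEED_KM_S = SPEED_OF_LIGHT_KM_S * 0.70    # ~30% slower in glass

def min_rtt_ms(distance_km: float, round_trips: int = 1) -> float:
    """Lower bound on latency: out-and-back propagation time only."""
    one_way_s = distance_km / FIBER_SPEED_KM_S
    return one_way_s * 2 * round_trips * 1000    # convert to milliseconds

# ~4,000 km of fiber path across the country (illustrative figure).
print(f"1 round trip  : {min_rtt_ms(4000):.1f} ms")
print(f"50 round trips: {min_rtt_ms(4000, round_trips=50):.1f} ms")
```

Even with zero congestion, fifty cross-country round trips cost nearly two seconds of pure propagation delay, which is why chatty protocols suffer over long distances.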

One problem that comes into play with network design is known as buffer bloat. In IP networks like the internet, information is broken up into small chunks known as packets. The packets are sent through the network one by one, and then the information is rebuilt on the other side. When network links get full, the devices in the middle can either throw away individual packets or they can hold them in memory on the assumption that it’s a temporary spike in traffic. This works well if the spikes are truly short, but when too many packets get held, significant amounts of latency can be introduced.
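
The arithmetic behind buffer bloat is simple: the latency a newly arrived packet picks up is the amount of data already queued ahead of it divided by the link rate. A minimal sketch, assuming an illustrative 1 MB buffer and 10 Mbps uplink:

```python
# Delay added by data already sitting in a device's buffer (illustrative figures).

def queueing_delay_ms(buffered_bytes: int, link_mbps: float) -> float:
    """Time for the data ahead of a new packet to drain onto the link."""
    link_bytes_per_s = link_mbps * 1_000_000 / 8
    return buffered_bytes / link_bytes_per_s * 1000

# 1 MB of packets queued on a 10 Mbps uplink adds ~800 ms before a new
# packet even leaves the device -- far more than the propagation delay.
print(f"{queueing_delay_ms(1_000_000, 10):.0f} ms of added latency")
```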

It’s important to remember that the internet is a shared network, so any given link in the path between two hosts may be carrying hundreds or thousands of sessions. Even between two hosts there may be many separate sessions that make up the application communication. Each of these sessions is completely unaware of the other sessions that share links in the path, so as traffic ebbs and flows along the link, sessions must adapt to the various traffic conditions. Think about the link from an office to the internet. You may need to download a large file, and during that time one co-worker may send or receive multiple emails with large attachments and another may be scrolling through social media, downloading images or watching videos. All the while, your download needs to adjust to the constantly changing available network bandwidth. When the network is quiet, it will go as fast as possible, but when the network is busy, the available bandwidth is shared between everyone.

The TCP impact

The predominant transport protocol used in IP networks, TCP, is designed to detect congestion and slow the communication to match the available end-to-end bandwidth. Congestion detection works by watching for dropped packets (the ones that don’t reach their destination) and then adjusting the flow rate. When packets are buffered for a long time instead of dropped, hosts assume they’re simply far away from each other rather than that there’s a bandwidth constraint in the middle, so they keep sending traffic at the same rate, which compounds the congestion in the buffers.
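
As a rough illustration (a toy model, not TCP’s actual implementation), this loss-driven behavior boils down to additive-increase/multiplicative-decrease: grow the sending window while round trips complete cleanly, and halve it when a drop is observed. The link capacity below is an invented figure.

```python
# Toy model of loss-driven congestion control (AIMD), not real TCP code.
# The "network" is a fake bottleneck that loses traffic above an assumed capacity.

LINK_CAPACITY_PKTS = 100   # assumed bottleneck capacity per round trip

def round_has_loss(window: int) -> bool:
    """Simulate a drop whenever the window exceeds the bottleneck capacity."""
    return window > LINK_CAPACITY_PKTS

window = 10
for rtt in range(1, 21):
    if round_has_loss(window):
        window = max(window // 2, 1)   # multiplicative decrease on loss
    else:
        window += 10                   # additive increase while all is well
    print(f"RTT {rtt:2d}: window = {window:3d} packets")
```

The window saws back and forth around the bottleneck capacity; when buffering hides the losses, that feedback loop never fires and the queues just grow.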

TCP sessions also run into what’s known as the bandwidth-delay product issue: a sender can only have one window’s worth of data in flight per round trip, so as latency increases, the maximum throughput of the session decreases. When the delay comes from buffering, it still lowers throughput on congested links, but it signals congestion far less effectively than packet loss does. Both of these issues can be managed, but the right settings are not widely known. As a result, devices often end up tuned for the network speeds that were typical when the host operating system was released, which may not be optimal for current traffic conditions and, as we previously mentioned, can limit throughput as latency increases.
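
A minimal sketch of that ceiling, assuming the classic un-scaled 64 KB TCP window (an illustrative figure; modern stacks can scale the window): per-session throughput is capped at window size divided by RTT, no matter how fast the underlying link is.

```python
# Maximum per-session TCP throughput given a fixed window and round-trip time.

def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """One window of data can be delivered per round trip, at most."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000

WINDOW = 65_535  # classic un-scaled TCP window, used here as an assumption

for rtt in (10, 50, 100):
    print(f"{rtt:3d} ms RTT -> at most {max_throughput_mbps(WINDOW, rtt):5.1f} Mbps per session")
```

Going from 10 ms to 100 ms of latency cuts the ceiling by a factor of ten, which is why a "fast" link can still feel slow across a long path.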

How fast is fast?

Most of us are aware of the dramatic improvement in network speeds over the past decade, but few have followed the changes in end-to-end latency and the round-trip time (RTT), also known as “ping time,” that it adds to network communications. Gamers are known to check their ping times to ensure great performance, but their requirement for low latency is nothing compared with that of high-frequency traders, who measure latency in nanoseconds. The most dramatic changes in latency have come in wireless networks. In the 3G era, it was common to see round-trip latency near 500ms, or half a second! At the beginning of the 4G/LTE era, latency varied widely between 50ms and 200ms. Once wireless trends tracker OpenSignal and the news site Fierce Wireless started publishing the disparity, all the major operators drove aggressive programs to bring latency averages down to the 60ms range. To compare the demands of these and other applications, the chart below shows the TCP bandwidth-delay product, with markers highlighting some of the key latency requirements.
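
The quantity behind that chart, the bandwidth-delay product (link rate × RTT), is the amount of data that must be in flight to keep a link full, so higher latency demands larger windows and buffers. A minimal sketch using the wireless RTTs above and an assumed 100 Mbps link:

```python
# Bandwidth-delay product at the RTTs mentioned above (link rate is illustrative).

LINK_MBPS = 100  # assumed link speed

for label, rtt_ms in (("3G era", 500), ("early 4G/LTE", 200), ("tuned 4G/LTE", 60)):
    bdp_bytes = LINK_MBPS * 1_000_000 / 8 * (rtt_ms / 1000)
    print(f"{label:13s} {rtt_ms:3d} ms RTT -> {bdp_bytes / 1000:,.0f} KB must be in flight")
```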

Latency in the enterprise

If slow is the new down, and we know that latency can impact not only performance but throughput as well, here are the key things that enterprise architects should be considering:

  1. Proximity matters – Physics won’t change, so it’s important to make sure that interconnected systems live as close as possible to each other. This means knowing where applications live and what the network paths are between sites.
  2. Avoid congestion – Congestion in uncontrolled environments can have broad impacts, and the best way to avoid congestion is to control as much of the network path as possible.
  3. Latency impacts cloud performance – When performance was measured at the desktop and servers lived in a corporate data center in the basement, latency was easy to manage. Today, applications live in the cloud and performance is measured on every device.

Addressing latency challenges

Latency may not be a total silent killer, but it’s definitely a quiet buzzkill to many digital businesses. At Equinix, our 200 highly interconnected data centers in more than 52 global markets can put applications, clouds and networks at the center of the action. And in many of today’s businesses, that action is in distributed locations out at the digital edge, where commerce, population centers and digital ecosystems meet.

We’ve also published the second volume of the annual Global Interconnection Index (the GXI), an analysis of global traffic exchange and projections of Interconnection Bandwidth capacity growth between 2017 and 2021. The GXI can show you how to leverage Interconnection to address network latency and bandwidth issues. With multiple interconnection options to clouds and application services that can be delivered in minutes, rather than days or weeks, we can help you architect network infrastructures that will meet your definition of fast today and tomorrow.

Learn more by reading the Global Interconnection Index Volume 2.
