By Jim Poole
What he was talking about was the way network SLAs fail to account for the factors that determine the end-user experience. A network operator may have rigorously adhered to the terms of its SLA, delivering the target bandwidth and latency throughout, but that amounts to very little if the customer experience or application performance is unsatisfactory.
Herr emphasized the point with an example of an operator in the US that had installed a 300 Mbps Ethernet line to a customer’s office. The customer repeatedly complained that the iPerf results they were seeing averaged around 95 Mbps. So the operator sent a top technician to investigate, who repeatedly ran tests that not only failed to identify the problem, but seemed to suggest that the line was performing at full capacity. The result was $15,000 spent and a customer who was still unhappy, which undermined the profitability of the line.
What was going wrong here?
Herr suggested that most tests don’t look in the right places to solve modern problems. The most commonly used tests are better suited to the TDM era, and test Layer 2/3 functionality. The best-known tests in the industry that fit into this category are RFC 2544 and ITU-T Y.1564.
These remain an important first step of a troubleshooting process, but as more and more cloud applications are delivered over Ethernet, there will be more and more cases of seemingly ‘perfect’ network performance, but poor application performance. The common ‘solution’ is to blame the application – but often this isn’t the problem.
When testing networks that deliver applications to end-users, it’s important to look at the TCP layer where a lot of the problems occur.
Enter RFC 6349, a relatively new test methodology for TCP throughput that can identify application performance problems. This level of testing is beyond what is legally required in most cases, but if it can be sufficiently automated and made a standard part of procedure, considerable savings can be made in customer technical support.
Without getting too technical, the basis of the test is to check whether packets are being fragmented and whether the TCP window is large enough to allow maximum throughput. If you’re keen to explore this test in detail, additional information is available.
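To see why the TCP window matters, it helps to work through the arithmetic. The sketch below is a rough illustration rather than the RFC 6349 procedure itself: it computes the bandwidth-delay product (the amount of data that must be in flight to fill a link) and the throughput ceiling imposed by a given window size. The function names are made up for this example, and the 5.5 ms round-trip time and 64 KB window are assumed values chosen purely for illustration, not measurements from the case above.

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


def max_tcp_throughput_bps(link_bps: float, window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput: the lower of link rate and window limit."""
    return min(link_bps, window_limited_bps(window_bytes, rtt_s))


# Assumed, illustrative values (not taken from the operator's case).
link_bps = 300e6        # 300 Mbps Ethernet line
rtt_s = 0.0055          # 5.5 ms round-trip time (assumption)
window_bytes = 65535    # 64 KB window, the maximum without window scaling

print(f"Window needed to fill the link: {bdp_bytes(link_bps, rtt_s):,.0f} bytes")
print(f"Ceiling with a 64 KB window: "
      f"{max_tcp_throughput_bps(link_bps, window_bytes, rtt_s) / 1e6:.1f} Mbps")
```

Under these assumptions, a line that passes every Layer 2/3 test at full rate can still cap a single TCP stream well below 300 Mbps, simply because the window is smaller than the bandwidth-delay product.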
As for the operator who spent $15,000 diagnosing its customer’s problem? Its technician eventually got round to trying JDSU’s TrueSpeed tests, which are based on RFC 6349, and discovered the fault was indeed down to TCP configuration on the operator’s hardware.