Passive TCP Performance Analysis

In the last two decades the Internet has changed dramatically in terms of its bandwidth, types of applications, and number of users. What has not changed is that TCP is still the most popular transport protocol, accounting for more than 80% of Internet traffic. Given TCP's widespread usage, its performance fundamentally impacts the performance of Internet transfers. Packet losses are known to affect TCP performance by impacting two important aspects of TCP: reliable delivery and congestion control. What is not well understood is exactly how much they affect TCP performance, which depends on how well TCP deals with these losses. Current analysis techniques for TCP lack the level of detail or accuracy required to illuminate these issues. To achieve the required detail and accuracy, we have developed a new analysis tool and have used it to study two different interactions between TCP and losses: (i) TCP's efficiency in detecting packet losses when they occur and (ii) TCP's efficiency in avoiding losses when possible. We describe both our methodology and analysis below.

Methodology

TCP is a classic example of a legacy protocol that is repeatedly subjected to modifications. Unfortunately, the evaluation of something as fundamental as TCP's loss detection/recovery mechanism has not been comprehensive. Our aim is to perform a complete, realistic evaluation of TCP losses and their impact on TCP performance. We rely on passive analysis of real-world TCP connections to achieve the required level of detail and realism. Detailed analysis of passive traces is difficult: the analyzer has no direct knowledge of the state of the connection at the sender or receiver when the trace was captured. Differences in TCP implementations across OSes make it difficult to keep track of this state, and losses and delays upstream of the monitor make a bad situation even worse. We address these issues in our tool, TCP Debug, by emulating the TCP stack of five different OSes (Windows, Linux, Solaris, FreeBSD, and MacOS) and maintaining additional connection state. Using our tool, we study more than 53 million connections collected at 5 different locations around the globe.

Analysis

Analysis of TCP loss detection mechanisms

TCP relies on two basic loss detection mechanisms: retransmission timeouts (RTO) and fast retransmit and recovery (FR/R). Two performance-related goals, accuracy and timeliness, guide the design of these detection mechanisms. Unfortunately, these two goals conflict with each other. A "quick" inference of segment loss may be erroneous when segments (or their ACKs) are not lost but merely delayed or reordered in the network. To achieve high loss-estimation accuracy, therefore, TCP would have to wait longer for ACKs that may merely be delayed, which hurts its timeliness. This fundamental tradeoff between accuracy and timeliness is controlled by several design parameters associated with RTO- and FR/R-based loss detection: the dup-ACK threshold, the minimum RTO, the RTT-smoothing factor, the weight of RTT variability in the RTO estimator, and the RTO estimator algorithm itself. OSes differ in these settings. Unfortunately, neither the effects of the current settings in different OSes nor the optimal values of these parameters are known. Our objective is to (i) understand the loss detection performance of current TCP deployments, (ii) evaluate the impact of the parameters associated with TCP loss detection, and (iii) identify the best parameter settings.
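To make the RTO-related parameters listed above concrete, here is a minimal sketch of the standard retransmission-timer estimator specified in RFC 6298. The smoothing factor `alpha`, variability gain `beta`, multiplier `k`, and `min_rto` correspond to the parameters named above; the defaults are the RFC's recommended values, not the settings of any particular OS, and real OS estimators (including Linux's) differ in their details. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtoEstimator:
    """RTO estimator following RFC 6298 (sketch; OS implementations differ)."""
    alpha: float = 1 / 8      # RTT-smoothing factor
    beta: float = 1 / 4       # gain for the RTT-variability estimate
    k: float = 4.0            # weight of RTT variability in the RTO
    min_rto: float = 1.0      # seconds; RFC 6298's recommended floor
    granularity: float = 0.0  # clock granularity G, in seconds
    srtt: Optional[float] = None
    rttvar: float = 0.0

    def update(self, rtt: float) -> float:
        """Feed one RTT sample (seconds) and return the resulting RTO."""
        if self.srtt is None:
            # The first measurement initializes both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return self.rto()

    def rto(self) -> float:
        """Current RTO: SRTT plus weighted variability, never below min_rto."""
        if self.srtt is None:
            raise ValueError("no RTT sample yet")
        return max(self.min_rto, self.srtt + max(self.granularity, self.k * self.rttvar))
```

With the RFC's 1 s floor, a 100 ms path still waits a full second before a timeout fires; with a 200 ms floor (the value Linux uses), the same first sample gives an RTO of 0.3 s. Raising `k` or `beta` makes the timer more conservative (fewer spurious timeouts, slower recovery), which is the accuracy/timeliness tradeoff in miniature.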

Ability to avoid losses using delay signals

To avoid the heavy penalty associated with packet losses, several studies have looked at using alternate network signals, such as delay, to detect congestion. These schemes rely on delay-based congestion estimators (DBCEs), which assume that during periods of congestion the packets of a connection experience higher-than-normal queuing delays at the congested link, which should translate into an increase in packet round-trip times (RTTs). By sampling per-packet RTTs and comparing them to a base RTT (measured in the absence of congestion), a DBCE infers the onset as well as the alleviation of congestion. More relevantly, a DBCE expects to avoid most packet losses by doing so. Specifically, most DBCE evaluations assume that increased delay is "always" an indicator of congestion. However, this assumption may not hold, since the end-to-end delay signal may be too noisy due to factors such as queuing at multiple routers, ACK compression, and insufficient sampling. How reliable the delay signal is, and how that reliability relates to network and connection characteristics in the real Internet, must be understood to interpret this signal correctly. We systematically evaluated the ability of different DBCEs to predict losses and studied how different connection characteristics affect this ability.
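The basic DBCE logic, and the kind of loss-prediction scoring described above, can be sketched as follows. This is a toy estimator, not any of the published DBCEs evaluated in the study: it assumes the minimum RTT observed so far serves as the base RTT and that congestion is flagged when a sample exceeds it by a fixed `threshold`. The `(rtt, loss_followed)` sample format and all names are hypothetical.

```python
from typing import Iterable, Tuple


class ThresholdDbce:
    """Toy delay-based congestion estimator (hypothetical, not a published DBCE).

    Uses the minimum RTT seen so far as the base RTT and signals congestion
    whenever a sample exceeds it by more than `threshold` seconds.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.base_rtt = float("inf")

    def on_rtt_sample(self, rtt: float) -> bool:
        self.base_rtt = min(self.base_rtt, rtt)
        return rtt - self.base_rtt > self.threshold


def score(estimator: ThresholdDbce, samples: Iterable[Tuple[float, bool]]) -> Tuple[int, int, int]:
    """Count (predicted losses, missed losses, false alarms).

    Each sample is (rtt_seconds, loss_followed), where loss_followed says
    whether a loss occurred before the next RTT sample.
    """
    hits = misses = false_alarms = 0
    for rtt, loss_followed in samples:
        congested = estimator.on_rtt_sample(rtt)
        if loss_followed:
            if congested:
                hits += 1
            else:
                misses += 1
        elif congested:
            false_alarms += 1
    return hits, misses, false_alarms
```

Both failure modes matter: a loss preceded by no delay increase is a miss (the DBCE could not have avoided it), while a delay spike without a subsequent loss is a false alarm that would needlessly reduce the sending rate.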

Results

Using TCP Debug, we analyzed a large number of diverse traces to provide a detailed view of the current state of losses in the Internet. We find that a large percentage of connections do not experience any loss at all; for connections that do experience losses, the main mechanism used to detect them is RTO, not FR/R. The main reason RTO dominates is the small number of segments a connection has in flight at a time, which prevents the receiver from generating enough duplicate ACKs to trigger FR/R-based loss detection. We also found that network reordering, which can cause unnecessary FR/R, accounted for as much as 14% of all out-of-order segments in some traces. Finally, we find that a significant fraction (3.7-19%) of all retransmissions are unnecessary, because the original packets were not actually lost.
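Why small flights push loss detection toward RTO can be seen by simple counting. The sketch below assumes exactly one lost segment, a receiver that sends one immediate duplicate ACK per out-of-order arrival (as RFC 5681 recommends), and a sender that transmits no new data while waiting (no Limited Transmit). The function names are illustrative and not part of TCP Debug.

```python
def dupacks_generated(flight_size: int, lost_index: int) -> int:
    """Duplicate ACKs a receiver can send when segment `lost_index` (0-based)
    of a flight of `flight_size` segments is lost and every other segment
    arrives. Assumes one immediate ACK per out-of-order arrival and that the
    sender transmits no new data while waiting (no Limited Transmit)."""
    if not 0 <= lost_index < flight_size:
        raise ValueError("lost_index must fall within the flight")
    # Only segments sent after the lost one can produce duplicate ACKs.
    return flight_size - lost_index - 1


def detecting_mechanism(flight_size: int, lost_index: int, dupack_threshold: int = 3) -> str:
    """Which mechanism would detect the loss under this simplified model."""
    if dupacks_generated(flight_size, lost_index) >= dupack_threshold:
        return "FR/R"
    return "RTO"


# How the detecting mechanism changes with flight size when the first segment is lost.
for flight in range(1, 7):
    print(flight, detecting_mechanism(flight, lost_index=0))
```

With the conventional threshold of three, a flight must hold at least four segments, and the lost segment must not be among the last three, before FR/R can fire; any loss in a smaller flight is left to the retransmission timer.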

We performed a detailed analysis of TCP's loss detection mechanisms using TCP Debug. We first analyzed the performance of the current detection mechanisms and then investigated the impact of changing their default parameters. To understand the impact on overall performance using only passive analysis, we developed detailed analytical models that predict the change in performance under different circumstances. Based on these analyses, we conclude that:

Most current implementations of RTO estimators are conservative in incorporating RTT variability. Making them more aggressive improves performance for a large number of connections.
Making the dup-ACK threshold adaptive improves the overall performance of connections.
Unlike in the past, timer granularity and the minimum RTO no longer significantly limit connection performance.
The Linux RTO estimator converges quickly and is the most efficient. If properly configured, this estimator has the greatest potential for improving connection durations.
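One way a dup-ACK threshold could be made adaptive is sketched below. This is a hypothetical policy for illustration, and the policy evaluated in the study may differ: when too few segments are in flight to produce the default number of duplicate ACKs, the threshold is lowered to one less than the flight size, which mirrors the Early Retransmit rule of RFC 5827 for fewer than four outstanding segments.

```python
def adaptive_dupack_threshold(flight_size: int, default: int = 3) -> int:
    """Hypothetical adaptive dup-ACK threshold.

    With `default` or fewer segments in flight, the receiver cannot produce
    `default` duplicate ACKs for a loss, so lower the threshold to
    flight_size - 1 (the rule RFC 5827 Early Retransmit applies when fewer
    than four segments are outstanding and no new data can be sent).
    The threshold never drops below one.
    """
    if flight_size <= default:
        return max(1, flight_size - 1)
    return default


for flight in (1, 2, 3, 4, 10):
    print(flight, adaptive_dupack_threshold(flight))
```

The design cost is sensitivity to reordering: a lower threshold lets FR/R catch losses in small flights that would otherwise wait for an RTO, but it also makes a spurious fast retransmit more likely whenever segments are merely reordered, so the accuracy/timeliness tradeoff reappears here.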

We evaluated the ability of several prominent DBCEs to predict losses over a large number of connections, and developed analytical models that use passive analysis to predict the impact of these predictions and mispredictions on connection duration. We find that:

CIM is the best estimator overall. It is likely to reduce the duration of large connections significantly, though possibly at the expense of small connections.
The estimator used by the prominent Vegas protocol is fairly conservative: it has almost no impact on the performance of TCP connections that do not transmit large flights of segments.
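The conservativeness of the Vegas estimator has a simple arithmetic basis. Vegas compares an expected rate (cwnd / BaseRTT) with the actual rate (cwnd / RTT) and scales the difference by BaseRTT to estimate how many segments the connection has queued in the network. The sketch below uses `beta = 3` segments as an illustrative congestion threshold; the function names are hypothetical.

```python
def vegas_diff(cwnd: float, base_rtt: float, rtt: float) -> float:
    """Vegas-style estimate of segments queued in the network:
    Diff = (Expected - Actual) * BaseRTT, Expected = cwnd/BaseRTT, Actual = cwnd/RTT.
    Algebraically this equals cwnd * (1 - BaseRTT/RTT)."""
    expected = cwnd / base_rtt
    actual = cwnd / rtt
    return (expected - actual) * base_rtt


def vegas_signals_congestion(cwnd: float, base_rtt: float, rtt: float, beta: float = 3.0) -> bool:
    """Congestion is inferred when more than `beta` segments appear queued.
    beta = 3 is an illustrative value."""
    return vegas_diff(cwnd, base_rtt, rtt) > beta


print(vegas_diff(4, 0.1, 0.2))    # small flight, RTT doubled
print(vegas_diff(20, 0.1, 0.12))  # large flight, RTT up 20%
```

With a window of 4 segments, doubling the RTT from 100 ms to 200 ms yields a Diff of only 2, so no congestion is signaled; with a window of 20, a 20% RTT increase already yields a Diff of about 3.3. Because Diff = cwnd(1 - BaseRTT/RTT) is always less than cwnd, a flight of `beta` or fewer segments can never trigger the signal, which is consistent with the observation that Vegas barely affects connections with small flights.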

Finally, we studied the influence of connection characteristics on the performance of DBCEs. We find that connections with high throughput and large flight sizes are likely to benefit the most from any DBCE, while connections with very few packets in flight are least likely to see any improvement in their performance.

Please refer to our publications for more details.

Tool

The purpose of the tool is to provide more complete and accurate results for identifying and characterizing out-of-sequence segments than those provided by prior tools such as tcpanaly, tcpflows, LEAST, and Mystery. Our methodology classifies each segment that appears out of sequence (OOS) in a packet trace into one of the following categories: network reordering, or a TCP retransmission triggered by a timeout, duplicate ACKs, partial ACKs, selective ACKs, or implicit recovery. Each retransmission is also assessed for whether it was actually needed.
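The shape of such a classification can be illustrated with a deliberately simplified, rule-based sketch. It is not TCP Debug's method, which emulates OS-specific TCP state; the per-segment features, the rule order, and all names are hypothetical. Implicit recovery is not modeled and falls into `UNKNOWN`. The necessity check uses a common heuristic: an ACK covering the segment that arrives less than one minimum RTT after the retransmission must have been triggered by the original copy.

```python
from dataclasses import dataclass
from enum import Enum


class OosCause(Enum):
    REORDERING = "network reordering"
    TIMEOUT = "retransmission triggered by timeout"
    DUP_ACKS = "retransmission triggered by duplicate ACKs"
    PARTIAL_ACK = "retransmission triggered by a partial ACK"
    SACK = "retransmission triggered by selective ACKs"
    UNKNOWN = "unclassified"  # includes implicit recovery, not modeled here


@dataclass
class OosSegment:
    """Hypothetical features a passive analyzer might extract per OOS segment."""
    seen_before: bool        # were these bytes already observed at the monitor?
    gap: float               # seconds since the earlier copy (if seen) or the highest-sequenced segment
    dupacks: int             # duplicate ACKs observed before this segment
    after_partial_ack: bool  # sent right after an ACK that covered only part of the outstanding data
    sack_hole: bool          # SACK blocks reported a hole covering this segment


def classify(seg: OosSegment, rtt: float, rto: float, dupack_threshold: int = 3) -> OosCause:
    """Toy rule ordering over the features above."""
    if not seg.seen_before and seg.gap < rtt:
        # Never-seen bytes arriving within an RTT of higher-sequenced data are
        # too early to be a sender retransmission, so attribute them to reordering.
        return OosCause.REORDERING
    if seg.gap >= rto:
        return OosCause.TIMEOUT
    if seg.dupacks >= dupack_threshold:
        return OosCause.DUP_ACKS
    if seg.after_partial_ack:
        return OosCause.PARTIAL_ACK
    if seg.sack_hole:
        return OosCause.SACK
    return OosCause.UNKNOWN


def retransmission_was_needed(rexmit_time: float, covering_ack_time: float, min_rtt: float) -> bool:
    """Heuristic: an ACK arriving sooner than one minimum RTT after the
    retransmission was triggered by the original copy, so the retransmission
    was unnecessary."""
    return covering_ack_time - rexmit_time >= min_rtt
```

A single fixed RTO and dup-ACK threshold is exactly the simplification the tool avoids; real implementations differ in both, which is why the tool analyzes each trace under several OS-specific models.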

One of the crucial factors that limits the accuracy of prior tools is that different TCP implementations (for different operating systems) have unique parameters (e.g., timer granularity, minimum RTO, duplicate ACK thresholds, etc.) or algorithms that influence what can be inferred about out-of-sequence segments. Our approach is to analyze each TCP segment trace from the perspective of each of four implementations (Linux, Windows, FreeBSD/MacOS-X, and Solaris) and determine which specific implementation behavior best explains the out-of-sequence segments and timings observed in the trace.
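The "best explanation" idea can be illustrated with a toy consistency check. This is a hypothetical sketch, not TCP Debug's algorithm: each implementation is reduced to two parameters (minimum RTO and dup-ACK threshold), the profile names and values are placeholders rather than real OS defaults, and a profile scores a point for every observed event it can explain. A profile is contradicted when a retransmission happens with too short a wait for its timer and too few dup-ACKs for its fast retransmit, or when the sender fails to retransmit after receiving enough dup-ACKs to trigger it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ImplProfile:
    """Parameter set for one TCP implementation (values below are placeholders)."""
    min_rto: float         # seconds; no timeout can fire sooner than this
    dupack_threshold: int  # dup-ACKs needed to trigger fast retransmit


# Hypothetical profiles for illustration only; these are NOT measured OS defaults.
PROFILES: Dict[str, ImplProfile] = {
    "impl_A": ImplProfile(min_rto=0.2, dupack_threshold=3),
    "impl_B": ImplProfile(min_rto=1.0, dupack_threshold=3),
    "impl_C": ImplProfile(min_rto=0.4, dupack_threshold=2),
}


@dataclass(frozen=True)
class Observation:
    gap: float           # seconds since the segment's original transmission
    dupacks: int         # dup-ACKs the sender had received during that time
    retransmitted: bool  # whether a retransmission followed


def consistent(obs: Observation, p: ImplProfile, slack: float = 0.01) -> bool:
    """Can implementation `p` explain this observation?"""
    fast_retransmit_possible = obs.dupacks >= p.dupack_threshold
    if obs.retransmitted:
        # Needs enough dup-ACKs, or a wait at least as long as the minimum RTO.
        return fast_retransmit_possible or obs.gap + slack >= p.min_rto
    # No retransmission: `p` would have fast-retransmitted, so it is contradicted.
    return not fast_retransmit_possible


def best_profile(observations: List[Observation], profiles: Dict[str, ImplProfile] = PROFILES) -> str:
    """Name of the profile consistent with the most observations (first wins ties)."""
    scores = {name: sum(consistent(o, p) for o in observations) for name, p in profiles.items()}
    return max(scores, key=lambda name: scores[name])
```

For example, a retransmission 0.25 s after the original with no dup-ACKs, together with a separate episode in which two dup-ACKs arrived but nothing was retransmitted, selects `impl_A`: the 1 s floor of `impl_B` cannot explain the first event, and the threshold of two in `impl_C` is contradicted by the second.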

We validate our tool through several controlled experiments with instances of all four OS-specific implementations used in the analysis. We then run the tool on packet traces of 52 million Internet TCP connections collected from 5 different locations, and present results that include comparisons with selected prior tools run on the same traces.

Given that prior tools have been shown to provide reasonably good results, one might question whether the additional completeness and accuracy justify creating a new tool. We believe they do, for the following reasons. First, each prior tool has particular strengths and weaknesses for analyzing some aspect(s) of out-of-sequence segments, but none deals with all aspects at the desired level of accuracy. Second, a number of potential uses for the analysis results are much enhanced when the results are accurate. For example, while TCP's loss detection and recovery mechanisms are quite mature and unlikely to undergo major design changes, there may still be opportunities for "fine-tuning" to improve certain cases. Prior studies have indicated (and our analysis has substantiated) that retransmissions are triggered much more frequently by timeouts than by duplicate ACKs, and that significant numbers of retransmissions are unnecessary. Accurate data on issues such as these is necessary for quantifying the potential benefits of fine-tuning these TCP mechanisms. Another use that requires accurate analysis of out-of-sequence segments is validating and evaluating models of TCP performance; such models are based on the evolution of TCP's congestion window as it changes with retransmissions, and on how the need for a retransmission was detected (timeout or duplicate ACKs). An inaccurate classification of such retransmissions can mislead these evaluations.
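Why misclassification matters for such models can be shown with the standard Reno-style window response from RFC 5681. The sketch approximates FlightSize by cwnd, counts in segments, and ignores window inflation during fast recovery; the function name is hypothetical, and the performance models referred to in the text may use different rules.

```python
from typing import Tuple


def window_after_loss(cwnd: int, detected_by: str) -> Tuple[int, int]:
    """Return (new_cwnd, ssthresh), in segments, after a single loss.

    Reno-style rules from RFC 5681, approximating FlightSize by cwnd and
    ignoring window inflation during fast recovery.
    """
    ssthresh = max(cwnd // 2, 2)
    if detected_by == "timeout":
        return 1, ssthresh          # restart slow start from a one-segment window
    if detected_by == "dupacks":
        return ssthresh, ssthresh   # window once fast recovery completes
    raise ValueError(f"unknown detection mechanism: {detected_by!r}")


print(window_after_loss(20, "timeout"))
print(window_after_loss(20, "dupacks"))
```

At a window of 20 segments, a timeout leaves the modeled window at 1 segment while a duplicate-ACK detection leaves it at 10, so labeling a timeout as a fast retransmit (or the reverse) shifts a model's predicted throughput by an order of magnitude for the following round trips.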

We have released the tool online, along with documentation on how to use it.

Publications

"A Performance Study of Loss Detection/Recovery in Real-world TCP Implementations", S. Rewaskar, J. Kaur and F. D. Smith, in Proceedings of the 15th IEEE International Conference on Network Protocols (ICNP '07)
"Accuracy of Probing Techniques in Estimating TCP Loss Rates", S. Rewaskar, J. Kaur and F. D. Smith, ACM SIGCOMM '06, Pisa, Italy, September 2006. (Extended Abstract)
"A Passive State-Machine Approach for Accurate Analysis of TCP Out-of-Sequence Segments", S. Rewaskar, J. Kaur and F. D. Smith, ACM SIGCOMM Computer Communication Review, July 2006. (Also published as tech report TR06-002.)
"A Passive State-Machine Based Approach for Reliable Estimation of TCP Losses", S. Rewaskar and J. Kaur, in Proceedings of the Passive and Active Measurement Conference 2006, Adelaide, Australia, March 30-31, 2006. (Extended Abstract)

Invited Posters

"Variability in TCP Round-trip Times", S. Rewaskar, J. Aikat, J. Kaur, D. Smith, D. Pozefsky and K. Jeffay, IBM University Day at IBM RTP, March 2004
"Variability in TCP Round-trip Times", S. Rewaskar, J. Aikat, J. Kaur, D. Smith, D. Pozefsky and K. Jeffay, in the SAMSI Workshop on Congestion Control and Heavy Traffic Modeling, November 2003.

Technical Reports

"A Passive State-Machine Based Approach for Reliable Estimation of TCP Losses", S. Rewaskar, J. Kaur, D. Smith, Technical Report TR06-002, Department of Computer Science, UNC Chapel Hill, Nov 2005
"Why Don't Delay-based Congestion Estimators Work in the Real-world?", S. Rewaskar, J. Kaur, D. Smith, Technical Report TR06-001, Department of Computer Science, UNC Chapel Hill, July 2005
"Empirical Analysis of TCP Losses and Its Detection/Recovery Mechanisms", S. Rewaskar, J. Kaur, D. Smith, Technical Report TR05-017, Department of Computer Science, UNC Chapel Hill, May 2005

Under Submission

"A Performance Study of Delay Based Congestion Estimators in Real-world TCP Connections", S. Rewaskar, J. Kaur and F. D. Smith

Collaborators

Don Smith
Kevin Jeffay
Steve Marron

Other links

TBIT, the TCP Behavior Inference Tool
tcpflows

Jasleen Kaur
