Objectives:
- Infer a sender's congestion window (CWND) by passively observing TCP traces collected somewhere in the middle of the network.
- Estimate RTT (one estimate per window of transmission) based on the estimate of CWND.
Motivation:
Knowledge of CWND and RTT helps determine many important characteristics of the sender, receiver, and the network path. For example:
- Non-backlogged (non-greedy) applications can be detected by comparing the amount of data sent to the CWND.
- Can identify non-conformant TCP as well as TCP version.
- Can characterize RTT variability within and across flows.
Methodology
Basic Idea:
Construct a "replica" of the TCP sender's state for each TCP connection observed at the measurement point. Emulate state transitions:
- by observing ACKs (which cause the sender to change its CWND).
- by detecting timeout events at the sender (inferred from out-of-sequence retransmissions).
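The replica idea can be sketched as a tiny per-connection state machine. The class name and the simplified slow-start/congestion-avoidance rules below are illustrative assumptions; the paper's emulator implements the full Tahoe/Reno/NewReno machinery.

```python
# Minimal sketch of a per-connection sender "replica" kept at the monitor.
# Only slow start, congestion avoidance, and timeout are modeled here.

class SenderReplica:
    def __init__(self, init_cwnd=1, init_ssthresh=1 << 20):
        self.cwnd = init_cwnd          # in packets
        self.ssthresh = init_ssthresh  # initially "infinite"

    def on_ack(self):
        # A new ACK is observed: grow CWND as the sender would.
        if self.cwnd < self.ssthresh:
            self.cwnd += 1                # slow start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance

    def on_timeout(self):
        # An out-of-sequence retransmission implies a sender timeout.
        self.ssthresh = max(self.cwnd / 2, 2)
        self.cwnd = 1

r = SenderReplica()
for _ in range(4):
    r.on_ack()
print(r.cwnd)      # slow start has grown the window
r.on_timeout()
print(r.cwnd, r.ssthresh)
```

The streaming constraint shows up naturally: each observed packet updates the state once, and nothing is ever rolled back.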
Challenges:
- Due to the large volume of data, the replica cannot afford to backtrack or reverse state transitions (it must work in a streaming fashion).
- The replica may not observe the same sequence of packets as the sender:
- Some ACKs may not reach the sender.
- Packets sent from the sender may get reordered or duplicated.
- The sender's modification of CWND depends on the TCP version, which is not known at the monitor:
- Versions react differently on receipt of three duplicate ACKs.
- Implementation details and use of TCP options are not visible at the monitor:
- Initial SSThresh and CWND.
- Use of SACK.
All of these introduce uncertainties into CWND estimation.
Ideas used in meeting the challenges:
- TCP version and the exact sequence of packets at the sender are not known at the monitor:
- Emulate three versions: Tahoe, Reno, NewReno.
- For each, if the number of packets in flight is larger than min(CWND, AdvWND), count it as a "violation".
- Infer the sender's flavor as the one with the minimum number of violations.
- If there are no violations in any flavor, conclude that the "flavors are indistinguishable".
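The decision rule at the end of this pipeline is small enough to sketch directly. The function name and the dict-based interface are assumptions; per-flavor replica emulation that produces the violation counts is elided.

```python
# Sketch of the flavor-inference rule: run one replica per flavor, count
# "violations" (packets in flight exceeding min(CWND, AdvWND)), and pick
# the flavor with the fewest. Zero violations everywhere means the flavors
# cannot be told apart on this connection.

def infer_flavor(violations):
    """violations: dict mapping flavor name -> violation count."""
    if all(v == 0 for v in violations.values()):
        return "indistinguishable"
    return min(violations, key=violations.get)

print(infer_flavor({"tahoe": 3, "reno": 1, "newreno": 0}))  # -> newreno
print(infer_flavor({"tahoe": 0, "reno": 0, "newreno": 0}))  # -> indistinguishable
```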
- Implementation details and use of TCP options are not visible at the monitor:
- Initial SSThresh set to an extremely large value, "as it is commonly done in TCP stacks".
- Qn: Is that true?
- Initial CWND estimated by counting the number of DATs seen before the first ACK.
- Qn: What if some DATs get lost before they reach the monitor?
- SACK is not dealt with.
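The initial-CWND heuristic is simple enough to sketch; the packet representation below is a hypothetical simplification, and the caveat in the note (DATs lost before the monitor) applies directly.

```python
# Sketch of initial-CWND estimation: count DATA packets observed before the
# first ACK of the connection arrives at the monitor.

def initial_cwnd(packets):
    """packets: ordered list of ('DAT', seq) / ('ACK', ack) tuples."""
    count = 0
    for kind, _ in packets:
        if kind == "ACK":
            break
        count += 1
    # Caveat: DATs lost upstream of the monitor are silently missed,
    # so this can under-count the sender's true initial window.
    return count

trace = [("DAT", 1), ("DAT", 2), ("DAT", 3), ("ACK", 4), ("DAT", 4)]
print(initial_cwnd(trace))  # -> 3
```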
RTT estimation:
- Measure the time difference between a DAT packet and the DAT packet that was immediately triggered by its ACK (which was, in turn, immediately triggered by the original DAT).
- Need CWND to determine which DAT was triggered by a particular ACK.
- Stop RTT estimation during loss recovery (just like a TCP sender does).
- Caveats:
- Requires the sender to be greedy (stop RTT estimation when the sender appears to be in a non-greedy phase).
- RTT estimation accuracy is closely tied to CWND estimation accuracy.
- Therefore, stop RTT estimation when estimation uncertainty is detected in CWND.
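For a greedy sender with a stable window of w packets, the ACK for DAT i releases DAT i + w, so one RTT sample per window is the gap between their send times at the monitor. The sketch below assumes a fixed w for clarity; the actual estimator tracks the CWND estimate as it changes.

```python
# Hedged sketch of per-window RTT sampling at the monitor: with an
# estimated window of `cwnd` packets, DAT[i + cwnd] is the packet released
# by the ACK of DAT[i], so their time difference approximates one RTT.

def rtt_samples(dat_times, cwnd):
    """dat_times: monitor timestamps of consecutive DATs of a greedy sender."""
    return [dat_times[i + cwnd] - dat_times[i]
            for i in range(0, len(dat_times) - cwnd, cwnd)]

# One flight every ~0.1 s with a window of 3 packets:
times = [0.00, 0.01, 0.02, 0.10, 0.11, 0.12, 0.20]
print(rtt_samples(times, 3))
```

Both samples here come out near 0.1 s, matching the flight spacing; a non-greedy sender would break the i + cwnd pairing, which is why estimation stops in non-greedy phases.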
Sources of Estimation Uncertainty
Under-estimation of CWND:
- Scenario:
- Monitor sees 3 duplicate ACKs, but sender does not.
- Monitor reduces CWND and SSThresh according to fast retransmit.
- Sender eventually times out and retransmits the packet.
- Monitor will detect the timeout and reduce SSThresh again (twice in total)!
- Ambiguity:
- No way to know whether the sender also reduced SSThresh twice (i.e., whether the packet sent due to fast retransmit was lost and the sender timed out on it).
- Solution:
- If the sender is greedy, it will transmit more packets than the estimated CWND.
- This will lead to "violations" in *all* TCP flavors.
- The monitor can then identify connections for which the CWND is uncertain.
- Statistics:
- Only 0.01% of all senders incur violations in all flavors.
- Among connections with at least 14 packets, 5% of senders incur violations in all flavors.
Over-estimation of CWND:
- Ambiguous scenarios:
- ACKs lost after the monitor:
- The sender will not increment CWND in response to receiving such an ACK.
- During fast recovery, the monitor will increment CWND by 1 for every duplicate ACK seen, but the sender may not.
- An entire window of data packets lost before the monitor:
- The monitor will not be able to detect the loss.
- The sender will time out and retransmit (and hence also decrease CWND and SSThresh); the monitor will not.
- Special case: loss of the SYN packet.
- The sender will set SSThresh to 2, whereas the monitor will assume a high value.
- No solution.
- The monitor will incorrectly infer the source to be non-greedy.
- Statistics:
- Around 30% of senders were identified as being non-greedy.
Window-scaling:
- Ambiguous scenario:
- Since only 44 bytes of headers are collected, TCP options are not visible.
- If the window-scaling option is used, the scale factor for AdvWND is not visible.
- Solution:
- The scale factor is at least 1, so the raw field gives a lower bound on AdvWND.
- If CWND is smaller than this lower bound, SendWND can be computed accurately.
- For connections in which CWND reaches the lower bound of AdvWND even *once*, mark the sender window as "uncertain".
- Statistics:
- Around 2% such connections identified.
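The lower-bound workaround reduces to a two-line decision. Function and variable names below are assumptions; the point is that the raw (unscaled) AdvWND field is always a valid lower bound.

```python
# Sketch of the window-scaling workaround: with options unseen, the raw
# 16-bit advertised-window field is a lower bound on the true AdvWND
# (the scale factor is at least 1). SendWND = min(CWND, AdvWND) is exact
# only while CWND stays below that lower bound.

def send_window(cwnd, advwnd_raw):
    """Return (SendWND estimate, certain?) from CWND and the raw AdvWND field."""
    if cwnd < advwnd_raw:
        return cwnd, True     # CWND below the lower bound: SendWND is exact
    return advwnd_raw, False  # AdvWND may be scaled up; window uncertain

print(send_window(8, 16))   # -> (8, True)
print(send_window(20, 16))  # -> (16, False): flag connection as uncertain
```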
TCP Implementation Issues:
- Scenarios:
- An implementation may not conform to standards (bugs).
- Some TCP stacks initialize SSThresh to a cached value of the CWND of a previous connection to the same destination.
- The monitor may thus initialize SSThresh to a much larger value.
- Solutions:
- Non-compliant stacks will trigger violations in all three state machines.
- Over-estimation will result in the sender being labelled non-greedy, which stops RTT estimation.
Validation
Simulations:
- Setup:
- Average loss rates ranging from 2-4%.
- Study of 280 long-lived flows (not clear if any of these were non-greedy).
- Methodology:
- Derive the time series of estimated and observed values of RTT and CWND.
- Compute the average of the relative error between the two series.
- Analyze the CDF of these per-connection mean relative errors.
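The validation metric can be stated precisely in a few lines. The function name and toy series are illustrative; in the paper this is computed per connection and the CDF is taken over connections.

```python
# Sketch of the validation metric: per-connection mean relative error
# between the estimated and observed time series (of CWND or of RTT).

def mean_relative_error(estimated, observed):
    errs = [abs(e - o) / o for e, o in zip(estimated, observed) if o != 0]
    return sum(errs) / len(errs)

est = [10, 12, 14, 7]   # replica's estimates at sampling instants
obs = [10, 12, 16, 8]   # ground truth from the simulator
print(mean_relative_error(est, obs))  # -> 0.0625
```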
- Results:
- For more than 95% of senders, the mean relative error in CWND is < 5%.
- For 90% of senders, the mean relative error in RTT is < 10%.
- Errors do not exceed 25% for any connection.
- Most TCP versions identified correctly.
- 5 misclassifications.
- Identification accuracy improves with longer simulation duration:
- Higher probability that a sender exhibits behavior peculiar to its flavor.
Internet Experiments:
- Methodology:
- 200 TCP connections set up between PCs at UMass and at Sprint,
CA.
- Traces collected at the OC-3 access link in CA.
- Qn: How likely is it to see losses between sender and monitor?
- Experiments run at different times of day over several days.
- Bulk transfers lasting between 10 seconds and 5 minutes.
- Experiments and Results:
- Loss rates less than 0.2% --- hence, accuracy was very high.
- Qn: Didn't test for TCP version detection...
- Used a dummynet bridge to emulate a bottleneck link with loss rates of 3-5%.
- For 95% of senders, the mean relative error in CWND is less than 5%.
- The absolute value of the mean error is less than 0.5 packets.
- For 95% of senders, the mean relative error in RTT is less than 15%.
- This corresponds to around 10-15 ms.
Conclusion:
Relative errors were small, often on the order of (and possibly due to) rounding errors or clock-tick precision.
Backbone Measurement Results
Trace Characteristics:
- 4 bidirectional OC-48 links
- East Coast, Transcontinental, Intra-POP.
- 1-hour long traces
- Total 11 million TCP connections
- 5,819 unique originating ASes
- 45% of total allocated ASes!
Congestion Window:
- Maximum window size achieved during the lifetime (broken down by flow size):
- For flows with no more than 5 packets:
- median value of the max window = 8
- 80% of flows have a max window < 11.
- As flow size increases, the median max window converges to 10-12 packets.
- Probably because larger connections have a higher probability of incurring a loss or of being restricted by AdvWND.
- Spikes at max window sizes of 6, 12, and 16 packets.
- These correspond to commonly used AdvWND values of 8 and 16 KB, with an MSS of either 536 or 1460 B.
- Although most connections show small max window sizes, most packets belong to connections with large max window sizes.
- Only 45% of packets are carried by flows with at least 5 packets.
- Throughput of only 4% of connections is limited by AdvWND at some point in their lifetime.
- These account for 40% of packets.
- Among connections with at least 10, 25, and 50 packets, 44%, 62%, and 72% respectively are limited by AdvWND.
TCP Flavors:
- Consider only those senders that transmit more than 13 packets.
- 8% of all senders.
- 78% of data packets.
- 97% of these do not allow identification of a specific flavor.
- Since only 5% of senders experience a triple duplicate ACK.
- Even after a fast retransmit, different behavior is exhibited only in the presence of specific loss patterns.
- Only 2.95% of senders exhibit this behavior over their lifetimes.
- These indistinguishable connections carry 67.5% of all packets.
- NewReno carries many more packets than Reno.
Greedy Senders:
- Basic Idea:
- Flight == a set of packets sent "together", separated from the subsequent flight by ACKs from the receiver.
- Greedy sender == the number of packets sent in each flight equals the CWND at the beginning of that flight.
- Among senders that transmit more than 4 flights, 54% are greedy.
- This assumes that the DAT packets of a flight are never interleaved with ACKs for that flight.
- Unrealistic if the monitor is close to the receiver.
- Define ACK-time ratio = (ACKtime - DATtime) / RTT.
- The fraction of greedy senders is largest in connections with ACK-time ratio > 0.75.
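A hypothetical sketch of flight segmentation and the greedy test (function names and the string-based trace are assumptions). Note that it bakes in the same assumption as above: a flight's DATs are never interleaved with its own ACKs, so any ACK marks a flight boundary.

```python
# Group DAT packets into flights separated by intervening ACKs, then call
# the sender greedy if each flight's size equals the CWND estimate at the
# start of that flight.

def flights(packets):
    """packets: ordered list of 'DAT'/'ACK' strings; returns flight sizes."""
    sizes, cur, seen_ack = [], 0, False
    for p in packets:
        if p == "DAT":
            if seen_ack and cur:        # ACK(s) ended the previous flight
                sizes.append(cur)
                cur, seen_ack = 0, False
            cur += 1
        else:
            seen_ack = True
    if cur:
        sizes.append(cur)
    return sizes

def greedy(flight_sizes, cwnds):
    """cwnds: CWND estimate at the beginning of each flight."""
    return all(f == c for f, c in zip(flight_sizes, cwnds))

trace = ["DAT", "DAT", "ACK", "ACK", "DAT", "DAT", "DAT", "DAT"]
print(flights(trace))           # -> [2, 4]
print(greedy([2, 4], [2, 4]))   # -> True
```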
- Statistics:
- 68% of senders are greedy for all flights
- 79% of senders are greedy for 75% of the flights transmitted.
- Representativeness:
- Senders in this category have the same flow-size distribution as the set of all connections.
Round-trip Times:
- For 50% of senders, min RTT < 150-200 ms.
- The ratio of 95th- to 5th-percentile RTT is < 2 for 60% of connections.
- 50% of connections show RTT variability of less than 200-300 ms.
Efficiency of Slow-Start:
- SSWND: the window just before exiting the initial slow-start phase.
- 25-55% of senders reach a max window that is more than twice the SSWND.
- SSWND is supposed to be at most twice the available bandwidth.
- TCP connections are overly conservative!
Comments:
- Is the analysis valid only for long-lived flows?
- Since most error-correction heuristics rely on the frequency of occurrence of errors...
- Requires access to traces of both directions of the connection.
- Assumes symmetric routing, or ready access to traces at access links (which are likely to appear in the paths in both directions).
Hope for the future:
ECN senders explicitly encode the CWND in the packet header (for receivers). Monitors could just read this value off the packet headers. Currently, though, ECN is not widely deployed (< 0.14%).