Objectives:
- Infer a sender's congestion window (CWND) by passively observing TCP traces collected somewhere in the middle of the network.
- Estimate RTT (one estimate per window of transmission) based on the estimate of CWND.
Motivation:
Knowledge of CWND and RTT helps determine many important characteristics of the sender, receiver, and the network path. For example:
- Non-backlogged (non-greedy) applications can be detected by comparing the amount of data sent to the CWND.
- Can identify non-conformant TCP as well as TCP version.
- Can characterize RTT variability within and across flows.
Methodology
Basic Idea:
Construct a "replica" of the TCP sender's state for each TCP connection observed at the measurement point. Emulate state transitions:
- by observing ACKs (which cause the sender to change its CWND).
- by detecting timeout events at the sender (inferred from out-of-sequence retransmissions).
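The replica idea can be sketched as a tiny per-connection state machine. The class name and the simplified slow-start/congestion-avoidance rules below are illustrative assumptions; the paper's emulator implements the full Tahoe/Reno/NewReno machinery.

```python
# Minimal sketch of a per-connection sender "replica" kept at the monitor.
# Only slow start, congestion avoidance, and timeout are modeled here.

class SenderReplica:
    def __init__(self, init_cwnd=1, init_ssthresh=1 << 20):
        self.cwnd = init_cwnd          # in packets
        self.ssthresh = init_ssthresh  # initially "infinite"

    def on_ack(self):
        # A new ACK is observed: grow CWND as the sender would.
        if self.cwnd < self.ssthresh:
            self.cwnd += 1                # slow start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance

    def on_timeout(self):
        # An out-of-sequence retransmission implies a sender timeout.
        self.ssthresh = max(self.cwnd / 2, 2)
        self.cwnd = 1

r = SenderReplica()
for _ in range(4):
    r.on_ack()
print(r.cwnd)      # slow start has grown the window
r.on_timeout()
print(r.cwnd, r.ssthresh)
```

The streaming constraint shows up naturally: each observed packet updates the state once, and nothing is ever rolled back.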
Challenges:
- Due to the large volume of data, the replica cannot afford to backtrack or reverse state transitions (it must work in a streaming fashion).
- The replica may not observe the same sequence of packets as the sender:
- Some ACKs may not reach the sender.
- Packets sent from the sender may get reordered or duplicated.
- The sender's modification of CWND depends on the TCP version, which is not known at the monitor:
- Versions react differently on receipt of three duplicate ACKs.
- Implementation details and use of TCP options are not visible at the monitor:
- Initial SSThresh and CWND.
- Use of SACK.
All of these introduce uncertainties into CWND estimation.
Ideas used in meeting the challenges:
- TCP version and the exact sequence of packets at the sender are not known at the monitor:
- Emulate three versions: Tahoe, Reno, NewReno.
- For each, if the number of packets in flight is larger than min(CWND, AdvWND), count it as a "violation".
- Infer the sender's flavor as the one with the minimum number of violations.
- If there are no violations in any flavor, conclude that the "flavors are indistinguishable".
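The decision rule at the end of this pipeline is small enough to sketch directly. The function name and the dict-based interface are assumptions; per-flavor replica emulation that produces the violation counts is elided.

```python
# Sketch of the flavor-inference rule: run one replica per flavor, count
# "violations" (packets in flight exceeding min(CWND, AdvWND)), and pick
# the flavor with the fewest. Zero violations everywhere means the flavors
# cannot be told apart on this connection.

def infer_flavor(violations):
    """violations: dict mapping flavor name -> violation count."""
    if all(v == 0 for v in violations.values()):
        return "indistinguishable"
    return min(violations, key=violations.get)

print(infer_flavor({"tahoe": 3, "reno": 1, "newreno": 0}))  # -> newreno
print(infer_flavor({"tahoe": 0, "reno": 0, "newreno": 0}))  # -> indistinguishable
```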
- Implementation details and use of TCP options are not visible at the monitor:
- Initial SSThresh set to an extremely large value, "as it is commonly done in TCP stacks".
- Qn: Is that true?
- Initial CWND estimated by counting the number of DATs seen before the first ACK.
- Qn: What if some DATs get lost before they reach the monitor?
- SACK is not dealt with.
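The initial-CWND heuristic is simple enough to sketch; the packet representation below is a hypothetical simplification, and the caveat in the note (DATs lost before the monitor) applies directly.

```python
# Sketch of initial-CWND estimation: count DATA packets observed before the
# first ACK of the connection arrives at the monitor.

def initial_cwnd(packets):
    """packets: ordered list of ('DAT', seq) / ('ACK', ack) tuples."""
    count = 0
    for kind, _ in packets:
        if kind == "ACK":
            break
        count += 1
    # Caveat: DATs lost upstream of the monitor are silently missed,
    # so this can under-count the sender's true initial window.
    return count

trace = [("DAT", 1), ("DAT", 2), ("DAT", 3), ("ACK", 4), ("DAT", 4)]
print(initial_cwnd(trace))  # -> 3
```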
RTT estimation:
- Measure the time difference between a DAT packet and the DAT packet that was immediately triggered by its ACK (which was, in turn, immediately triggered by the original DAT).
- Need CWND to determine which DAT was triggered by a particular ACK.
- Stop RTT estimation during loss recovery (just like a TCP sender does).
- Caveats:
- Requires the sender to be greedy (stop RTT estimation when the sender appears to be in a non-greedy phase).
- RTT estimation accuracy is closely tied to CWND estimation accuracy.
- Therefore, stop RTT estimation when estimation uncertainty is detected in CWND.
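For a greedy sender with a stable window of w packets, the ACK for DAT i releases DAT i + w, so one RTT sample per window is the gap between their send times at the monitor. The sketch below assumes a fixed w for clarity; the actual estimator tracks the CWND estimate as it changes.

```python
# Hedged sketch of per-window RTT sampling at the monitor: with an
# estimated window of `cwnd` packets, DAT[i + cwnd] is the packet released
# by the ACK of DAT[i], so their time difference approximates one RTT.

def rtt_samples(dat_times, cwnd):
    """dat_times: monitor timestamps of consecutive DATs of a greedy sender."""
    return [dat_times[i + cwnd] - dat_times[i]
            for i in range(0, len(dat_times) - cwnd, cwnd)]

# One flight every ~0.1 s with a window of 3 packets:
times = [0.00, 0.01, 0.02, 0.10, 0.11, 0.12, 0.20]
print(rtt_samples(times, 3))
```

Both samples here come out near 0.1 s, matching the flight spacing; a non-greedy sender would break the i + cwnd pairing, which is why estimation stops in non-greedy phases.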
Sources of Estimation Uncertainty
Under-estimation of CWND:
- Scenario:
- Monitor sees 3 duplicate ACKs, but sender does not.
- Monitor reduces CWND and SSThresh according to fast retransmit.
- Sender eventually times out and retransmits the packet.
- Monitor will detect the timeout and reduce SSThresh again (twice in total)!
- Ambiguity:
- No way to know whether the sender also reduced SSThresh twice (i.e., whether the packet sent due to fast retransmit was lost and the sender timed out on it).
- Solution:
- If the sender is greedy, it will transmit more packets than the estimated CWND.
- This will lead to "violations" in *all* TCP flavors.
- The monitor can then identify connections for which the CWND is uncertain.
- Statistics:
- Only 0.01% of all senders incur violations in all flavors.
- Among connections with at least 14 packets, 5% of senders incur violations in all flavors.
Over-estimation of CWND:
- Ambiguous scenarios:
- ACKs lost after the monitor:
- The sender will not increment CWND in response to receiving such an ACK.
- During fast recovery, the monitor will increment CWND by 1 for every duplicate ACK seen, but the sender may not.
- An entire window of data packets lost before the monitor:
- The monitor will not be able to detect the loss.
- The sender will time out and retransmit (and hence also decrease CWND and SSThresh); the monitor will not.
- Special case: loss of the SYN packet.
- The sender will set SSThresh to 2, whereas the monitor will assume a high value.
- No solution.
- The monitor will incorrectly infer the source to be non-greedy.
- Statistics:
- Around 30% of senders were identified as being non-greedy.
Window-scaling:
- Ambiguous scenario:
- Since only 44 bytes of headers are collected, TCP options are not visible.
- If the window-scaling option is used, the scale factor for AdvWND is not visible.
- Solution:
- The scale factor is at least 1, so the raw field gives a lower bound on AdvWND.
- If CWND is smaller than this lower bound, SendWND can be computed accurately.
- For connections in which CWND reaches the lower bound of AdvWND even *once*, mark the sender window as "uncertain".
- Statistics:
- Around 2% such connections identified.
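The lower-bound workaround reduces to a two-line decision. Function and variable names below are assumptions; the point is that the raw (unscaled) AdvWND field is always a valid lower bound.

```python
# Sketch of the window-scaling workaround: with options unseen, the raw
# 16-bit advertised-window field is a lower bound on the true AdvWND
# (the scale factor is at least 1). SendWND = min(CWND, AdvWND) is exact
# only while CWND stays below that lower bound.

def send_window(cwnd, advwnd_raw):
    """Return (SendWND estimate, certain?) from CWND and the raw AdvWND field."""
    if cwnd < advwnd_raw:
        return cwnd, True     # CWND below the lower bound: SendWND is exact
    return advwnd_raw, False  # AdvWND may be scaled up; window uncertain

print(send_window(8, 16))   # -> (8, True)
print(send_window(20, 16))  # -> (16, False): flag connection as uncertain
```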
TCP Implementation Issues:
- Scenarios:
- An implementation may not conform to standards (bugs).
- Some TCP stacks initialize SSThresh to a cached value of the CWND of a previous connection to the same destination.
- The monitor may thus initialize SSThresh to a much larger value.
- Solutions:
- Non-compliant stacks will trigger violations in all three state machines.
- Over-estimation will result in the sender being labelled non-greedy, which stops RTT estimation.
Validation
Simulations:
- Setup:
- Average loss rates ranging from 2-4%.
- Study of 280 long-lived flows (not clear if any of these were non-greedy).
- Methodology:
- Derive the time series of estimated and observed values of RTT and CWND.
- Compute the average of the relative error between the two series.
- Analyze the CDF of these per-connection mean relative errors.
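The validation metric can be stated precisely in a few lines. The function name and toy series are illustrative; in the paper this is computed per connection and the CDF is taken over connections.

```python
# Sketch of the validation metric: per-connection mean relative error
# between the estimated and observed time series (of CWND or of RTT).

def mean_relative_error(estimated, observed):
    errs = [abs(e - o) / o for e, o in zip(estimated, observed) if o != 0]
    return sum(errs) / len(errs)

est = [10, 12, 14, 7]   # replica's estimates at sampling instants
obs = [10, 12, 16, 8]   # ground truth from the simulator
print(mean_relative_error(est, obs))  # -> 0.0625
```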
- Results:
- For more than 95% of senders, the mean relative error in CWND is < 5%.
- For 90% of senders, the mean relative error in RTT is < 10%.
- Errors do not exceed 25% for any connection.
- Most TCP versions identified correctly.
- 5 misclassifications.
- Identification accuracy improves with longer simulation duration:
- Higher probability that a sender exhibits behavior peculiar to its flavor.
Internet Experiments:
- Methodology:
- 200 TCP connections set up between PCs at UMass and at Sprint,
CA.
- Traces collected at the OC-3 access link in CA.
- Qn: How likely is it to see losses between sender and monitor?
- Experiments run at different times of day over several days.
- Bulk transfers lasting between 10 seconds and 5 minutes.
- Experiments and Results:
- Loss rates less than 0.2% --- hence, accuracy was very high.
- Qn: Didn't test for TCP version detection...
- Used a dummynet bridge to emulate a bottleneck link with loss rates of 3-5%.
- For 95% of senders, the mean relative error in CWND is less than 5%.
- The absolute value of the mean error is less than 0.5 packets.
- For 95% of senders, the mean relative error in RTT is less than 15%.
- This corresponds to around 10-15 ms.
Conclusion:
Relative errors were small, often on the order of (and possibly due to) rounding errors or clock-tick precision.
Backbone Measurement Results
Trace Characteristics:
- 4 bidirectional OC-48 links
- East Coast, Transcontinental, Intra-POP.
- 1-hour long traces
- Total 11 million TCP connections
- 5,819 unique originating ASes
- 45% of total allocated ASes!
Congestion Window:
- Maximum window size achieved during the lifetime (broken down by flow size):
- For flows with no more than 5 packets:
- median value of the max window = 8
- 80% of flows have a max window < 11.
- As flow size increases, the median max window converges to 10-12 packets.
- Probably because larger connections have a higher probability of incurring a loss or of being restricted by AdvWND.
- Spikes at max window sizes of 6, 12, and 16 packets.
- These correspond to commonly used AdvWND values of 8 and 16 KB, with an MSS of either 536 or 1460 B.
- Although most connections show small max window sizes, most packets belong to connections with large max window sizes.
- Only 45% of packets are carried by flows with at least 5 packets.
- Throughput of only 4% of connections is limited by AdvWND at some point in their lifetime.
- These account for 40% of packets.
- Among connections with at least 10, 25, and 50 packets, 44%, 62%, and 72% respectively are limited by AdvWND.
TCP Flavors:
- Consider only those senders that transmit more than 13 packets.
- 8% of all senders.
- 78% of data packets.
- 97% of these do not allow identification of a specific flavor.
- Since only 5% of senders experience a triple duplicate ACK.
- Even after a fast retransmit, different behavior is exhibited only in the presence of specific loss patterns.
- Only 2.95% of senders exhibit this behavior over their lifetimes.
- These indistinguishable connections carry 67.5% of all packets.
- NewReno carries many more packets than Reno.
Greedy Senders:
- Basic Idea:
- Flight == a set of packets sent "together", separated from the subsequent flight by ACKs from the receiver.
- Greedy sender == the number of packets sent in each flight equals the CWND at the beginning of that flight.
- Among senders that transmit more than 4 flights, 54% are greedy.
- This assumes that the DAT packets of a flight are never interleaved with ACKs for that flight.
- Unrealistic if the monitor is close to the receiver.
- Define ACK-time ratio = (ACKtime - DATtime) / RTT.
- The fraction of greedy senders is largest in connections with ACK-time ratio > 0.75.
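A hypothetical sketch of flight segmentation and the greedy test (function names and the string-based trace are assumptions). Note that it bakes in the same assumption as above: a flight's DATs are never interleaved with its own ACKs, so any ACK marks a flight boundary.

```python
# Group DAT packets into flights separated by intervening ACKs, then call
# the sender greedy if each flight's size equals the CWND estimate at the
# start of that flight.

def flights(packets):
    """packets: ordered list of 'DAT'/'ACK' strings; returns flight sizes."""
    sizes, cur, seen_ack = [], 0, False
    for p in packets:
        if p == "DAT":
            if seen_ack and cur:        # ACK(s) ended the previous flight
                sizes.append(cur)
                cur, seen_ack = 0, False
            cur += 1
        else:
            seen_ack = True
    if cur:
        sizes.append(cur)
    return sizes

def greedy(flight_sizes, cwnds):
    """cwnds: CWND estimate at the beginning of each flight."""
    return all(f == c for f, c in zip(flight_sizes, cwnds))

trace = ["DAT", "DAT", "ACK", "ACK", "DAT", "DAT", "DAT", "DAT"]
print(flights(trace))           # -> [2, 4]
print(greedy([2, 4], [2, 4]))   # -> True
```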
- Statistics:
- 68% of senders are greedy for all flights
- 79% of senders are greedy for 75% of the flights transmitted.
- Representativeness:
- Senders in this category have the same flow-size distribution as the set of all connections.
Round-trip Times:
- For 50% of senders, min RTT < 150-200 ms.
- The ratio of 95th- to 5th-percentile RTT is < 2 for 60% of connections.
- 50% of connections show RTT variability of less than 200-300 ms.
Efficiency of Slow-Start:
- SSWND: the window just before exiting the initial slow-start phase.
- 25-55% of senders reach a max window that is more than twice the SSWND.
- SSWND is supposed to be at most twice the available bandwidth.
- TCP connections are overly conservative!
Comments:
- Is the analysis valid only for long-lived flows?
- Since most error-correction heuristics rely on the frequency of occurrence of errors...
- Requires access to traces of both directions of the connection.
- Assumes symmetric routing, or ready access to traces at access links (which are likely to appear in the paths in both directions).
Hope for the future:
ECN senders explicitly encode the CWND in the packet header (for receivers). Monitors could just read this value off the packet headers. Currently, though, ECN is not widely deployed (< 0.14%).