We highlight the following contributions from this dissertation:
We develop the concept of abstract source-level modeling and
the a-b-t notation for describing the source-level behavior of entire
traffic mixes. We identify a fundamental dichotomy in source-level behavior
between connections that exchange data sequentially and connections that exchange
data concurrently. Our a-b-t notation includes a sequential version and
a concurrent version that makes it possible to appropriately describe these
two types of behaviors.
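As a rough illustration of the two versions of the notation, the sketch below represents them as plain data structures. The field meanings are assumptions made for this example, since they are not defined in this summary: `a` is taken to be the bytes sent by the connection initiator, `b` the bytes sent by the acceptor, and `t` a quiet time in seconds. The class names and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SequentialVector:
    """Sequential connection: an ordered list of (a, b, t) epochs.

    Assumed meaning: a = bytes from the initiator, b = bytes from the
    acceptor, t = quiet time in seconds before the next epoch starts.
    """
    epochs: List[Tuple[int, int, float]]

    def total_bytes(self) -> Tuple[int, int]:
        # Bytes carried in each direction over the whole connection.
        return (sum(a for a, _, _ in self.epochs),
                sum(b for _, b, _ in self.epochs))


@dataclass
class ConcurrentVector:
    """Concurrent connection: one independent (size, quiet time) sequence
    per direction, because the two directions do not take turns."""
    a_side: List[Tuple[int, float]]
    b_side: List[Tuple[int, float]]

    def total_bytes(self) -> Tuple[int, int]:
        return (sum(size for size, _ in self.a_side),
                sum(size for size, _ in self.b_side))


# Hypothetical request/response exchange with three epochs.
web = SequentialVector(epochs=[(329, 403, 0.12),
                               (403, 25821, 3.12),
                               (356, 1198, 0.0)])

# Hypothetical connection in which both sides send independently.
overlapped = ConcurrentVector(a_side=[(1200, 0.5), (800, 0.0)],
                              b_side=[(64000, 0.0)])
```

The sequential form ties each `b` to the `a` that precedes it, whereas the concurrent form deliberately drops that ordering between directions.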
We develop a formal test of concurrency that can be applied to the packet
headers of any TCP connection, and that does not suffer from false positives.
This enables us to accurately classify connections as sequential or concurrent.
We show that only a small fraction of TCP connections (less than 4% in our traces)
exchange data concurrently, but that these TCP connections account for a substantial
fraction (up to 32%) of the total traffic.
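One way such a header-only test could look is sketched below. The condition used is an assumption made for this example and not necessarily the exact formulation of the test: a connection is flagged as concurrent when two data segments travelling in opposite directions were each sent before their sender had received the other. The `Segment` type and `is_concurrent` function are hypothetical, sequence and acknowledgment numbers are assumed to be relative (starting at 0), and sequence number wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    from_initiator: bool  # True if sent by the endpoint that opened the connection
    seq: int              # relative sequence number of the first payload byte
    ack: int              # relative cumulative acknowledgment number
    length: int           # payload size in bytes


def is_concurrent(segments: Iterable[Segment]) -> bool:
    """Return True if some pair of opposite-direction data segments
    was in flight at the same time (assumed criterion, see lead-in)."""
    data = [s for s in segments if s.length > 0]  # pure ACKs carry no data
    forward = [s for s in data if s.from_initiator]
    reverse = [s for s in data if not s.from_initiator]
    for p in forward:
        for q in reverse:
            # q.ack <= p.seq: q's sender had not yet received p.
            # p.ack <= q.seq: p's sender had not yet received q.
            if q.ack <= p.seq and p.ack <= q.seq:
                return True
    return False


# Request, response, next request: every segment acknowledges the previous one.
request_response = [
    Segment(True, 0, 0, 100),
    Segment(False, 0, 100, 200),
    Segment(True, 100, 200, 50),
]

# Both endpoints send data before seeing anything from the peer.
overlapping = [
    Segment(True, 0, 0, 100),
    Segment(False, 0, 0, 50),
]
```

The pairwise loop is quadratic in the number of data segments, which keeps the sketch short; it is not meant to reflect the efficiency of the actual analysis.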
We present an efficient algorithm for transforming a packet header
trace into a collection of sequential and concurrent a-b-t connection vectors.
Given a TCP connection for which we observe segments and
that has a maximum receiver window size of , the asymptotic cost of our algorithm
is . We demonstrate that this algorithm is accurate using traffic
generated from synthetic applications (i.e., with known characteristics).
We develop source-level trace replay, a closed-loop traffic generation method
that uses a-b-t connection vectors as a non-parametric model of network traffic.
One key benefit of this approach is the possibility of directly comparing original
and generated traffic, which we use to evaluate the ``realism'' of our traffic
generation approach. This comparison requires us to incorporate some
network-level parameters (round-trip times, maximum receiver window sizes, and
possibly loss rates) into the traffic generation. These parameters can be
measured from packet header traces. We pay special attention to passive round-trip
time estimation in our data acquisition, developing the concept of One-Side Transit
Time and studying the impact of delayed acknowledgments on passive round-trip
time estimation.
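The effect of delayed acknowledgments on passive estimation can be illustrated with a small sketch. It assumes that a One-Side Transit Time sample is the delay between a data segment passing the monitor and the matching acknowledgment passing it in the opposite direction; this summary does not spell out the definition, so treat that reading, and the hypothetical `ostt_samples` function, as assumptions.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def ostt_samples(data: Iterable[Tuple[float, int, int]],
                 acks: Iterable[Tuple[float, int]]) -> List[float]:
    """Match data segments to acknowledgments seen at a single monitor.

    data: (timestamp, seq, length) for segments heading toward one endpoint.
    acks: (timestamp, ack) for acknowledgments coming back from that endpoint.
    """
    sent_at: Dict[int, Optional[float]] = {}
    for ts, seq, length in data:
        if length <= 0:
            continue
        end = seq + length
        # A second sighting of the same end is a retransmission, which makes
        # the sample ambiguous, so it is marked as unusable.
        sent_at[end] = ts if end not in sent_at else None

    samples: List[float] = []
    for ts, ack in sorted(acks):
        start = sent_at.pop(ack, None)  # only exact matches are used
        if start is not None and ts >= start:
            samples.append(ts - start)
    return samples


# Hypothetical observations: the first ACK covers two segments and returns
# promptly; the second ACK is held back by the receiver before being sent.
data = [(0.000, 0, 1000), (0.001, 1000, 1000), (0.500, 2000, 500)]
acks = [(0.041, 2000), (0.740, 2500)]
samples = ostt_samples(data, acks)
estimate = min(samples)
```

Taking the minimum is one simple way to discount samples inflated by delayed acknowledgments; other summaries of the samples are equally possible.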
We implement our traffic generation method in a network testbed, developing a new distributed
traffic generation tool, tmix. We use this implementation to study the results of a large
collection of trace replay experiments, evaluating the need for detailed source-level
modeling and the impact of losses on measured network traffic. Our results show
that detailed source-level modeling is often required to approximate real traffic accurately,
which indicates that source-level behavior is a major factor shaping Internet traffic.
The most substantial differences are observed for the number of active connections
and the number of packet arrivals per unit of time. Byte arrivals per unit of time
and long-range dependence do not improve as consistently with the use of detailed source-level
modeling. We also show that losses had only a secondary effect in our traces, although they are not
negligible when comparing original and generated traffic.
We present two trace resampling algorithms that can be used to derive new
traces from an existing one while preserving its statistical characteristics at the source level.
Our comparison of the two methods reveals that the observed long-range dependence in connection arrivals
has no apparent impact on the long-range dependence of packet and byte arrivals.
We demonstrate the need for byte-driven rather than connection-driven resampling
in order to accurately scale offered loads, and develop byte-driven versions of our two
resampling methods. This approach eliminates the need for the experimental calibration of
traffic generators (that is, studying the relationship between the parameters of the generator
and the offered traffic load).
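The difference between the two stopping rules can be sketched as follows. This is a simplified illustration and not either of the two resampling methods themselves: it draws connections uniformly at random with replacement, and the function names and sample sizes are hypothetical. Connection-driven resampling stops at a target number of connections, while byte-driven resampling stops once a target byte volume has been reached.

```python
import random
from typing import List, Sequence


def resample_by_connections(sizes: Sequence[int], n_connections: int,
                            rng: random.Random) -> List[int]:
    """Draw a fixed number of connections; the total byte volume is
    whatever the sampled connections happen to add up to."""
    return [rng.choice(sizes) for _ in range(n_connections)]


def resample_by_bytes(sizes: Sequence[int], target_bytes: int,
                      rng: random.Random) -> List[int]:
    """Keep drawing connections until the target byte volume is reached."""
    if not sizes or max(sizes) <= 0:
        raise ValueError("need at least one connection with a positive size")
    picked: List[int] = []
    total = 0
    while total < target_bytes:
        size = rng.choice(sizes)
        picked.append(size)
        total += size
    return picked


# Hypothetical per-connection byte counts, dominated by one large transfer.
sizes = [500, 2_000, 10_000, 5_000_000]
rng = random.Random(1)

by_bytes = resample_by_bytes(sizes, 20_000_000, rng)
by_count = resample_by_connections(sizes, 10, rng)
```

With the byte-driven rule, the resampled volume overshoots the target by at most one connection. With the connection-driven rule, the volume depends heavily on how many large connections happen to be drawn.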
Taken as a whole, our methodology makes it possible to conduct networking experiments
with closed-loop synthetic traffic derived from real traces in an automated manner, eliminating the need for
painstaking parametric modeling.