Our abstract source-level modeling of TCP connections provides a solid foundation
for generating traffic mixes in simulators and network testbeds. We propose to
generate traffic using source-level trace replay, as illustrated in
Figure 1.4.
Given a packet header trace collected from some Internet link, we first use our
data acquisition algorithm to analyze the trace and describe its content
as a collection of connection vectors. Each entry in this collection pairs the
relative start time of a TCP connection with the sequential or concurrent
connection vector corresponding to that connection.
The basic approach for generating traffic according to this collection is to replay
every connection vector in it. Each connection vector is replayed
by starting a TCP connection precisely at the vector's relative start time, and
transmitting the measured sequence of ADUs separated in time by the
measured inter-ADU quiet times.
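The replay rule described above can be sketched as a small scheduler. This is only an illustration under stated assumptions, not the tmix implementation: the `Epoch` and `ConnectionVector` types, their field names, and the idea that a sequential connection alternates one ADU per direction are hypothetical, and network transfer time is ignored so that only the measured start time and quiet times shape the schedule.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Epoch:
    """One exchange in a sequential connection (hypothetical layout)."""
    a_bytes: int          # ADU sent by the connection initiator
    quiet_after_a: float  # quiet time (seconds) before the acceptor replies
    b_bytes: int          # ADU sent by the connection acceptor
    quiet_after_b: float  # quiet time (seconds) before the next epoch


@dataclass
class ConnectionVector:
    start_time: float     # start time relative to the beginning of the trace
    epochs: List[Epoch]


def replay_plan(vector: ConnectionVector) -> List[Tuple[float, str, int]]:
    """Return (time, actor, bytes) actions; transfer time is ignored."""
    clock = vector.start_time
    plan = [(clock, "open", 0)]
    for epoch in vector.epochs:
        plan.append((clock, "initiator", epoch.a_bytes))
        clock += epoch.quiet_after_a
        plan.append((clock, "acceptor", epoch.b_bytes))
        clock += epoch.quiet_after_b
    plan.append((clock, "close", 0))
    return plan
```

In an actual replay, the time needed to deliver each ADU depends on network conditions, so the later entries of such a plan are targets relative to the end of the previous transfer rather than absolute deadlines.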
In this dissertation, we evaluate a specific implementation of this approach for FreeBSD
network testbeds, where traffic is generated using a tool we developed called tmix.
The goal of direct source-level trace replay is to reproduce
the source-level characteristics of the traffic in the original link, generating
the traffic in a closed-loop fashion. Closed-loop traffic generation implies
the need to simulate the behavior of applications, using regular network stacks to actually
translate source-level behavior into network traffic.
In particular, our experiments use an implementation which relies on the standard socket
interface to reproduce the data exchanges in each connection vector. Generating
traffic in this manner is closed-loop in the sense that it preserves the feedback
mechanism in TCP, which adapts its behavior to changes in network conditions,
such as loss and receiver saturation. In contrast,
packet-level trace replay, the direct reproduction of the original packet header trace, is an
open-loop traffic generation method in the sense that TCP control algorithms are not
used during the generation, and hence the traffic does not adapt to
network conditions.
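A minimal sketch of what replay through the standard socket interface might look like is given below. It is not tmix code: the loopback TCP connection, the function names, and the epoch tuple `(a_bytes, acceptor_quiet_s, b_bytes, initiator_quiet_s)` are assumptions made for illustration. Because the data moves through blocking `sendall` and `recv` calls on a real TCP socket, the pacing of each ADU is left to the TCP implementation of the host rather than dictated by recorded packet timestamps.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read up to nbytes from sock; return how many bytes actually arrived."""
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(65536, remaining))
        if not chunk:  # peer closed the connection early
            break
        remaining -= len(chunk)
    return nbytes - remaining


def run_endpoint(sock, epochs, is_initiator, received):
    """Play one side of the exchange, recording the size of each ADU received."""
    for a_bytes, acceptor_quiet_s, b_bytes, initiator_quiet_s in epochs:
        if is_initiator:
            sock.sendall(b"\0" * a_bytes)
            received.append(recv_exact(sock, b_bytes))
            time.sleep(initiator_quiet_s)
        else:
            received.append(recv_exact(sock, a_bytes))
            time.sleep(acceptor_quiet_s)
            sock.sendall(b"\0" * b_bytes)


def replay_on_loopback(epochs):
    """Replay the epochs over one loopback TCP connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    initiator = socket.create_connection(listener.getsockname())
    acceptor, _ = listener.accept()
    listener.close()

    got_by_initiator, got_by_acceptor = [], []
    worker = threading.Thread(
        target=run_endpoint, args=(acceptor, epochs, False, got_by_acceptor)
    )
    worker.start()
    run_endpoint(initiator, epochs, True, got_by_initiator)
    worker.join()
    initiator.close()
    acceptor.close()
    return got_by_initiator, got_by_acceptor
```

The payload content is arbitrary filler, since only ADU sizes and quiet times are part of the source-level description.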
The evaluation of our methodology consists of comparing the original trace
and the synthetic trace
obtained from the source-level trace replay.
Validating our traffic generation method consists of transforming the synthetic trace into a
set of connection vectors, using the same method used to transform the original trace
into the original set of connection vectors.
We then compare the resulting set of connection vectors
with the original set. In principle, they should be identical, since the original set
represents the invariant source-level characteristics of the original trace.
There are, however, some
differences that are explained by the nature of the model and our measurement methods.
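One simple way such a comparison of connection vector sets could be set up is sketched below. It is an assumption-laden illustration rather than the procedure used in this work: each connection vector is reduced to a hypothetical tuple of `(a_bytes, b_bytes)` epochs, start times and quiet times are deliberately left out, and the two sets are compared as multisets.

```python
from collections import Counter


def compare_vector_sets(original, replayed):
    """Compare two collections of connection vectors as multisets.

    Each vector is a hashable tuple of (a_bytes, b_bytes) epochs.
    """
    orig_counts = Counter(original)
    replay_counts = Counter(replayed)
    # Vectors present in one collection but not matched in the other.
    missing = orig_counts - replay_counts
    unexpected = replay_counts - orig_counts
    matched = sum((orig_counts & replay_counts).values())
    return {
        "matched": matched,
        "missing": sorted(missing.elements()),
        "unexpected": sorted(unexpected.elements()),
    }
```

Vectors reported as missing or unexpected are the natural starting point for investigating differences introduced by the model or by the measurement method.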
The direct comparison of the original and synthetic traces
also provides a way to study the
accuracy of our approach in terms of how well traffic is described by the a-b-t model.
This is, however, a subtle exercise.
The actual replay of the connection vectors, which creates the synthetic trace,
necessarily requires the selection of
a set of network-level parameters, such as round-trip times and TCP receiver window sizes,
for each TCP connection in the source-level trace replay.
The exact set of generated TCP segments and their arrival times is a direct function of these parameters.
As a consequence, if we conduct a source-level trace replay using
arbitrary network-level parameters, we obtain a synthetic trace with little
resemblance to the original trace. The replayed a-b-t connection vectors
may be a perfect description of the source behavior driving the original connections,
but the generated packet-level trace
would still be very different from the original.
To address this difficulty, our replay incorporates network-level
parameters individually derived from each connection in the original trace.
We have also incorporated methods for measuring three important network-level
parameters (round-trip time, TCP receiver window size and loss rate) into our
analysis and generation procedure. While this set of parameters is by no means complete,
it does include the main parameters that affect the average throughput of a TCP
connection found in a trace. This enables us to generate traffic in a closed-loop
manner that approximates measured traces very closely.
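To illustrate why these three parameters dominate average throughput, the sketch below combines two well-known approximations that are not taken from this text: a window-limited rate of one receiver window per round-trip time, and the inverse-square-root-of-loss approximation for steady-state TCP throughput often attributed to Mathis et al. The function name and the default maximum segment size of 1460 bytes are assumptions.

```python
import math


def approx_tcp_throughput(rtt_s, rwnd_bytes, loss_rate, mss_bytes=1460):
    """Rough upper bound on average TCP throughput in bytes per second."""
    # At most one receiver window can be in flight per round-trip time.
    window_limit = rwnd_bytes / rtt_s
    if loss_rate <= 0:
        return window_limit
    # Loss-limited rate: (MSS / RTT) * sqrt(3 / (2 * p)).
    loss_limit = (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)
    return min(window_limit, loss_limit)
```

For example, with a 100 ms round-trip time and a 65,535-byte receiver window, a loss rate of 1% makes the loss term the binding limit, whereas a loss-free connection is bounded by the window alone.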
Incorporating network-level properties is important, but it is
critical to understand the main shortcoming of this approach.
The goal of our work is not to make the generated traffic
identical to the original traffic, which could be accomplished with a simple
packet-level replay. As mentioned before, packet-level replays generate
traffic that does not adapt to changes in network conditions, resulting
in open-loop traffic.
Our goal is to develop a closed-loop traffic generation method based
on a detailed characterization of source behavior. Traffic generated in a
closed-loop manner can adapt to different network conditions, which are
intrinsic when evaluating different network mechanisms.
Our comparison of the original and synthetic traces
is only a means to understand
the quality of the traffic generation method, where quality is considered
to be higher the more closely the original trace is approximated.
If enough parameters of the original traffic are accurately measured
and incorporated into the traffic generation experiment, we expect to observe
a great similarity between the original and synthetic traces. On the contrary, if
we are missing some important parameters, we expect to observe substantial
differences between traces.
By construction, traffic generated using source-level trace replay
can never be identical to the original traffic.
The statistical properties of original packet header traces are the
result of multiplexing a large number of connections onto a single link,
and these connections traverse a large number of different paths with a
variety of network conditions.
It is simply not possible to fully characterize this environment and
reproduce it in a laboratory testbed or in a simulation.
This is both because of the limitations of passive inference from packet
headers, and because of the stochastic nature of network traffic.
Source-level trace replay can never incorporate every factor that shaped
the original trace, and therefore differences between
the original and synthetic traces are
unavoidable.
Still, finding a close match between an original trace and its replay,
even if they are not identical, constitutes strong evidence of the accuracy of the
a-b-t model and the data acquisition and generation methods we have developed.
It also demonstrates the feasibility of generating realistic network traffic
in a closed-loop manner that resembles a rich traffic mix.
Doctoral Dissertation: Generation and Validation of Empirically-Derived TCP Application Workloads
© 2006 Félix Hernández-Campos