This chapter presented our traffic generation method, source-level trace replay.
The first step in source-level trace replay is to transform a packet header trace into a
set of connection vectors, which describe its source-level behavior using
the sequential or the concurrent version of the a-b-t model. Connection
vectors also include three network-level parameters, round-trip time, TCP receiver
window size and loss rate. The actual traffic generation consists of replaying the
characteristics of each connection vector in an accurate manner. We demonstrated
the feasibility of this approach using an implementation in a network testbed, which includes a distributed
traffic generator, tmix, that replays source-level behavior and coordinates
with a packet manipulation layer, usernet, to impose specific round-trip
times and loss rates on each connection. The approach and its implementation were
then validated by comparing the statistical characteristics of three traces
and those of their replays. This comparison focused on how well the replay preserved
the original parameters, i.e., the source-level description and the network-level parameters.
The validation results showed a good match between the original traces and their replays,
confirming that our approach can reproduce source-level properties
with high accuracy.
The differences, which were small or nonexistent in every case, are due
to the following causes:
There is no guarantee that the replay of a concurrent connection exhibits measurable concurrency,
i.e., that a pair of concurrent data segments can be observed in the generated trace.
This results in connections that are replayed as concurrent but classified as sequential,
thereby adding
spurious samples to the characterization of sequential connections and removing samples
from the characterization of concurrent connections. In general, this affects the
comparison of concurrent connections more substantially, since the number of samples from
concurrent connections is usually far smaller. This problem is inherent to the form
of the concurrent a-b-t model used in this dissertation.
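The classification step looks for at least one pair of data segments whose exchanges overlap in time. A minimal sketch of such an overlap test, assuming a hypothetical trace representation in which each data segment is reduced to a (send_time, ack_time) pair per direction, might look like:

```python
def is_concurrent(segments_a, segments_b):
    """Classify a connection as concurrent if some data segment in one
    direction is still unacknowledged when a data segment in the other
    direction is sent, i.e., both sides have data outstanding at once.

    segments_a / segments_b: lists of (send_time, ack_time) tuples for
    the data segments in each direction (hypothetical representation).
    """
    for send_a, ack_a in segments_a:
        for send_b, ack_b in segments_b:
            # Two data segments are concurrent if their send-to-ack
            # intervals overlap in time.
            if send_a < ack_b and send_b < ack_a:
                return True
    return False
```

Under this test, a connection generated from a concurrent vector can still come out sequential: if the replay happens to complete each exchange before the opposite side sends, no overlapping pair exists in the generated trace.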
Our measurement of quiet times
tended to overestimate their durations, since it did not compensate for the delay
between the end host and the monitor. This difference is only significant for the smallest
quiet times, whose magnitude is similar to that of network delays.
A possible refinement of our measurement method, which would eliminate this overestimation
and make the replay of quiet times even more accurate, is to subtract the corresponding
one-side transit time from each measured quiet time.
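This refinement amounts to a simple per-sample correction. A sketch of it, with hypothetical names, could be:

```python
def compensate_quiet_time(measured_quiet_time, one_side_transit_time):
    """Subtract the one-side transit time (the delay between the monitor
    and the end host) from a quiet time measured at the monitor, which
    otherwise overestimates the duration seen at the end host. Clamp at
    zero so measurement jitter cannot yield a negative quiet time.
    Times are in seconds."""
    return max(0.0, measured_quiet_time - one_side_transit_time)
```

As the text notes, the correction matters only for the smallest quiet times, whose magnitude is comparable to the network delays being subtracted.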
Usernet uses independent dropping to simulate losses, and this is not completely accurate:
individual connections often have too few packets to converge to the intended loss rate per connection.
If loss rates per byte are considered, the replay is
very close to the original distribution. Achieving a closer approximation of the
original per-connection loss rates would require some form of dependent dropping.
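The convergence problem can be illustrated with a small simulation of independent (Bernoulli) dropping; the function name and parameters below are illustrative, not part of usernet:

```python
import random

def independent_drop_rates(num_connections, packets_per_conn, p, seed=0):
    """Simulate usernet-style independent dropping: each packet is dropped
    with probability p, so the realized per-connection loss rate is a
    binomial proportion that converges to p only for long connections."""
    rng = random.Random(seed)
    rates = []
    for _ in range(num_connections):
        drops = sum(rng.random() < p for _ in range(packets_per_conn))
        rates.append(drops / packets_per_conn)
    return rates
```

For example, with 10-packet connections and an intended rate of 5%, each connection can only realize a loss rate of 0%, 10%, 20%, and so on; most connections see no loss at all, even though the rate aggregated over all packets stays close to the intended 5%. This is why per-byte loss rates match the original distribution while per-connection rates do not.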
Measured drop rates consider
only data segments, but the loss rate simulation also drops pure acknowledgments
with the same probability. This makes the distributions of loss rates in the lossy replays slightly higher than
the intended values. Addressing this inaccuracy requires either developing a measurement
algorithm that can determine the loss rate of pure acknowledgments,
which seems rather difficult, or modifying usernet to drop only data segments,
which is a somewhat artificial solution.
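The second (artificial) option is straightforward in principle: the dropper would consult the segment's payload length before applying the loss probability. A minimal sketch of that decision logic, with hypothetical names rather than usernet's actual interface, might be:

```python
import random

def should_drop(payload_len, loss_rate, rng=random):
    """Sketch of a dropper restricted to data segments: pure
    acknowledgments (zero payload) are never dropped, so the imposed
    loss rate matches the rate measured over data segments alone."""
    if payload_len == 0:  # pure ACK: never drop
        return False
    return rng.random() < loss_rate
```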
The analysis of the validation results also allowed us to verify the robustness of
our data acquisition and generation method, with regard to
source-level characteristics, to the introduction of losses. We found very little
difference, if any, between the results from the lossless and lossy replays, which
confirms the accuracy of the analysis even in the face of packet
losses and reordering. TCP timeouts, which can sometimes
confuse the heuristic used to split ADUs in the same direction,
do not appear to have any significant effect.