Validation of Source-level Trace Replay

In this section, we consider the source-level trace replay of the three packet header traces: Leipzig-II, UNC 1 PM, and Abilene-I. The first goal is to study how well the replay experiments preserve the source-level input, which is the collection of connection vectors extracted from the original trace . In principle, the characterization of source-level behavior using the a-b-t model represents characteristics of each connection that are invariant to network conditions, so the analysis of the generated trace should result in a collection of connections vectors that is identical to . In practice, there are some practical limitations that make the two sets of connection vectors different. We will discuss the possible causes in this section, and present a statistical comparison of and .

The second goal of this section is to study the impact of introducing packet
losses in the generated process. For this purpose, we conducted two source-level
trace replays of each original trace. The *lossless replay* reproduced the
a-b-t connection vector of each original connection, and gave each connection its measured
round-trip time and TCP receiver window sizes.
The *lossy replay* additionally applied its measured
loss rate to each replayed connection. Differences between the lossless and lossy replays tell us about
the robustness of both our source-level characterization and traffic generation tools
in the presence of losses. These losses are completely absent from our experiments unless
they are artificially introduced using *usernet* , as in the lossy replay.

Leipzig-II

The plots in Figure 5.4 compare the distributions of a-type ADU sizes, , for the original set of connection vectors in Leipzig-II, and for the sets of connection vectors extracted from its lossless and lossy replays. In each plot, the three distributions marked with white symbols correspond to sequential connection vectors, and the ones marked with black symbols to concurrent connection vectors. The left plot shows the bodies of the distributions, using CDFs in log-linear axes. The right plot shows the tails of the distributions, using CCDFs in log-log axes. In general, there is an excellent agreement between the original distributions and those from the source-level replays.

The bodies of the distributions from sequential connections lie on top of each
other, even if per-connection loss rates are used during the experiments.
As discussed in 3.4, our ADU measurement algorithm can sometimes be
inaccurate when one of the last segments of a TCP window is lost before the monitor.
In this case, the loss is recovered after a timeout, which can create a quiet
time between the consecutive segment
that is long enough to unnecessarily split an ADU. This means that a sample from one of the
a-type data units in becomes two samples and
in
, such that
.
The validation of the data acquisition methods in Section 3.4 demonstrated
that *ADU splitting due to TCP timeouts* is possible, although its impact was small
even when large data units and aggressive loss rates were used.
The comparison of the Leipzig-II lossless and lossy replays, which represent much more realistic traffic,
shows that ADU splitting due to TCP timeouts has very little impact in practice, at least for the
relatively light distribution of loss rates in Leipzig-II.
We can hardly observe any difference between the bodies of the
distributions when losses are added to the replay. The two bodies from the replay
are also very similar to the body of the original distribution.
The same is true for the tails, which do not show any significant difference.
This analysis demonstrates that *tmix* can accurately reproduce the sizes of
a-type data units in sequential connections, even when ADUs are large and
when experiments are lossy.

There is also a very good match between the distributions for concurrent connection vectors. In some regions, we notice somewhat thicker lines that come from small offsets of the curves. The tails of the distribution for concurrent connections are also very similar, although the one from the lossy replay is slightly heavier for values below 5 MB, and slightly lighter for values above that. This could be explained by the inaccuracy discussed above, or by trace boundaries. In the latter case, losses reduce throughput, making the replay of lossy connections are slower than the replay of lossless ones. This means that some a-type ADUs may not have time to complete their transmission before the end of the experiment.

Figure 5.5 compares the distribution of b-type ADU sizes,
, for the connections vectors extracted from the original Leipzig-II
trace and their lossless and lossy source-level replays.
For sequential connection vectors, both the bodies and the tails are identical.
For concurrent connection vectors, the distributions show slightly different bodies,
but identical tails. The differences cannot be explained by the ADU splitting due to TCP timeouts.
If so, we would see a difference between the distributions
from the lossless replay and the ones from the lossy replay, but this is not the case.
The source of the difference is an inherent problem with the replay of concurrent
connections, the *misclassification of the replayed concurrent connections*.
While *tmix* always replays a concurrent connection vector in the right way
(*i.e.*, decoupling the two directions), the actual set of segments observed at the
monitor may simply not have any pair of data segments that satisfy the concurrency
test given in Section 3.3.3.
In other words, the segments of a replayed concurrent connection may exhibit a
fortuitous sequential ordering.
As a consequence, the data analysis algorithm classifies as sequential some connections
from the replay that were concurrent in the original trace. The sizes of the b-type ADUs
in these misclassified connections are then absent from the distribution for replayed
concurrent connections.
The small difference in the plot between the original and replayed distributions
demonstrates that the number of misclassifications is relatively small, so the majority
of the concurrent connections still exhibit concurrent behavior in the replays.

It is important to note that the probability of a misclassification decreases as the sizes of the ADUs increase, since the larger number of data segments makes finding a concurrent pair more likely. Therefore, misclassifications become less significant for the tails of the distributions, since the connections whose samples are in the tail have necessarily at least one large ADU (the one we see in the tail), and are less likely to be misclassified. There is no appreciable difference between the tails of the distributions from concurrent connections, in agreement with our observation regarding the lower likelihood of misclassification for connections with large ADUs. Misclassified connections are described using the sequential a-b-t model, so they result in additional samples for the distributions that characterize sequential connection vectors. These extra samples have a much smaller effect on the CDFs, since the number of samples from sequential connections is far larger anyway.

Figure 5.6 considers the distribution of the number of epochs extracted from the original and from the generated packet header traces. The distributions from the replays are very similar to the original one. The small difference comes again from the small number of misclassified concurrent connections that were considered sequential. Misclassified connections add extra samples to which slightly distort the distributions from the replays. There is a somewhat bigger difference in the far tail, for connection vectors between 1250 and 1500 epochs. This difference could be explained by misclassification and by trace boundaries (connections replayed more slowly than in the original that do not replay all of their epochs). We observe no difference between lossless and lossy replays in this part of the tail.

The next pair of plots, shown in Figure 5.7, examines the
distribution of the quiet times on the acceptor side of TCP connections,
*i.e.*, between and .
The plot of the bodies shows a very good match between the original
distribution and the ones measured from the replays of sequential connections.
The slightly heavier distributions from the replays is due to a small simplification
we made regarding the replay of quiet times. *Tmix* will replay the exact
quiet times specified in each connection vector. However, as discussed in
Section 3.3.1, when these quiet times are extracted from
a packet header trace, the measured quiet time is the sum of two components. The
first component comes from the quiet time at the end host,
and the second component comes from the delay between the monitor and
the endpoint.
When *tmix* replays a quiet time, it remains quiet for the exact duration of the
sum of these components, . Given that the replay in the testbed uses *usernet* to
reproduce the measured round-trip time of each connection,
there is also a delay between *tmix* end hosts and monitor,
so the analysis of the generated packet header trace results in quiet times of the form
. It would have been possible to eliminate this inaccuracy by
subtracting from the originally measured quiet times. The value of is
equal to half of the one-side transit time, although delayed acknowledgments and queuing
can affect individual samples. We did not try to incorporate a correction for this
*quiet time overestimation problem* in our experiments.
Besides measurement difficulties, the extra delay becomes less significant
in larger quiet times, for which is far smaller than . Larger quiet times
are far more significant, since they are the ones that
can increase the duration of TCP connections substantially.

There is also a good agreement in the tails of the distributions, although the distributions from the replays are slightly heavier than the original distributions. This is not explained by the previous overestimation of quiet times due the location of the monitor, because the magnitude of the quiet times in the tail is far larger than the magnitude of . The source of this small mismatch is the misclassification of some concurrent connections. This is true for both the differences between the tails from sequential connection vectors and between the tails from concurrent connection vectors. It may seem counter-intuitive that the misclassifications makes both types of tails heavier, instead of making one type of tail heavier and the other one lighter. The explanation is that misclassifications move samples from concurrent connections to sequential connections. These moved samples satisfy at the same time the following two properties:

- They have a lighter tail than the tail of the samples left in connections correctly classified as concurrent in the analysis of the generated traffic. The removal of these samples therefore makes the shown distributions from concurrent connections in the replays heavier than the one in the original trace.
- They have a heavier tail than the tail of the samples that they joined in connections correctly classified as sequential in the analysis of the generated traffic. The addition of these samples therefore makes the shown distributions from the sequential connections in the replays heavier than the one in the original trace.

The distribution of quiet times on the initiator side of TCP connections,
*i.e.*, between and , is compared for original and replayed traces in
Figure 5.8.
The bodies of the distributions show the
same kind of mismatch that we discussed for the distributions.
For values below a few seconds the distribution from the replay of sequential
connections appears heavier that the original distribution.
This is due to the overestimation of quiet times, which becomes less significant as the
quiet time becomes larger. We can also observe that the difference in the shortest
quiet time is larger for than for . The reason is not completely clear,
but it is probably related to the absence of samples in from the large subset of
connection vectors with only one epoch.
The distribution from the replay of concurrent connections
appears lighter than the original for values above one second. This is due to concurrent
connection misclassification. The much larger number of samples in the distributions
for sequential connections makes the impact of the misclassification very small.

Besides the replay of the source-level characteristics of the connections in Leipzig-II, our experiments also involved replaying the network-level parameters measured for each connection in . The left plot in Figure 5.9 compares the distributions of round-trip times extracted from the original and the generated packet header traces. The reproduction was very accurate for sequential connection vectors, and the three distributions exactly lie on top of each other. On the contrary, the distributions for the replayed concurrent connections show a strange jump in probability at 100 milliseconds. The reason for this anomaly, which changed the shape of the rest of the distribution, is unclear.

The right plot of Figure 5.9 compares the distributions of receiver window sizes. Note that the probability was computed over the total number of data bytes transferred, to give a better sense of the amount of data associated with each receiver window size. There is an excellent match between the distribution obtained from the Leipzig-II trace and those from its two source-level replays.

The final comparison examines the distributions of loss rates.
The left plot of Figure 5.10 shows the distribution of the
measured loss rates for the original Leipzig-II trace and its replays.
There is a reasonable match between the original and the lossy replays, especially for
sequential connections. This is a good result given that *usernet* creates losses
by generating random numbers in an independent manner. The small difference is probably explained
by a sample size problem in short connections with non-zero loss rates, as discussed
in Section 4.1.3, and by concurrent connection
misclassification.

Note that we measured some non-zero loss rates in the lossless experiment, in which
no artificial losses were introduced. This suggests some problem with the
experimental environment, perhaps some network interfaces that were duplicating
segments. Such duplicates confuse the loss rate measurement algorithm, which considers each
retransmission a loss event^{5.3}.
If duplication is behind our observations, the impact on the experiments would be minimal.
True loss slows down TCP, but duplication does not.

The right plot of Figure 5.10 shows the distributions of loss rates per byte, rather than per connection as in the left plot. The CDFs show the probability that each byte had of being carried in a connection with at most the given loss rate. For example, the CDFs for the original sequential connections shows that 80% of the bytes were carried in connections with a loss rate of 1% or less. The CDFs in the right plots are easier to read than those in the left plot, since they are far smoother. They are also more significant, since they pay more attention to the connections that carry more bytes, which are those than have a larger impact on the load of the network. There is a good match between loss rate distributions for the original and the lossy replay. Both the distribution from the replayed sequential connections and the one from replayed concurrent connections are slightly heavier than those from the original traces.

In general, we always observe heavier loss rates in the replays than in the original
data. The explanation is the dropping of pure acknowledgment packets, which was
discussed in Section 4.1.3. The analysis of the original trace considers
only the loss rate of data segments, and not the combined loss rate of data and
acknowledgment segments. However, the artificial dropping mechanisms in *usernet* that
is used to create per-flow losses is applied to all of the packets in the
connections. This means that both data segments and acknowledgment segments are dropped
according to the original loss rates of data segments. The dropping of acknowledgment
segments can increase the loss rate of data segments in the replay, because missing
acknowledgments can trigger unnecessary retransmissions.
Every retransmission is considered a loss event, and therefore we have an increase
of loss rate in the replays, which makes the measured distributions of (data segment) loss rates
heavier for the replays than for the original. It is certainly possible to modify
*usernet* to apply the dropping rate to data segments only, but our experiments
did not incorporate this refinement. It is somewhat unrealistic to use a biased
dropping mechanism, so it would be better to refine the data acquisition algorithm to
consider both data and pure acknowledgment losses. Measuring pure acknowledgment loss
rates is far more difficult that measuring data segment loss rates. Endpoints may
acknowledge every data segment, or every other data segment, and they do so
using cumulative acknowledgment numbers, rather than individual sequence numbers as
it is done for data segments. It is therefore more difficult to determine when
an acknowledgment does not arrive as expected.

The second trace considered in our validation of the source-level trace replay approach
is the UNC 1 PM trace.
This trace is shorter than Leipzig-II (1 hour *vs.* 2 hours and 45 minutes)
but it has much higher throughput, which results in a substantially larger number of
samples in the distributions that we will examine in this section.
Figure 5.11 compares the distributions extracted
from the UNC 1 PM and its lossless and lossy replays.
The bodies of the distributions from sequential connections reveal no difference
between original and generated traces. The tail of the distribution from the
lossy replay is slightly lighter than the one from the original trace and the one
from the lossless replay. This difference can be attributed to trace boundaries.
Losses make the replay of some connections slower, which can easily result in some
connections that do not have time to finish during the replay experiment. This effect is
more important for the largest data units, those in the tail of the distribution,
since they are the ones that require a substantial amount of time to complete their
transmission even without losses.

Concurrent connections show a slightly worse match. This is due to the misclassification problem described in the previous section. As pointed out before, misclassifications are more likely to occur in concurrent connections with small ADUs. These connections have a small number of packets, making the observation of concurrent pairs less likely. As a result, the bodies of the distributions from the replays are slightly heavier, since some fraction of the small ADUs disappeared from the distribution for concurrent connections. On the contrary, misclassifications had no visible impact on the distribution for sequential connections. This is because the number of a-type ADUs in sequential connection vectors is much larger than the number of samples from misclassified connections. The tails of the distributions for concurrent connections show a good agreement.

The distributions from the original UNC 1 PM traces and its replays are
even closer, as Figure 5.12 shows.
We can barely see any differences in bodies of the distributions from concurrent
connections and no difference for those from sequential connections.
The tails are also very similar, and the slight differences can be explained
using the same arguments put forward in the discussion of the distributions
(*i.e.*, trace boundaries and misclassifications).

Figure 5.13 shows an excellent match between the number of epochs in sequential connection vectors measured from the UNC 1 PM traces, and those measured from the replays. The bodies of the distributions are identical, and the tails show only a very minor difference. We therefore observe a better agreement between original and replay for UNC 1 PM than for Leipzig-II (see Figure 5.6).

The plots in Figure 5.14 study the distributions. The bodies for sequential connections show an excellent match between the inter-ADU quiet time measured from the original UNC 1 PM trace, and those measured from the generated traces. The bodies for concurrent connections are also very similar. The small difference for the smallest values requires further investigation. We should not see these samples here because our only method for detecting inter-ADU quiet times in concurrent connections is to identify periods of inactivity above 500 milliseconds. We do not observe such a difference for Leipzig-II and Abilene-I. The tails of the distributions are very similar for sequential and concurrent connections. As it was also the case in the data from Leipzig-II shown in Figure 5.7, we observe slightly heavier tails from the replays, which can be explained by misclassifications.

Figure 5.15 shows the bodies and the tails of the distributions. Data from sequential connections shows an excellent match for values above 1 second, and even the far tail is very closely approximated. For values below 1 second, we observe that the replays have heavier distributions. This is explained by the quiet time overestimation problem discussed in the analysis of the Leipzig-II results. Concurrent connections also show an excellent match between original and generated traces. The artifact in the smallest inter-ADU quiet times that was observed for the distributions from concurrent connections is also present in the distributions from concurrent connections.

The next four plots study how closely the replays of UNC 1 PM approximated the network-level parameters observed in the original plot. The left plot of Figure 5.16 shows the distributions of round-trip times. For sequential connections, there was no difference between the round-trip times obtained from the original trace and those obtained from its replays. For concurrent connections, there is only a very small difference, which we can attribute to concurrent connection misclassifications. The large masses of probability for 100 milliseconds observed in the Leipzig-II replays are not present in the UNC 1 PM replays.

Regarding the distribution of TCP receiver window sizes, the plot on the right in Figure 5.16 shows a good match between the original data and the one obtained from the analysis of the generated packet header traces. The tiny difference can again be explained by concurrent connection misclassifications, but it is clear that the replayed traffic accurately captured the use of TCP receiver window sizes.

Figure 5.17 studies the distributions of loss rates
rates obtained from original and replayed traffic. As indicated in the analysis of
the replays of Leipzig-II, matching loss rate is difficult given the use of independent
packet dropping in *usernet* . Consequently, we can consider the approximation of the
loss rates shown in the left plot of the figure reasonable, especially in the case
of sequential connections, for which many more samples were available.
In contrast to these per-connection loss rates, the right plot of the figure shows
a substantially closer approximation when loss rate per bytes are considered. Note
also that difference between distributions of loss rates for sequential and concurrent
connections is far smaller in the case of probabilities per byte.

We conclude the validation of our source-level trace replay method by comparing the original Abilene-I trace and its lossless and lossy replays. This is the trace with the highest average throughput. Figure 5.18 shows that the distributions measured from the replayed traces are very similar to those measured from the original trace. Given the completely different distributions for sequential and concurrent connections, we would expect that any substantial number of misclassified concurrent connections would result in distributions from the replays that significantly diverge from the original distributions. The excellent approximation in this figure, and for the distributions shown in Figure 5.18, suggest that the number of misclassifications was very small. We also observe a very good match for the tails where the only difference is found for the largest values. In some cases, the replay is slower than the original trace, so some of the largest ADUs may not have had enough time to complete. Adding losses to the replay experiment did not introduce any noticeable difference in the measured distributions, which confirms the robustness of the data acquisition and generation methods to the challenge of lossy environments.

The two plots in Figure 5.19 show that the original distribution of b-type ADU sizes is almost identical to the ones obtained from the lossless and lossy replays. This is true both for the bodies studied in the plot on the left, and for the tails studied in the plot on the right. It is quite difficult to find any region where the distributions differ. It is also clear that the addition of losses to the replay did not modify the sizes of the ADUs in the experiment.

Figure 5.20 shows that the bodies and the tails of the distributions of the numbers of epochs are closely approximated in the source-level replays. There is only a very slight difference in the far tail of the distributions. This could be attributed to a few connections that were replayed more slowly than in the original trace, so they did not have time to complete all of their epochs. Another possibility is that a small number of concurrent connections with a large number of epochs were misclassified. The probabilities in the tail are so small, that even a few samples can create a visible difference.

The quality of the replay of quiet times between ADUs is studied in the next two figures. Figure 5.21 shows that the distributions are accurately approximated in the replays. This is true both for sequential and concurrent connections. We only observed a small difference in the far tail, where the replays show slightly heavier values for quiet times above 1000 seconds. As in the case of the distributions, both experiment boundaries and concurrent connection misclassification can explain the difference.

Figure 5.22 examines the distribution of quiet times on the initiator side. As shown on the left, there is an excellent match between the bodies of the distributions from the original trace and those from the replays. The only difference is found in the distributions from sequential connections for quiet times below 1 second. The quiet times measured from the replays became increasingly heavier than those from the original trace as their magnitude decreased. This finding is consistent with inaccuracies due to the overestimation of quiet times, since end-host location has a larger impact on the measured quiet time as the magnitude of the application-level quiet time decreases. The tails of the distributions reveal an excellent approximation. It is also important to note that the distributions for concurrent connections do not show the unexpected values below 500 milliseconds that were observed for UNC 1 PM.

The analysis of the round-trip times in Figure 5.23 reveals an excellent match between the original and the replay distributions of round-trip times. The replay of concurrent connections exhibits the same artifact at 100 milliseconds encountered in the replays of the Leipzig-II trace, but the magnitude is far smaller. The distributions of receiver window sizes show very close approximations, with only a small divergence for concurrent connections, which can be easily explained by a small number of misclassifications.

The distribution of loss rates in the lossy replay is very close to the original distribution, as shown in Figure 5.24. The CDFs on the left plot show cumulative probabilities computed per connection, and they reveal a remarkably good match between the original and the lossy replay, both for concurrent and sequential connections. This is significantly better than in the cases of Leipzig-II and UNC 1 PM, which were studied in Figures 5.10 and 5.17. The better match is mostly explained by two characteristics of the original data. First, Abilene-I has the largest fraction of lossy connections, which more than doubles the one in Leipzig-II. This means a wider y-axis that reduces the distance between the distributions in the plot. Second, the heavier distribution of connection sizes in the Abilene-I trace means a larger number of packets, which makes the use of independent drops approximate the intended loss rates more accurately. The right plot shows a good match when the distributions of the per-byte loss rates are considered.

Doctoral Dissertation: *Generation and Validation of Empirically-Derived TCP Application Workloads*

© 2006 Félix Hernández-Campos