
Trace Resampling and Load Scaling

As long as the network setup of a simulation or testbed experiment remains unchanged, the source-level trace replay of a connection vector trace \(\mathcal{T}_c=\{(T_i, C_i)\}\) always results in traffic that is similar to the original trace. Every replay contains the same number of TCP connections, behaving according to the same connection vector specifications and starting at the same times. Only tiny variations are introduced at the end systems by changes in clock synchronization, operating system scheduling, and interrupt handling, and at switches and routers by the stochastic nature of packet multiplexing. Source-level trace replay therefore has two desirable properties:

While these properties are important, the practice of experimental networking often requires introducing controlled variability into the generated traffic in order to explore a wider range of scenarios. This motivates the development of methods that manipulate \(\mathcal{T}_c\) to generate traffic that differs from, yet still resembles, the original. Furthermore, a statistically sound way of manipulating \(\mathcal{T}_c\) is essential for generating traffic with different levels of offered load. Matching a target offered load is a very common need in experimental networking research, because the performance of a network mechanism or protocol is often affected by the amount of traffic to which it is exposed. Rigorous experimental studies therefore frequently require generating a complete range of target loads.

In this dissertation, we propose two flexible methods for introducing variability in traffic generation experiments. In both cases, the set of connection vectors in \(\mathcal{T}_c\) is randomly resampled, resulting in a new set \(\mathcal{T}_c^\prime \) that preserves the aggregate source-level characteristics of the original traffic. In our first method, Poisson Resampling, we construct a new connection vector trace \(\mathcal{T}_c^\prime \) by randomly resampling connections from \(\mathcal{T}_c\) and assigning them exponentially distributed inter-arrival times. As a result, connections in \(\mathcal{T}_c^\prime \) arrive according to a Poisson process. In the second method, Block Resampling, we resample blocks (groups) of connections rather than individual connections. This results in a more realistic connection arrival process that matches the substantial burstiness observed in real traces. In more technical terms, Block Resampling preserves the moderate long-range dependence found in real connection arrival processes, whereas Poisson Resampling produces a short-range dependent connection arrival process. This difference is demonstrated in our experimental evaluation of the two methods. The evaluation also shows that the choice of block duration involves a trade-off: shorter blocks increase the number of distinct resamplings that can be constructed, but long-range dependence is lost when blocks become too short. Our analysis demonstrates that block durations between 1 and 5 minutes offer the best compromise.
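
To make the two resampling procedures concrete, the sketch below shows one plausible realization in Python; it is not the implementation used in this dissertation. A trace is assumed to be a list of (start time, connection vector) pairs, connection vectors are treated as opaque objects that are reused unchanged, and all function and parameter names are illustrative.

    import math
    import random
    from collections import defaultdict

    def poisson_resample(trace, duration, rate):
        """Poisson Resampling: draw connections uniformly at random (with
        replacement) and assign exponentially distributed inter-arrival
        times, so arrivals form a Poisson process at `rate` conn/sec."""
        resampled, t = [], 0.0
        while True:
            t += random.expovariate(rate)        # exponential inter-arrival gap
            if t >= duration:
                break
            _, conn = random.choice(trace)       # reuse a connection vector unchanged
            resampled.append((t, conn))
        return resampled

    def block_resample(trace, duration, block_len):
        """Block Resampling: split the original trace into blocks of
        block_len seconds and draw whole blocks with replacement, keeping
        relative arrival times within each block, which preserves the
        burstiness of the original arrival process."""
        blocks = defaultdict(list)
        for start, conn in trace:
            blocks[int(start // block_len)].append((start % block_len, conn))
        block_list = list(blocks.values())

        resampled = []
        for i in range(math.ceil(duration / block_len)):
            chosen = random.choice(block_list)   # one original block, with replacement
            offset = i * block_len
            resampled.extend((offset + rel, conn) for rel, conn in chosen)
        return sorted(resampled, key=lambda pair: pair[0])

In this sketch, the load produced by Poisson Resampling is controlled through the arrival rate, while Block Resampling would be scaled by thinning or thickening the sampled blocks, as discussed next.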

Researchers often need to conduct a set of experiments with a range of different traffic loads. When using a traditional source-level model, e.g., a model of web traffic, researchers first have to conduct a preliminary experimental study to determine how the parameters of the model, e.g., the number of user equivalents, affect the generated load [CJOS00, LAJS03, KcLH+02]. This is usually known as the calibration of the traffic generator. Our resampling methods eliminate this common need for calibrating traffic generators, since the resampling process can be controlled to match a specific target load (i.e., the generated load is known a priori). In the case of Poisson Resampling, this is accomplished by changing the mean arrival rate of connections. In the case of Block Resampling, offered load is manipulated using block thinning (i.e., subsampling) and block thickening (i.e., combining blocks). Our work reveals that load scaling cannot be based simply on controlling the number of connections. Such an approach frequently results in offered loads that are far from the target, because the number of connections in a resample is not strongly correlated with the offered load those connections represent. We address this difficulty by developing byte-driven versions of Poisson Resampling and Block Resampling, which scale load using a running count of the total data in the resampled trace \(\mathcal{T}_c^\prime \). Unlike the number of connections, the total amount of data in \(\mathcal{T}_c^\prime \) is strongly correlated with the traffic load offered by \(\mathcal{T}_c^\prime \). Our experiments confirm that byte-driven resampling matches target loads with high accuracy, eliminating the need for a separate calibration step.
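
As an illustration of the byte-driven approach, the following sketch shows one plausible way to scale a Poisson resample to a target offered load; it is not the dissertation's implementation, and the helper conn_bytes (which returns the total payload carried by a connection vector) is an assumption introduced here.

    import random

    def byte_driven_poisson_resample(trace, duration, target_load_bps, conn_bytes):
        """Resample connections (with replacement) until their running byte
        total reaches the payload implied by the target offered load, then
        assign exponential inter-arrival times across the experiment."""
        target_bytes = target_load_bps * duration / 8.0   # bits/sec * sec -> bytes
        chosen, total = [], 0
        while total < target_bytes:
            _, conn = random.choice(trace)
            chosen.append(conn)
            total += conn_bytes(conn)            # assumed helper: payload of this connection

        rate = len(chosen) / duration            # resulting mean arrival rate (conn/sec)
        resampled, t = [], 0.0
        for conn in chosen:
            t += random.expovariate(rate)
            resampled.append((t, conn))
        return resampled

The same running byte count can drive Block Resampling, where blocks are thinned or thickened until the target amount of data is reached.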


