As long as the network setup of a simulation or testbed experiment remains unchanged, the source-level trace replay of a connection vector trace always results in traffic that is similar to the original trace. Every replay contains the same number of TCP connections behaving according to the same connection vector specification and starting at the same times. Only tiny variations are introduced on the end-systems by changes in clock synchronization, operating system scheduling and interrupt handling, and at switches and routers by the stochastic nature of packet multiplexing. Source-level trace replay therefore has two desirable properties:
In this dissertation, we propose two flexible methods for introducing variability in traffic generation experiments. In both cases, the set of connection vectors in the original trace is randomly resampled, resulting in a new set that preserves the aggregate source-level characteristics of the original traffic. In our first method, Poisson Resampling, we construct a new connection vector trace by randomly resampling connections from the original trace and assigning them exponentially distributed inter-arrival times. As a result, connections in the resampled trace arrive according to a Poisson process. In the second method, Block Resampling, we resample blocks (groups) of connections rather than individual connections. This method results in a more realistic connection arrival process, which matches the substantial burstiness observed in real traces. In more technical terms, Block Resampling preserves the moderate long-range dependence found in real connection arrival processes, while Poisson Resampling results in a short-range dependent connection arrival process. This difference is demonstrated in our experimental evaluation of the two methods. In addition, the evaluation shows that the choice of block duration involves a trade-off: shorter blocks increase the number of distinct resamplings, but long-range dependence disappears when blocks are too short. Our analysis demonstrates that block durations between 1 and 5 minutes offer the best compromise.
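The two resampling ideas described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the `Connection` record, the function names, and the reduction of a connection vector to a start time and a byte count are all assumptions made for illustration, and blocks are drawn uniformly with replacement.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical stand-in for a connection vector."""
    start: float      # start time, in seconds from the beginning of the trace
    total_bytes: int  # total data carried by the connection


def poisson_resample(trace, duration, mean_rate, rng):
    """Draw connections with replacement and give them exponentially
    distributed inter-arrival times (mean_rate is in connections/second)."""
    resampled = []
    t = rng.expovariate(mean_rate)
    while t < duration:
        picked = rng.choice(trace)
        resampled.append(Connection(t, picked.total_bytes))
        t += rng.expovariate(mean_rate)
    return resampled


def block_resample(trace, duration, block_len, rng):
    """Draw whole blocks with replacement; each connection keeps its
    offset relative to the start of its original block."""
    n_blocks = int(duration // block_len)
    blocks = [[] for _ in range(n_blocks)]
    for c in trace:
        idx = int(c.start // block_len)
        if idx < n_blocks:
            blocks[idx].append(c)
    resampled = []
    for slot in range(n_blocks):
        src = rng.randrange(n_blocks)  # block copied into this time slot
        for c in blocks[src]:
            offset = c.start - src * block_len
            resampled.append(Connection(slot * block_len + offset, c.total_bytes))
    resampled.sort(key=lambda c: c.start)
    return resampled
```

Because `block_resample` copies each block unchanged, any burstiness inside a block survives the resampling, whereas `poisson_resample` discards the original arrival times entirely.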
Researchers often need to conduct a set of experiments with a range of different traffic loads. When using a traditional source-level model, e.g., a model of web traffic, researchers have to first conduct a preliminary experimental study to determine how the parameters of the model, e.g., the number of user equivalents, affect the generated load [CJOS00,LAJS03,KcLH$^+$02]. This is usually known as the calibration of the traffic generator. Our resampling methods eliminate this common need for calibrating traffic generators, since the resampling process can be controlled to match a specific target load (i.e., the generated load is known a priori). In the case of Poisson Resampling, this is accomplished by changing the mean arrival rate of connections. In the case of Block Resampling, offered load is manipulated using block thinning (i.e., subsampling) and block thickening (i.e., combining blocks). Our work reveals that load scaling cannot be based simply on controlling the number of connections. Such an approach frequently results in offered loads that are far from the target, because the number of connections in a resample is not strongly correlated with the offered load represented by these connections. We address this difficulty by developing byte-driven versions of Poisson Resampling and Block Resampling, which scale load using a running count of the total data in the resampled trace. Unlike the number of connections, the total amount of data in the resampled trace is strongly correlated with the traffic load that the trace offers. Our experiments confirm that byte-driven resampling is highly accurate, eliminating the common need for calibrating traffic generators.
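As a rough illustration of the byte-driven idea, the sketch below keeps drawing connections until a running byte count reaches the amount of data implied by a target load. The function name, the representation of a connection by its byte count alone, and the way arrival times are assigned are assumptions for illustration, not the dissertation's procedure. Arrival times are drawn as sorted uniform variates, which is how the arrival times of a Poisson process are distributed once the number of arrivals in the interval is fixed.

```python
import random


def byte_driven_poisson_resample(sizes, duration, target_bps, rng):
    """Resample connection sizes (bytes, assumed positive) until the
    running total reaches the data volume implied by the target load."""
    target_bytes = target_bps * duration / 8.0  # bits/second -> bytes
    picked = []
    running = 0
    while running < target_bytes:
        size = rng.choice(sizes)
        picked.append(size)
        running += size
    # Given the number of arrivals, Poisson arrival times are sorted uniforms.
    starts = sorted(rng.random() * duration for _ in picked)
    return list(zip(starts, picked)), running
```

With this stopping rule the resampled data volume overshoots the target by less than the size of the largest connection, regardless of how many connections had to be drawn. A count-driven rule gives no such bound when connection sizes vary widely.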
Doctoral Dissertation: Generation and Validation of Empirically-Derived TCP Application Workloads
© 2006 Félix Hernández-Campos