Generation and Validation of Synthetic Internet Traffic
Principal Investigators: Kevin Jeffay, Don Smith
Funding Agency: National Science Foundation
Agency Number: ANI-0323648
Abstract
Networking research has long relied on simulation as the primary vehicle for demonstrating the effectiveness of proposed algorithms and mechanisms. Typically one either constructs a laboratory network testbed and conducts experiments with actual network hardware and software, or one simulates network hardware and software in software. In either case, experimentation proceeds by simulating the use of the (real or simulated) network by a given population of users running applications such as ftp or web browsers. Synthetic traffic generators are used to inject traffic into the network according to a model of how the corresponding application or user behaves. This paradigm of simulation follows the philosophy of using source-level descriptions of network traffic advocated by Floyd and Paxson [17]. This approach is preferred over the use of packet-level descriptions of network traffic because, in the case of TCP-based applications, TCP's end-to-end congestion control (perhaps influenced by router-based mechanisms such as RED packet drops or ECN markings) shapes the low-level packet-by-packet traffic processes. Thus, for any application using a transport protocol that reacts to congestion indicators, network traffic must be generated by application-dependent but network-independent traffic sources layered over (real or simulated) protocol implementations. A critical problem in network simulation is therefore that of generating application-dependent, network-independent synthetic traffic that corresponds to a valid, contemporary model of application or user behavior.

Our thesis for this project is that the networking community suffers from a lack of contemporary models of network traffic. To yield valid results, network simulators and testbeds require application-dependent, network-independent synthetic traffic generators that correspond to valid, contemporary models of application or user behavior.
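The source-level paradigm above can be illustrated with a minimal sketch: the generator models only application behavior (request sizes, response sizes, user think times) and leaves all packet-level dynamics to whatever TCP stack carries the data. The function name and the distribution parameters below are hypothetical placeholders for illustration, not measured models from this project.

```python
import random

def source_level_epochs(n, seed=0):
    """Generate n request/response "epochs" for one synthetic connection.

    Only application-level quantities are modeled here; the low-level
    packet-by-packet behavior is left to the (real or simulated) TCP
    implementation that would carry these bytes. The heavy-tailed
    distributions and their parameters are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded for reproducible experiments
    epochs = []
    for _ in range(n):
        request_bytes = int(rng.paretovariate(1.2) * 300)    # heavy-tailed request size
        response_bytes = int(rng.paretovariate(1.1) * 1000)  # heavy-tailed response size
        think_time_s = rng.expovariate(1.0 / 2.0)            # mean 2 s user "think time"
        epochs.append((request_bytes, response_bytes, think_time_s))
    return epochs
```

A traffic generator built this way is network-independent: replaying the same epochs over a congested path and an idle path yields different packet traces, because TCP, not the source model, shapes the packet-level process.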
However, the Internet is evolving far more rapidly than our ability to understand the mix and use of applications that account for the majority of bytes transferred on the Internet. Just as the ns simulator today includes an automatic network graph topology generator, we envision a tool suite that will allow researchers to automatically generate traffic mixes that are statistically equivalent to traffic observed in actual networks, so that the process of going from measurements to models and traffic generators can be reduced from years to days. We will investigate the use of statistical cluster analysis to characterize the essence of TCP-based applications and will develop tools to automate the trace processing and model generation.

In this research we propose to address this problem by creating a framework of application-class models, where each class is distinguished by a common paradigm of how it acts as a source of network traffic. For each such class, based on empirical measurements alone (and in particular with no knowledge of what the actual application(s) generating the traffic are), we can create a stochastic model that reflects the aggregated characteristics of all TCP-based applications that are members of that class. We propose to develop statistical cluster analysis techniques to identify a set of such classes based on traces from several Internet links. Given this set of class-based models and a desired proportion of each class, we can generate the source-level synthetic traffic generators for network experiments with an arbitrary mix of applications.
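As a rough sketch of the cluster-analysis step, per-connection feature vectors extracted from traces (e.g., total bytes, epoch count, mean think time) could be grouped with a standard algorithm such as k-means; the plain k-means below is a generic stand-in, not the specific statistical technique the project will develop.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Cluster feature vectors (tuples of floats) into k groups.

    A generic Lloyd's-algorithm k-means: each cluster center would
    correspond to one "application class" whose aggregate statistics
    seed a class-based stochastic traffic model.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from random data points
    for _ in range(iters):
        # Assign each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep old center if empty).
        centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers
```

In the envisioned tool suite, each resulting cluster would be summarized by the empirical distributions of its members, yielding one application-class model per cluster with no knowledge of the actual applications involved.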

