What TCP/IP Protocol Headers Can Tell Us About the Web


F.D. Smith, F. Hernandez Campos, K. Jeffay, and D. Ott
ACM SIGMETRICS 2001/Performance 2001
Cambridge, MA, June 2001,
pages 245-256.

Abstract: We report the results of a large-scale empirical study of web traffic. Our study is based on over 500 GB of TCP/IP protocol-header traces collected in 1999 and 2000 (approximately one year apart) from the high-speed link connecting a large university to its Internet service provider. We also use a set of smaller traces from the NLANR repository taken at approximately the same times for comparison. The principal results from this study are: (1) empirical data suitable for constructing traffic-generating models of contemporary web traffic, (2) new characterizations of TCP connection usage showing the effects of HTTP protocol improvements, notably persistent connections (e.g., about 50% of web objects are now transferred on persistent connections), and (3) new characterizations of web usage and content structure that reflect the influences of "banner ads," server load balancing, and content distribution. A novel aspect of this study is its demonstration that a relatively lightweight methodology based on passive tracing of only TCP/IP headers and off-line analysis tools can provide timely, high-quality data about web traffic. We hope this will encourage more researchers to undertake ongoing data collection programs and provide the research community with data about the rapidly evolving characteristics of web traffic.
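As a rough illustration of how header-only tracing can still expose HTTP-level structure, the sketch below infers request and response sizes for one connection from the sequence and acknowledgment numbers seen in the server-to-client direction. It is a minimal sketch under stated assumptions, not the analysis tools used in the study: the `Segment` record, the function names, and the rule "newly acknowledged client bytes after response data mark a new request" are all hypothetical, and the numbers are assumed to be already relative to the first data byte, with no wraparound.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Segment:
    """One TCP header seen server-to-client (hypothetical record layout)."""
    seq: int     # relative sequence number of the first payload byte
    ack: int     # relative acknowledgment number (client bytes received)
    length: int  # payload bytes carried by this segment


def infer_exchanges(segments: Iterable[Segment]) -> List[Tuple[int, int]]:
    """Return (request_bytes, response_bytes) pairs for one connection.

    Assumption: an advance in the ACK number means the client sent request
    data; an advance in the highest sequence number means response data.
    """
    exchanges: List[Tuple[int, int]] = []
    acked = 0  # highest client byte acknowledged so far
    sent = 0   # highest server byte sent so far
    req = resp = 0
    for s in segments:
        new_req = s.ack - acked
        if new_req > 0:
            if resp > 0:
                # Request data after response data: previous exchange ended.
                exchanges.append((req, resp))
                req = resp = 0
            req += new_req
            acked = s.ack
        new_resp = s.seq + s.length - sent
        if new_resp > 0:  # retransmitted bytes do not advance `sent`
            resp += new_resp
            sent = s.seq + s.length
    if req or resp:
        exchanges.append((req, resp))
    return exchanges


def persistent_object_fraction(connections: List[List[Tuple[int, int]]]) -> float:
    """Fraction of objects carried on connections with more than one exchange."""
    total = sum(len(c) for c in connections)
    persistent = sum(len(c) for c in connections if len(c) > 1)
    return persistent / total if total else 0.0


demo = [
    Segment(seq=0, ack=300, length=0),       # ACK of a 300-byte request
    Segment(seq=0, ack=300, length=1460),    # response data
    Segment(seq=1460, ack=300, length=540),
    Segment(seq=1460, ack=300, length=540),  # retransmission, ignored
    Segment(seq=2000, ack=550, length=0),    # ACK of a second, 250-byte request
    Segment(seq=2000, ack=550, length=800),
]
print(infer_exchanges(demo))  # [(300, 2000), (250, 800)]
```

A connection that yields more than one pair under this heuristic would be counted as persistent, which is the kind of per-connection statistic behind a figure such as the share of objects transferred on persistent connections.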


Get a PostScript (compressed) - or - a PDF copy of this paper.
(Copies of the empirical distributions of HTTP parameters reported in the paper are available here.)
(A copy of the slides for the talk presented at the conference is also available.)


