ABSTRACT: Understanding the nature and structure of web traffic is essential for valid simulations of networking technologies that affect the end-to-end performance of HTTP connections. We provide data suitable for the construction of synthetic web traffic generators and, in doing so, retrospectively examine the evolution of web traffic. We use a simple and efficient analysis methodology based on examining only the TCP/IP headers of one direction (server-to-client) of each HTTP connection. We show the impact of HTTP protocol improvements such as persistent connections, as well as of modern content structures that reflect the influence of "banner ads," server load balancing, and content distribution networks. Lastly, we comment on methodological issues related to the acquisition of HTTP data suitable for performing these analyses, including the effects of trace duration and trace boundaries.
(Copies of the empirical distributions of HTTP parameters reported in the paper are available here.)