I have submitted a paper to MASCOTS 2013 describing the results of my Web Traffic Evolution study.
Abstract
Over the last decade, web content has evolved from relatively static pages, often delivered by one or two servers, to websites rich with interactive media content served from numerous servers. This change in content has affected the associated network traffic. Quantifying and analyzing these changes can lead to updated traffic models and more accurate web traffic simulations for testing new protocols and devices. In this work we analyze the TCP/IP headers in packet traces collected at various times over 13 years on the link that connects the University of North Carolina at Chapel Hill (UNC) to its ISP. We show that while the decade-old methodology for inferring web activity from these packet traces is still viable, it is no longer possible to infer all page boundaries given only the TCP and IP headers. We propose a novel method for segmenting web traffic into Activity Sections in order to obtain comparable higher-level statistics. Using these methods to analyze our data set, we describe trends in HTTP request and response sizes, and a trend towards longer connection durations. We also show that the number of servers supporting web activity has increased, and present empirical evidence suggesting that the number of unused connections has risen, likely due to the new speculative TCP preconnect features of popular browsers.
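The abstract does not spell out how Activity Sections are delimited, so the following is only an illustrative sketch of one plausible gap-based segmentation and not the method from the paper. It assumes that each TCP connection of a single client is reduced to a (start, end) time pair, and that a new section begins whenever the client has been idle for longer than a threshold. The function name `split_into_activity_sections` and the parameter `gap_threshold` are hypothetical names introduced here for illustration.

```python
from typing import List, Tuple

# Assumption: one client's TCP connections, each as (start, end) in seconds.
Interval = Tuple[float, float]


def split_into_activity_sections(
    connections: List[Interval], gap_threshold: float
) -> List[List[Interval]]:
    """Group connections into sections separated by idle gaps.

    Hypothetical sketch: a new section starts when the next connection
    begins more than `gap_threshold` seconds after every connection in
    the current section has ended.
    """
    sections: List[List[Interval]] = []
    current: List[Interval] = []
    busy_until = 0.0  # latest end time seen in the current section

    for start, end in sorted(connections):
        if current and start - busy_until > gap_threshold:
            # Idle period exceeded the threshold: close the current section.
            sections.append(current)
            current = []
        if current:
            busy_until = max(busy_until, end)
        else:
            busy_until = end
        current.append((start, end))

    if current:
        sections.append(current)
    return sections


# Example input: two bursts of connections separated by a long idle period.
example = [(0.0, 1.0), (0.5, 2.0), (10.0, 11.0), (11.5, 12.0)]
example_sections = split_into_activity_sections(example, gap_threshold=5.0)
```

Tracking the latest end time, not just the previous start time, keeps a long-lived connection from being split away from shorter connections that overlap it. The threshold value used above is arbitrary and not taken from the paper.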
Related Presentation
Color Plots
Figure 1.
A summary of the data set.
Figure 3.
CDF of request data sizes up to 5000 bytes.
Figure 4.
CCDF of request data sizes.
Figure 5.
CDF of response data sizes from 400 to 10,000 bytes.
Figure 6.
CDF of connection durations.
Figure 7.
CDF of the number of requests per connection.
Figure 8.
CDF of the total number of request bytes per connection.
Figure 9.
CDF of the total number of response bytes per connection.
Figure 10.
SYN-ACKs received by browsing to landing pages.
Figure 11.
Results of the calibration experiment to determine the Gap Length Threshold (L).
Figure 12.
Total Response Bytes per Device.
Figure 13.
CDF of SYN-ACKs per Activity Section.
Figure 14.
CDF of the number of servers per Activity Section.
Detailed research log is here (only internally accessible)
All work Copyright Ben Newton 2013