Internet Traffic Evolution

I have submitted a paper to MASCOTS 2013 describing the results of my Web Traffic Evolution study.

Abstract

Over the last decade, web content has evolved from relatively static pages, often delivered by one or two servers, to websites rich with interactive media content served from numerous servers. This change in content has affected the associated network traffic. Quantifying and analyzing these changes can lead to updated traffic models and more accurate web traffic simulations for testing new protocols and devices. In this work, we analyze the TCP/IP headers in packet traces collected at various times over 13 years on the link that connects the University of North Carolina at Chapel Hill (UNC) to its ISP. We show that while the decade-old methodology for inferring web activity from these packet traces is still viable, it is no longer possible to infer all page boundaries given only the TCP and IP headers. We propose a novel method for segmenting web traffic into Activity Sections in order to obtain comparable higher-level statistics. Using these methods to analyze our data set, we describe trends in HTTP request and response sizes and a trend towards longer connection durations. We also show that the number of servers supporting web activity has increased, and we present empirical evidence suggesting that the number of unused connections has risen, likely due to the new speculative TCP preconnect features of popular browsers.
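The abstract does not spell out how Activity Sections are delimited. As a rough illustration only, the sketch below assumes a simple rule suggested by the Gap Length Threshold (L) named in Figure 11: a client's traffic is split wherever the idle time between consecutive events exceeds L. The function name, the (client, timestamp) event format, and the splitting rule itself are assumptions made for this example and may differ from the method in the paper.

```python
from collections import defaultdict


def segment_activity_sections(events, gap_threshold):
    """Group (client, timestamp) events into per-client sections.

    Assumed rule (not taken from the paper): a new section starts whenever
    the idle gap since the client's previous event exceeds gap_threshold.
    Timestamps and gap_threshold are in the same unit, e.g. seconds.
    """
    # Collect timestamps per client so each device is segmented separately.
    by_client = defaultdict(list)
    for client, ts in events:
        by_client[client].append(ts)

    sections = {}
    for client, stamps in by_client.items():
        stamps.sort()
        current = [stamps[0]]
        result = [current]
        # Walk consecutive timestamp pairs and split on long idle gaps.
        for prev, ts in zip(stamps, stamps[1:]):
            if ts - prev > gap_threshold:
                current = [ts]
                result.append(current)
            else:
                current.append(ts)
        sections[client] = result
    return sections


# Hypothetical input: two clients, threshold of 2 time units.
example = segment_activity_sections(
    [("a", 0.0), ("a", 1.0), ("a", 10.0), ("b", 5.0)], gap_threshold=2.0
)
```

Per-section statistics such as the number of SYN-ACKs or distinct servers could then be computed over each resulting group, given events that carry those fields.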

Related Presentation

Color Plots

Figure 1.
A summary of the data set.

Figure 3.
CDF of request data sizes up to 5000 bytes.

Figure 4.
CCDF of request data sizes.

Figure 5.
CDF of response data sizes from 400 to 10,000 bytes.

Figure 6.
CDF of connection durations.

Figure 7.
CDF of the number of requests per connection.

Figure 8.
CDF of the total number of request bytes per connection.

Figure 9.
CDF of the total number of response bytes per connection.

Figure 10.
SYN-ACKs received by browsing to landing pages.

Figure 11.
Results of calibration experiment to determine Gap Length Threshold (L).

Figure 12.
Total Response Bytes per Device.

Figure 13.
CDF of SYN-ACKs per Activity Section.

Figure 14.
CDF of the number of servers per Activity Section.

A detailed research log is available (internal access only).

All work Copyright Ben Newton 2013