- ... model1.1
- Our a-b-t
model provides however a good foundation for developing mathematical and statistical
models of traffic at the source-level. This dissertation consistently follows
a non-parametric approach to traffic modeling. The only exception is the Poisson Resampling
method presented in Chapter 7, for which we also offer
a more powerful non-parametric alternative, block resampling.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... traffic2.1
- To be more specific, Arlitt and Williamson proposed a model of
``Mosaic'' traffic. Mosaic was the first web browser.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... layer2.2
- This is also
possible in the original implementation, using one firewall rule
for each flow, but it does not scale to the hundreds of simultaneous
flows in our experiments.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... time3.1
- Similarly, if resources are
allocated along the connection's path, they must be committed for a longer
period.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...
milliseconds3.2
- Some infrequent events, such as routing changes due to
link failures, can last several seconds. We generally model large numbers of
TCP connections, so the few occasions in which we confuse application quiet
times with long network quiet times have no measurable statistical impact
when generating network traffic.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... connection3.3
- In general,
persistent HTTP connections are closed by web
servers after a maximum number of request/response exchanges (epochs)
is reached or a maximum quiet time threshold is exceeded.
By default, Apache, the most popular web server,
limits the number of epochs to 5 and the maximum quiet time to 15 seconds.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... segments3.4
- This assumes the
standard maximum segment size, 1,460 bytes, and a maximum receiver window
of at least 10 full size segments. A large fraction of TCP connections observed
on real networks satisfy these assumptions.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... operations3.5
- HTTP
server push is implemented using a special content type,
x-mixed-replace,
which makes the browser expect a response object that is composed of
other objects (separated by a configurable boundary string). Since no
limit is imposed on the number of objects in this composite,
webcam movies are usually implemented as a simple sequence of JPEG images
that the web browser reads and renders continuously until the user moves to
another page.
This type of web service should not be confused with HTML's automatic page
refresh tag, which is commonly used for slow rate webcams
(e.g., one image every 30 seconds). In this case, the browser
refreshes the current page by downloading again the current page and hence
the interaction follows the regular request/response pattern.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... session3.6
- This is
an abbreviated version of the original session, in which there was some directory
navigation and more directory listings. The control connection used
port 21, while the data connections used dynamically selected port numbers. Note also that
significant inter-ADU times due to user think time are not shown in
the diagram.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... direction3.7
- This is true as long as the
connection carries 4 GB or less. Otherwise, sequence numbers are repeated due
to the wraparound of their 32-bit representation. We discuss how to
address this difficulty at the end of Section 3.3.3.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...
substantially3.8
- The well-known tcptrace tool [Ost],
provides a good example of the difficulty of efficiently implementing this technique.
tcptrace can analyze multiple connections at the same time, by keeping
separate state for each connection, and making use of hashing to quickly locate
the state corresponding to the connection to which a new segment belongs.
When this tool is used with our traces, we quickly run out of memory on our processing
machines (which have 1.5 GB of RAM).
This occurs even when we use tcptrace's real-time processing mode, which is
supposed to be highly optimized.
We believe it is possible to perform our analysis without the sorting step, but it is
certainly much more difficult to develop a memory-efficient implementation.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... algorithm3.9
- We have not addressed the
extra complexity that TCP window scaling for Long-Fat-Networks (RFC 1323
[JBB92]) introduces. It is often the case that TCP options are not
available in the traces, so the use of window scaling and TCP timestamps
has to be inferred from the standard TCP header.
This is a daunting task. If the options are available, it is straightforward
to combine regular sequence numbers and timestamps to handle this case.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... connections3.10
- It is very
unlikely that any of these connections was measured as unidirectional due
to measurement losses. The traces studied in this section were collected
using a high-performance monitoring device, a DAG card [Pro], that did not
report any losses during data acquisition.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... scanning3.11
- Network scanning is a technique for discovering
the hosts attached to a network by probing each possible IP address in a network domain.
The basic technique is to send a packet which generally requires a response from the
host that received it (e.g., an ICMP echo request, a TCP SYN segment).
Malicious users often scan remote networks to
find hosts before trying to break into them. Network scanning with TCP segments
is available in many popular tools, e.g., nmap.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... scanning3.12
- Port scanning
is similar to network scanning, but it involves probing a range of port numbers
(for a single IP address) rather than probing a range of IP addresses. The goal of port scanning is
to discover active services, which could potentially have vulnerabilities. Port scanning
is performed using any TCP segment (or UDP datagram) that elicits a response from the victim
(e.g., a SYN segment requires a SYN-ACK in response, a malformed segment requires a RST segment
in response).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... connections3.13
- We found numerous connections
that had data segments with increasing sequence numbers.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... segment3.14
- In some
cases, these connections showed some data segments with a sequence
number above that of the FIN segments. These cases seemed to be caused by
TCP implementation errors.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... listening3.15
- In this case the destination endpoint
responds with a TCP reset segment, and no application-level communication takes
place.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... connection3.16
- In
some (rare) cases, we may miss some segments before connection establishment
(e.g., we miss the first SYN segment but observe its retransmission), or
we may miss some segments after connection establishment
(e.g., we miss the retransmission of the final FIN segment and its
acknowledgment).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... distributions3.17
- The tail of a
Pareto distribution is always linear in
a CCDF, and the tail of a Lognormal distribution can be linear for an arbitrary
number of orders of magnitude.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... throughput4.1
- More precisely,
throughput is the rate of transfer taking into account not only application
data but also control headers added by TCP and lower network layers.
A related concept, goodput, is the rate of transfer of application data, i.e., TCP payload.
This distinction is important, but in the discussion above, throughput and
goodput are affected similarly by round-trip times, so we simply talk about
throughput.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... segment4.2
- For simplicity,
our illustrations use acknowledgment numbers that refer to the cumulative sequence number accepted by the endpoint,
which is one unit below the actual acknowledgment number stored in the TCP header [Pos81].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... immediately4.3
- Endpoints are not required to behave in this
manner by any RFC, but it makes little sense to delay the acknowledging of SYN segments.
On the contrary, delaying the acknowledging of data segments gives the endpoints a
chance to receive a second data segment and acknowledge both data segments using a
single acknowledgment.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... milliseconds4.4
- Typical values
are between 100-200 milliseconds.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... alone4.5
- Some implementations send an ACK whenever an out-of-order
data segment is received, like Segment 2 in this case, but this behavior is not mandated
by Internet standards. RFC 2581 [APS99] only recommends it.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...
segments4.6
- The advertised receiver window size is given in bytes in the
TCP header. We describe it here and in section 4.1.1 in terms of segments for
convenience when considering the impact of round-trip times.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... bytes4.7
- Some implementations support the window scaling option described
in [JBB92], which enables larger windows. These larger windows are specified
as the product of the receiver window size encoded in a 16-bit field in the TCP header,
and a multiplier encoded in a TCP option (almost always 64 KB).
We have not studied this feature in our work.
The use of the window scaling is negotiated by the endpoints using a TCP header option, and
TCP options are often not included in segment header traces, making the analysis
difficult. It would however be possible to study the maximum amount of unacknowledged data
in each connection, which would allow us to identify violations of the advertised window.
For these cases, we could estimate the scaled window size by multiplying the advertised window
by 64 KB.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...
packets4.8
- In this section, we will often use the term packet rather
than segment. In the context of TCP traffic, a time series of packets per
unit time and a time series of segments per unit time are the same
thing. However, the traffic measurement literature generally talks about
packet throughput (not segment throughput), often using the
unit .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... payload4.9
- A connection without any useful TCP payloads has
an empty connection vector since no ADU is sent.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... place4.10
- The ``no payload''
time series would have been much more significant if, for example,
a denial-of-service attack using SYN segments had taken place.
These segments, and the likely SYN-ACK segments
sent in response by the victim, would have not carried any (useful) payloads
(no application-level communication would have taken place), and would have been classified
as ``no payload'' traffic.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... scanning4.11
- This type of activity creates unidirectional traffic
whenever the target host is firewalled, or otherwise unreachable,
or the target IP does not exist. The location of these spikes at the end of the trace
is purely accidental.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... offline4.12
- In this case,
clients would try to open a connection by sending a SYN segment (and
several retransmissions), which will receive no response since the
destination server is not online. These types of connection attempts to
offline hosts show up as unidirectional connections in segment header traces.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... Dependence4.13
- Long-range
dependence is sometimes referred to as long memory.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... works4.14
- The rationale for choosing the term
``logscale diagram'' can be confusing, since it is applicable to
any kind of plot in which one or more axes show a logarithmic transformation of the data.
The term ``wavelet spectrum'' is more specific and seems more appropriate and has become
the standard in the literature.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... trace4.15
- The only exception is the Abilene-I trace
in the direction from Cleveland to Indianapolis, where routing asymmetries are responsible for most
of the burstiness.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... bin4.16
- Note that the simulated time series had exactly this mean,
but the number of packets in each bin was always an integer number.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... problems4.17
- For example, some implementations
send several reset segment after a lossy connection termination, and these segments
are often separated by long period of inactivity.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... connection5.1
- We thank the members of the FreeBSD project in general, and in particular
the creator of dummynet , Luigi Rizzo, for their outstanding work. Our empirical work would
not have been possible without their generous efforts.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...
segments5.2
- Fragmentation takes place below the usernet layer.
Figure 5.3 can be confusing in this regard, since fragmentation
does take place at the IP layer. Usernet is actually embedded in the IP layer.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... event5.3
- This approach could certainly be refined using the
IP ID field to distinguish duplications from retransmissions.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... bursty6.1
- The term
bursty does not have a unique meaning. In this paragraph, it simply
refers to high variability. Some authors consider traffic more bursty as its long-range
dependence becomes stronger [WP98], while others as its marginal distribution
becomes less Gaussian [SRB01]. We make use of these more formal
definitions, discussed in Chapter 4, in later sections.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... measurements6.2
- MSS is a system-wide constant in
FreeBSD, so generating traffic that preserves per-connection MSS is not directly possible with
our current implementation. However, there is a relatively simple way to extend our
method to support per-connection MSS values. We could use a first step to group connections with the
same MSS and then assign each group to a host configured with that MSS.
Fortunately, only a few MSS values are common on the Internet, so it seems feasible to implement
this extension without increasing the number of hosts.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... parameters6.3
- More specifically,
we learned that the range of the distribution of round-trip times determines the
knee of the spectrum, while the distribution of window size determines the level of
energy at the finest scales. Related results from web traffic simulations can be found in [FGHW99].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...
started7.1
- This duration is always slightly below the true
duration of the original packet header trace, since at least the packets of the last
connection started are observed after its start time.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... distribution7.2
- The shown
exponential distribution comes from randomly sampling the theoretical
distribution times.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...
bodies7.3
- The tails were not studied in that paper.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...
point7.4
- Besides problems
with the fitting or the data acquisition in the paper, we conjecture that this could
be due to the slightly different type of data we considered in our study.
Our connections were fully captured and included only those connections that
actually carried data.
Those in [Fel00] included
degenerate cases in which no data was transferred.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... load7.5
- Note that the duration of
comes from random samples of an exponential
distribution, so it can be slightly lower or higher that the intended . Given the light
tail of the exponential distribution and the large
number of samples, this deviation is necessarily very small.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... structure7.6
- We
thank Peter Hall for suggesting the use of block bootstrapping in the context
of the a-b-t model. The theoretical aspect of this idea are explored in [HNHC02],
while this chapter focuses on its use to preserve the long-range dependence in
connection arrivals and develops thinning and thickening methods to scale offered load in
block-resampled traces.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... components8.1
- Also known
as the five pillars of Abtism. [sic]
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.