- From TCP Sequence Numbers to Application Data Units
- Logical Order of Data Segments
- Data Analysis Algorithm

Abstract Source-Level Measurement

The a-b-t model provides an intuitive way of describing source behavior in an application-neutral manner that is relevant for the performance of TCP. However, this would be of little use without a method for measuring real network traffic and casting TCP connections into the a-b-t model. We have developed an efficient algorithm that can convert an arbitrary trace of TCP/IP protocol headers into a set of connection vectors. The algorithm makes use of the wealth of information that segment headers provide to extract an accurate description of the abstract source-level behavior of the applications driving each TCP connection in the trace. It should be noted that this algorithm is a first solution to a complex inference problem in which we are trying to understand application behavior from the segment headers of a measured TCP connection without examining payloads, and hence without any knowledge of the identity of the application driving the connection. This implies ``reversing'' the effects of TCP and the network mechanisms that determine how ADUs are converted into the observed segments that carry the ADU. The presented algorithm is by no means the only one possible, or the most sophisticated one. However, we believe it is sufficiently accurate for our purpose, and we provide substantial experimental evidence in this and later chapters to support this claim.

From TCP Sequence Numbers to Application Data Units

The starting point of the algorithm is a trace of TCP segment headers,
, measured on some network link.
Our technique applies to TCP connections for which
both directions are measured (known as a *bidirectional* trace),
but we will also comment on the problem of extracting a-b-t connection
vectors from a trace with only one measured direction
(a *unidirectional* trace). While most public traces are bidirectional
(*e.g.*, those in the NLANR repository [nlaa]), unidirectional traces are sometimes
collected when resources (*e.g.*, disk space) are limited.
Furthermore, routing asymmetries often
result in connections that only traverse the measured link in one direction.

We will use Figure 3.10 to describe the basic technique for measuring ADU sizes and quiet time durations. The figure shows a set of TCP segments representing the exchange of data illustrated in the a-b-t diagram of Figure 3.1. After connection establishment (first three segments), a data segment is sent from the connection initiator to the connection acceptor. This data segment carries ADU , and its size is given by the difference between the end sequence number and the beginning sequence number assigned to the data (bytes 1 to 341). In response to this data segment, the other endpoint first sends a pure acknowledgment segment (with cumulative acknowledgment number 342), followed by two data segments (with the same acknowledgment numbers). This change in the directionality of the data transmission makes it possible to establish a boundary between the first data unit , which was transported using a single segment and had a size of 341 bytes, and the second data unit , which was transported using two segments and had a size of 2,555 bytes.

The trace of TCP segments must include a timestamp for each segment that reports the time at which the segment was observed at the monitoring device. Timestamps provide a way of estimating the duration of quiet times between ADUs. The duration of is given by the difference between the timestamp of the 4th segment (the last and only segment of ), and the timestamp of the 6th segment (the first segment of ). The duration of is given by the difference between the timestamp of the last data segment of (7th segment in the connection) and the timestamp of the first FIN segment (8th segment in the connection).

Note that the location of the monitoring point
between the two endpoints affects the measured duration
of and (but not the measured sizes of and ).
Measuring the duration of from the monitoring point 1 shown in
Figure 3.10 results in an estimated time
that is larger than the estimated time measured at monitoring point 2.
Inferring application-layer quiet time durations is always complicated
by this kind of measurement variability (among other causes), so short quiet times (with durations
up to a few hundred milliseconds) should not be taken into account. Fortunately,
the larger the quiet time duration, the less significant the measurement
variability becomes, and the more important the effect of the quiet time is on the
lifetime of the TCP connection. We can therefore choose to assign a value of zero to any
measured quiet time whose duration is below some threshold, *e.g.*, 1 second, or simply use the
measurement disregarding the minor impact of its inaccuracy.

If all connections were as ``well-behaved'' as the one illustrated in Figure
3.10, it would be trivial to create an algorithm to extract
connection vectors from segment header traces. This could be done by simply examining the
segments of each connection and counting the bytes sent between data
directionality changes. In practice, segment reordering, loss,
retransmission, duplication, and concurrency make the analysis much more complicated.
Figure 3.11 shows a second set of segment exchanges
that carry the same a-b-t connection vector of Figure
3.1.
The first data segment of the ADU sent from the connection
acceptor, the 6th segment, is lost somewhere in the network,
forcing this endpoint to retransmit this segment some time later as the 9th segment.
Depending on the location of the monitor (before or after the point of loss),
the collected segment header trace may or may not
include the 6th segment.
If this segment is present in the trace (like in the trace collected at
monitoring point 2), the analysis program must detect
that the 9th segment is a retransmission and ignore it. This ensures we compute
the correct size of , *i.e.*, 2,555 bytes rather than 4,015 bytes.
If the lost segment is not present in the trace (like in the trace collected at monitoring
point 1), the analysis must detect the reordering of segments using their
sequence numbers and still output a size for of 2,555 bytes.
Measuring the duration of is more difficult in this case,
since the monitor never saw the 6th segment.
The best estimation is the time between the
segment with sequence number 341 and the segment with sequence number 2555. Note that if the
6th segment is seen (as for a trace collected at monitoring point 2),
the best estimate is the time between 5th and 6th segments.
A data acquisition algorithm capable of handling these two cases cannot simply
rely on counts and data directionality changes,
but must keep track of the start
of the current ADU, the highest sequence number seen so far, and the timestamp
of the last data segment. In our analysis, rather than trying to handle every possible case of loss
and retransmission, we rely on a basic property of TCP to conveniently reorder segments and still
obtain the same ADU sizes and inter-ADU quiet time durations.
This makes our analysis simpler and more robust.

Logical Order of Data Segments

A fundamental invariant that underlies our previous ADU analyses is that
every byte of application data in a TCP connection receives a sequence number,
which is unique for its direction^{3.7}.
This property also means that data segments transmitted in the same direction
can always be *logically ordered* by sequence number, and this order
is independent of both the time at which segments are observed and any
reordering present in the trace. The logical order of data segments is a very
intuitive notion. If segments 6 and 7 in Figure 3.10 carried
an HTML page, segment 6 carried the first 1,460 characters of this page,
while segment 7 carried the remaining 1,095. Segment 6 logically preceded
segment 7. When the same page is transmitted in Figure 3.11,
the first half of the HTML is in segment 6 (which was lost) and again in segment 9.
Both segments 6 and 9 (which were identical) logically precede segment 7, which carried
the second half of the HTML page.

The notion of logical order of data segments can also be applied to
segments flowing in opposite directions of a sequential TCP connection.
Each new data segment in a sequential connection
must acknowledge the final sequence number of the last in-order ADU received
in the opposite direction. If this is not the case, then the new data is not
sent in response to the previous ADU, and the connection is not sequential
(*i.e.*, two ADUs are being sent simultaneously in opposite directions).
In the previous
examples in Figures 3.10 and 3.11,
we can see that both data segments comprising acknowledge
the final sequence number of . Intuitively, no data belonging to
can be sent by the server until is completely received
and processed. The data in logically precede the
data in , and therefore the segment carrying logically
precedes the segments carrying .
Given the sequence and acknowledgment numbers of two data segments,
flowing in the same or in opposite directions, we can always say whether
the two segments carried the same data, or one of them logically preceded
the other.

Connections that fit into the sequential a-b-t model are said
to preserve a *total order of data segments* with respect to the
logical flow of data:

For any pair of data segments and , such that is not a retransmission of orIn the example in Figure 3.11, the data in segment 9 logically precedes the data in segment 7 (vice versa, either the data in logically precedes the data in , or the data in logically precedes the data in .

Logical data order is not present in data-concurrent
connections, such as the one shown in Figure 3.8.
For example, the segment that carried the last b-type ADU
(the `438 don't send` ADU)
may have been sent roughly at the same time as another segment carrying some of the new
data of the data unit sent in the opposite direction
(such as a `CHECK` ADU).
Each segment would use new sequence numbers for its new data,
and it would acknowledge the data received so far by the endpoint.
Since the endpoints have not yet seen the segment sent from the opposite endpoint,
the two segments cannot acknowledge each other.
Therefore, there is no causality between the segments, and no segment
can be said to precede the other.
This observation provides a way of detecting data concurrency purely
from the analysis of TCP segment headers. The idea is that a TCP connection
that violates the total order of data segments described above can be said to
be concurrent with certainty.
This happens whenever a pair of data segments, sent in opposite directions,
do not acknowledge each other, and therefore cannot be ordered according
the logical data order.

Formally, a connection is considered to be concurrent when
there exists at least one pair of data segments and
that either flow in opposite directions and satisfy

or that flow in the same direction and satisfy

and

, Where and are the sequence numbers of and respectively, and and are the acknowledgment numbers of and respectively. Note that, for simplicity, our refers to the cumulative sequence number accepted by the endpoint (which is one unit below the actual acknowledgment number stored in the TCP header [Pos81]). The first pair of inequalities defines the

We consider first the case where and flow in opposite directions, assuming without loss of generality that is sent from initiator to acceptor and from acceptor to initiator. If they are part of a sequential connection, either is sent after reaches the initiator, in which case acknowledges so , or is sent after reaches the acceptor in which case . Otherwise, a pair of data segments that do not acknowledge each other exists, and the connection exhibits data concurrency.

In the case of segments and flowing in the same direction,
we assume without loss of generality that
.
The only way in which can be less than
is when is a retransmission sent after , and
at least one data segment with new data sent from the opposite
direction arrives between the sending of and the sending of .
The arrival of increases the cumulative acknowledgment number in
with respect to , which means that
.
In addition, cannot acknowledge
, or would not be retransmitted. This implies that
the connection is not sequential, since the opposite side sent new data in
*without* waiting for the new data in .

Thus, only data-concurrent connections have a pair of segments that can
simultaneously satisfy inequalities
(3.1) and (3.2) or inequalities
(3.3) and (3.4).
These inequalities provide a formal test of data concurrency, which
we will use to distinguish sequential and concurrent connections
in our data acquisition algorithm.
Data-concurrent connections exhibit a *partial order of data
segments*, since segments flowing in the same direction can always be ordered
according to sequence numbers, but not all pairs of segments flowing in
opposite directions can be ordered in this manner.

Situations in which all of the segments in a concurrent data exchange
are actually sent sequentially are not detected by the previous test.
This can happen purely by chance, when applications send very little data or
send it so slowly that concurrent data sent in the reverse
direction is always acknowledged by each new data segment.
Note also that the test detects *concurrent exchanges of data*
and not concurrent exchanges of segments in which a data segment and
an acknowledgment segment are sent concurrently.
In the latter case, the logical order of data inside the connection
is never broken because there is no data concurrency.
Similarly, the simultaneous connection termination mechanism in TCP
in which two FIN segments are sent concurrently is usually not associated with data
concurrency. In the most common case, none of the FIN segments or only
one of them carries data, so the data concurrency definition is not
applicable. It is however possible to observe a simultaneous connection
termination where both FIN segments carry data, which is considered
concurrency if these segments satisfy inequalities (3.1)
and (3.2).

Data Analysis Algorithm

We have developed an efficient data analysis algorithm that can determine whether a connection is sequential or concurrent, and can measure ADU sizes and quiet time durations in the presence of arbitrary reordering, duplication, and loss. Rather than trying to analyze every possible case of reordering, duplication/retransmission, and loss, we rely on the logical data order property, which does not depend on the original order and timestamps.

Given the set of segment headers of a TCP connection sorted by timestamp, the algorithm performs two passes:

- Insert each data segment as a node into the data structure
`ordered_segments`. This is a list of nodes that orders data segments according to the logical data order (bidirectional order for sequential connections, unidirectional order for concurrent connections). The insertion process serves also to detect data exchange concurrency. This is because connections are initially considered sequential, so their segments are ordered bidirectionally, until a segment that cannot be inserted according to this order is found. No backtracking is needed after this finding, since bidirectional order implies unidirectional order for both directions. - Traverse
`ordered_segments`and output the a-b-t connection vector (sequential or concurrent) for the connection. This is straight-forward process, since segments in the data structure are already ordered appropriately.

The first step of the algorithm creates a doubly-linked list,
`ordered_segments` in which each list node represents a data segment
using the following four fields:

- : the sequence number of the segment in the initiator to acceptor direction (that we will call the A direction). This sequence number is determined from the final sequence number of the segment (if the segment was measured in the ``A'' direction), or from the cumulative acknowledgment number (if measured in the ``B'' direction).
- : the sequence number of the segment in the acceptor to initiator direction.
- : the direction in which the segment was sent (A or B).
- : the monitoring timestamp of the segment.

or

Insertion of a node into the list starts backward from the tail of the `ordered_segments`
looking for an insertion point that would satisfy the first
invariant. If the connection is still being considered sequential,
the insertion point must also satisfy the second invariant.
This implies comparing the sequence numbers of the new segment with those of the last
segment in the `ordered_segments`. The comparison can result in the
following cases:

- The last segment of
`ordered_segments`precedes the new one according to the bidirectional order above. If so, the new segment is inserted as the new last element of`ordered_segments`. - The last segment of
`ordered_segments`and the new segment have the same sequence numbers. In this case, the new segment is a retransmission and it is discarded. - The new segment precedes the last segment of
`ordered_segments`according to the bidirectional order. This implies that network reordering of TCP segments occurred, and that the new segment should be inserted before the last segment of`ordered_segments`to preserve the bidirectional order of the data structure. The new segment is then compared with the predecessors of the last segment in`ordered_segments`until its proper location is found, or inserted as the first segment if no predecessors are found. - The last segment of
`ordered_segments`and the new segment have different sequence numbers and do not show bidirectional order. This means that the connection is concurrent. The segment is then inserted according to its unidirectional order.

The second step is to walk through the linked list and produce an a-b-t
connection vector. This can be accomplished in time using `ordered_segments`.
For concurrent connections, the analysis goes through the
list keeping separate data for each direction of the connection. When a long
enough quiet time is found (or the connection is closed),
the algorithm outputs the size of the ADU.
For sequential connections, the analysis looks
for changes in directionality and outputs the amount of data in between
the change as the size of the ADU.
Sufficiently long quiet times also mark ADU boundaries,
indicating an epoch without one of the ADUs.

Reordering makes the computation of quiet times more complex than it seems. As shown in Figure 3.11, if the monitor does not see the first instance of the retransmitted segment, the quiet times should be computed based on the segments with sequence numbers 341 and 2555. This implies adding two more fields to the list nodes:

- : the minimum timestamp of any segment whose position in the order is not lower than the one represented by this node. Due to reordering, one segment can precede another in the bidirectional order and at the same time have a greater timestamp. In this case, we can use the minimum timestamp as a better estimate of the send time of the lower segment.
- : the maximum timestamp of any segment whose place in the order is not greater than the one represented by this node. This is the opposite of the previous field, providing a better estimate of the receive time of the greater segment.

for being the last segment (in the logical data order) of the first ADU, and being the first segment (in the logical data order) of the second ADU. For the example in Figure 3.11, at monitoring point 1, the value of for the node for the 9th segment (that marks a data directionality boundary when segment nodes are sorted according to the logical data order) is the timestamp of the 7th segment. Therefore, the quiet time is estimated as . Note that the use of more than one timestamp makes it possible to handle IP fragmentation elegantly. Fragments have different timestamps, so a single timestamp would have to be arbitrarily set to the timestamp of one of the fragments. With our algorithm, the first fragment provides sequence numbers and usually , while the last fragment usually provides .

Our trace processing algorithm makes two assumptions. First, it assumes we can isolate the segments of individual connections. Second, it assumes that no wraparound of sequence numbers occurs (otherwise, logical data order would not be preserved). These two assumptions can be satisfied by preprocessing the trace of segment headers. Isolating the segments of individual TCP connections was accomplished by sorting packet header traces on five keys: source IP address, source port number, destination IP address, destination port number, and timestamp. The first four keys can separate segments from different TCP connections as long as no source port number is reused. When a client establishes more than one connection to the same server (and service), these connections share IP addresses and destination port numbers, but not source port numbers. This is true unless the client is using so many connections that it reuses a previous source port number at some point. Finding such source port number reuses is relatively common in our long traces, which are at least one hour long. Since segment traces are sorted by timestamp, it is possible to look for pure SYN segments and use them to separate TCP connections that reuse source port numbers. However, SYN segments can suffer from retransmissions, just like any other segment, so the processing must keep track of the sequence number of the last SYN segment observed. Depending on the operating system of the connection initiator, this sequence number is either incremented or randomly set for each new connection. In either case, the probability of two connections sharing SYN sequence numbers is practically zero.

Segment sorting according to the previous 5 keys requires
time (we use the Unix `sort` utility for our work). It is also possible
to process the data without an initial sorting step by keeping state in memory
for each active connection. On the one hand, this can potentially eliminate
the costly step, making the entire processing run in linear time. On the
other hand, it complicates the implementation, and increases the memory requirements
substantially^{3.8}.
Detecting the existence of distinct connections with identical source and destination IP addresses
and port numbers requires time, simply by keeping track of SYN sequence numbers
as discussed above. In our implementation,
this detection is done at the same time as segments are inserted
into `ordered_segments` data structure, saving one pass.

TCP sequence numbers are 32-bit integers, and the initial sequence
number of a TCP connection can take any value between 0 and .
This means that wraparounds are possible, and relatively frequent.
One way to handle sequence number wraparound is by keeping track
of the initial sequence number and performing a modular subtraction. However,
if the SYN segment of a connection is not observed (and therefore the
initial sequence number is unknown), using modular arithmetic will fail whenever the
connection suffers from reordering of the first observed segments. In this case
the subtraction would start in the wrong place, *i.e.*, from the sequence number of the
first segment seen, which is not the lowest sequence number due to the reordering.
One solution is to use backtracking, which complicates the processing of traces.

A related problem is that representing sequence numbers as 32-bit integers
is not sufficient for connections that carry more than bytes of data (4 GB).
The simplest way to address this measurement problem is to encode sequence
numbers using more than 32 bits in the `ordered_segments` data
structure. In our implementation we use 64 bits to represent sequence numbers,
and rely on the following algorithm^{3.9}to accurately convert 32 bit sequence numbers
to 64-bit integers even in the presence of wraparounds.
The algorithm makes use of a wraparound counter and a pair of flags
for each direction of the connection.
The obvious idea is to increment the counter each time a transition from a high sequence
number to a low sequence number is seen.
However, due to reordering, the counter could be incorrectly incremented more than once.
For example, we could observe four segments with sequence numbers
, and . Wraparound processing should convert them into
, and . However, if the wraparound
counter is incremented every time a transition from a high sequence number
to a low sequence number is seen,
the counter would be incremented once for the segment with the sequence number
and again for the segment with sequence number .
In this case, the wraparound processing would result in four segments
with sequence numbers
, and
.
The second increment of the counter would be incorrect.

The solution is to use a flag that is set
after a ``low'' sequence number is seen, so the counter is incremented only
once after each ``crossing'' of .
This opens up the question of when to unset this flag so that the next true
crossing increments the counter.
This can be solved by keeping track of the crossing of the middle sequence
number. In our implementation, we use two flags, `low_seqno`
and `high_seqno`, which are set independently. If the next segment
has a sequence number in the first quarter of (*i.e.*, in the range
between 0 and ), the flag `low_seqno` is set to true.
If the next segment has a sequence number in the fourth quarter
of (*i.e.*, in the range between and ),
the other flag`high_seqno` is set to true. These flags are unset, and the
counter incremented, when both flags are true and the next segment is not in
the first or the fourth quarter of .
Sequence numbers in the first quarter are incremented
to times the counter plus 1.
The rest are incremented by plus the counter.
This handles the pathological reordering case in which the sequence
number of the first segment in a connection is very close to zero, and the next segment
is very close to . In this case the low sequence number would be incremented
by .
This algorithm requires no backtracking, and runs in time.
In our implementation, the sequence number conversion algorithm
has been integrated into the same pass as the insertion step of the ADU
analysis.

Our data acquisition techniques have been implemented in the analysis program `tcp2cvec`.
The program also handles a number of other difficulties that arise when
processing real traces, such as TCP implementations that behave in non-standard
ways. In addition,
it also implements the analysis of network-level parameters described in the next chapter.

Doctoral Dissertation: *Generation and Validation of Empirically-Derived TCP Application Workloads*

© 2006 Félix Hernández-Campos