IBM-UNC Traffic Modeling Project
An overview of trace collection: "Trace Collection in the UNC-CH DiRT Lab" presentation (3/23/06)
Traffic collection details
- /usr/local/bin/traffic_capture.bash filename_prefix capture_duration_in_seconds
- this is a bash script that checks the four playpen disks for space and picks the first one that is empty
- it then uses the DAG card to capture a bidirectional trace for the specified duration and writes it to disk
- /usr/local/bin/process_trace.bash filename_prefix playpen_#_dag_file playpen_#_destination_files
- this is a bash script that converts the captured trace from DAG to tcpdump (pcap) format and splits it into its two directions
- it then anonymizes the trace and writes the binary tcpdump file to disk
- if the total capture time was more than half an hour, it then splits the file into half-hour slices (separate files)
- the slices are then gzipped (each to roughly 50% of its original size or less), and the uncompressed slices are deleted
- the gzipped files can then be copied with sftp to felix.cs.unc.edu:/bigpen/greg (send email to aikat@cs.unc.edu)
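For illustration, here's a rough bash sketch of three of the steps described above: picking an empty playpen disk, working out where the half-hour slices begin, and gzipping the slices. It is not the actual traffic_capture.bash or process_trace.bash. The function names (pick_playpen, slice_offsets, compress_slices), the playpen mount points in the usage comment, and reading "empty" as "empty directory" are all assumptions. Only the 1800-second slice length comes from the half-hour slicing described above.

```bash
#!/usr/bin/env bash
# Rough sketch only: these hypothetical helpers illustrate the steps
# described for traffic_capture.bash and process_trace.bash; they are
# NOT those scripts.

# Hypothetical: print the first directory argument that exists and is
# empty (e.g. the playpen mount points); return 1 if none qualifies.
pick_playpen() {
  local d
  for d in "$@"; do
    # ls -A lists everything except . and .., so no output means empty
    if [ -d "$d" ] && [ -z "$(ls -A -- "$d")" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  return 1
}

# Slice length, assumed from "half-hour slices".
SLICE_SECONDS=1800

# Hypothetical: print the start offset (seconds into the capture) of
# each slice for a capture lasting $1 seconds. A capture of 1800 s or
# less gets a single slice starting at 0, i.e. no real slicing.
slice_offsets() {
  local duration=$1 off=0
  while [ "$off" -lt "$duration" ]; do
    echo "$off"
    off=$((off + SLICE_SECONDS))
  done
}

# Hypothetical: gzip each slice. By default gzip writes FILE.gz and
# removes FILE, which matches "zipped ... and the original slices are
# deleted".
compress_slices() {
  local f
  for f in "$@"; do
    gzip -- "$f" || return 1
  done
}

# Example usage (mount points and file names are hypothetical):
#   dest=$(pick_playpen /playpen1 /playpen2 /playpen3 /playpen4) || exit 1
#   slice_offsets 5400    # prints 0, 1800 and 3600, one per line
#   compress_slices "$dest"/xyz.a.anon.tcpdump.slice*
```

Relying on gzip's default behavior means each original slice is only removed after gzip has successfully written its .gz file.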
To check the results of the traffic_capture.bash script:
- first, check the .dag.log file to see the capture rate as well as the total amount of data captured
- less xyz.dag.log -- this lets you view the log file on screen, page by page
- to view the original DAG file:
- /usr/local/bin/dagconvert -Terf:pcap -i xyz.dag -f a | less
- /usr/local/bin/dagconvert -Terf:pcap -i xyz.dag -f b | less
- note the '-f a' versus '-f b' difference between the two commands; that's what gives you the two directions. These commands don't write to a file; they just show the output on screen a page at a time. Use the spacebar to scroll ahead.
Note that this shows you the original DAG-captured trace, but in tcpdump format, which is useful because we are all more familiar with this format and can make sense of it.
To check the results of the process_trace.bash script:
- tcpdump -n -r xyz.a.anon.tcpdump | less
- this reads the tcpdump file and shows you, page by page, what has been captured
- -n does not convert addresses, i.e., you'll see the IP addresses instead of DNS-resolved hostnames
- this is especially important here: addresses at this stage are already anonymized, so DNS lookups would be meaningless
- -r reads the packets from the file
- you can add -tt if you want an unformatted timestamp (seconds since the epoch)
- but I would recommend leaving it out here, so the timestamps stay human-readable and hence verifiable by you
- once the slices are ready, they'll be gzipped, but here's how you could check them:
- say the filename is xyz.a.anon.tcpdump.slice1.gz
- gunzip -c xyz.a.anon.tcpdump.slice1.gz | tcpdump -n -r - | head -5
- this decompresses the slice to standard output, reads it with tcpdump ('-r -' reads from standard input), and shows only the first 5 lines
- using this, you can check, e.g., that slice2 starts one slice length (half an hour) into your original capture.
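To make the slice-start check less manual, here's a small bash sketch that pulls the first packet's timestamp out of each gzipped slice and prints the gap between two slices in minutes. It uses -tt only for this arithmetic, since with -tt tcpdump prints the raw seconds.microseconds timestamp as the first field of each line. The helper names (first_packet_time, minutes_between) are hypothetical, and it assumes gunzip, tcpdump and awk are on your PATH. With the half-hour slicing described for process_trace.bash, the gap between consecutive slices should come out close to one slice length.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking where each slice starts.

# Print the epoch timestamp (seconds.microseconds) of the first packet
# in a gzipped tcpdump slice. With -tt, tcpdump prints the raw
# timestamp as the first space-separated field of each line.
first_packet_time() {
  gunzip -c -- "$1" | tcpdump -n -tt -r - 2>/dev/null | head -n 1 | cut -d' ' -f1
}

# Print the gap between two epoch timestamps, in minutes.
minutes_between() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / 60 }'
}

# Example usage (file names follow the xyz example above):
#   t1=$(first_packet_time xyz.a.anon.tcpdump.slice1.gz)
#   t2=$(first_packet_time xyz.a.anon.tcpdump.slice2.gz)
#   minutes_between "$t1" "$t2"
```

Since the first packet of slice1 marks the start of the capture, the printed gap is how far into the capture slice2 begins.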