COMP 530: Lab 4: Disk Performance Analysis

Due on Tuesday, November 17, 2020, 11:59 PM
Note: You may use all of your remaining late hours on this lab, including after the deadline. Also note that this assignment is strictly for extra credit and is not required.

Introduction

In this lab, you will measure the performance charactersitics of a hard disk drive (HDD) and a solid state drive (SSD). You will write a few simple micro-benchmarks, run them on a real test system, plot the results, and write a brief analysis of what you learned from the results. You will hand-in a write-up as a pdf---a few pages should be sufficient---as well as the code to run your tests. You are welcome to use existing unix tools, or write your own; if you write your own, they should only require a few tens or hundreds of lines of code. In either case, please hand in a script that runs all of your tests, so the TAs can understand precisely how you obtained the numbers you did.

Picking your group

You may do the lab alone, or in a group. You can complete this lab with a different team than the previous lab, if you prefer. If you work in a group, please submit one assignment to Gradescope, and list all group members in the code comments.

Writing Microbenchmarks

The first task you will need to do is write several microbenchmarks. A microbenchmark is a simple measurement utility for different types of operations. Here, you will write one or more simple utilities (we recommend in C), that issue different I/O patterns to a disk. Note that Linux exposes disks to users as if they were simply a file (e.g., /dev/sda), so you can write this utility to use familiar file system interfaces (open, read, write, and friends). We do recommend you parameterize this utility to take different parameters and arguments (see Lab 3 for an example of how one can use getopt to simplify command-line parameter processing).

The primary purpose is to understand the performance sensitivity of both types of devices to different I/O patterns. Thus, you will also need to be able to create a range of different I/O types.

Exercise 1. (10 points) Your first task is to write one or more microbenchmark utilities that can measure the time to issue different I/O patterns, as follows:

I/O Size: Sequentially write a range of logical blocks up to 1 GB, in different granularities. Measure the throughput (i.e., 1 GB divided by the execution time of the writes). The variable to change here is the size of each I/O. The smallest sensible size is 4KB (one logical block). So, you might write all 1GB in a series of 4KB writes, a series of 8KB writes, etc., up to a maximum of 100 MB.
I/O Stride: Extend the I/O Size experiment to also leave a configurable amount of space between each I/O request in the sequential write. The stride should range between 4KB and 100 MB, and be independent of the I/O size.
Random I/Os: Your microbenchmark framework should support a mode that can issue random writes within a range of LBAs. This experiment should be able to set the range of acceptable offsets on the device, as well as a granularity, again ranging from 4KB to 100 MB.
Read vs. Write: For all of the tests above, you should also be able to issue the same I/O patterns as reads or writes.

Hint: You can test your framework for functional correctness by running these tests on a file on any system. You only need to use the disk to collect performance measurements. For easier testing, I would recommend that you pass the device name as a command-line parameter. I would also write some unit tests that ensure you are really writing the patterns you think you are writing, say by re-reading the file after a test case.

Hint: To reduce the effects of OS-level caching, open the device with O_DIRECT, and be sure to issue an fsync at the end of each test that involves a write (so you are actually measuring the time to write everything to disk, not just to cache).

Hint: If you read an input, be sure it is in memory. Or just use /dev/zero, which just generates zeros. The issue is that if you read from a slower HDD, and benchmark on a faster SSD, you will be bottlenecked on reading from the HDD.

Data collection logistics

You will use gwion.cs.unc.edu to run these tests. You will use the /dev/sdb1 partition for the HDD experiments, and /dev/sda2 for the SSD experiments. Both should be world writable and filled with random data.

In order to avoid interference with one another, we request that you use google sheets to reserve a slot for exclusive use of these disks. A link will be shared in piazza. Before you run any experiments on gwion, please check the reservation spreadsheet to avoid interfering with anyone else's tests. If no one has a reservation, you may run tests in "non-exclusive" mode (i.e., with the understanding someone else may be doing things, but you can make sure everything works properly on gwion or get some preliminary data this way).

The fact that we are sharing a single machine means you will need to plan ahead a bit, and that you will need to reserve a slot enough in advance of the deadline that you can have time to analyze the results, possibly run more experiments, collect more data if needed, and have time to write up the results before the deadline.

Getting enough samples

One important piece of good scientific methodology is collecting enough samples that you have some confidence your measurements are representative of the real average. Even in computer systems, there is some degree of natural variation (often moreso in the real world). If you run only one experiment, you may have come across a rare outlier.

To keep this assignment tractible, we are going to require that each reported measurement is the mean of at least 5 runs. Be sure to keep data for each run, not just a mean, as you will also need to plot a confidence interval (more on this below).

In deciding which points to test for each of the above experiments, for each variable to test, we suggest you start with the minimum, midpoint, and maximum. From here, you can add midpoints (think binary search) until the "shape" of the graph becomes clear. You may use some judgment as to when you can skip points, but you should expect several of the lines to have a "knee" in the curve; points around the "knee" will be of particular interest. At a minimum, there should be at least 16 data points collected, evenly distributed over the variable space.

For your own edification, the correct way to determine how many samples you need is to use a statistical method like the Student's T-Test (NB: Check out the history of how this was developed), which uses an assumption about the expected distribution of samples and the measured variance to determine whether additional samples are needed.

Data collection and plots

This section will describe the specific experiments you should run, as well as how you should plot these data points. You may use any graphing software you like. Excel is probably simplest, although feel free to use other tools. Prof. Porter is a fan of Ploticus, but there is a significant learning curve that is only worthwhile if you plan to use these tools again in the future. Some of his students have found that the R language has some useful graphing tools.

Exercise 2. (10 points) Collect data and graph the following experiments. Draw line graphs, with each point indicating a mean, and including 95% confidence intervals as error bars for each point. Unless otherwise indicated, all graphs report throughput on the y-axis. You can calculate throughput as: total bytes read or written, divided by the completion time of the benchmark (excluding any setup costs, but definitely including the time to do an fsync on a write test).

For each of these tests, include a plot for reads and a plot for writes. There should be a complete set of all plots for the HDD and for the SSD.

I/O Size: The x-axis (variable under test) should be the I/O granularity, ranging from 4KB to 100MB.
I/O Stride: The x-axis should be the size of the stride (or "gap" between I/Os). You should plot lines for a few different I/O sizes (at least 5, ranging from 4KB to 10MB), to determine whether the trend is consistent or not.
Random I/Os: The x-axis should be the I/O granularity, again ranging from 4KB to 100 MB. You should plot this for a range of 1 GB worth of logical block addresses.

Please also create a script that executes your framework with all of these parameters, such that a single command (like ./myscript.sh will generate all of the expected data). This is useful for reproducing your results (in our case, for grading, but, if you continue doing any science in your future career, so that someone else can understand and replicate your experiment).

Analysis

The final step of this project is to do a short write up (1-2 pp of text, plus graphs) should be sufficient. The write-up is open-ended, but what we are looking for here is some analysis and interpretation of these graphs.

Exercise 3. (5 points) Write 1--2 pp of text analyzing each graph and the trends, as discussed above.

Some questions to consider in your write-up: What can you learn from these experiments? Is there an "optimal" I/O size or pattern? If you were designing a file system (or an application that does a lot of file I/Os), what lessons can you draw from these graphs. How do these results vary for reads vs. writes, or for HDDs vs. SSDs?

Hand-In Procedure

You will be handing in the code and report via Gradescope. If you work in a team, you should only submit one copy of your code in Gradescope; you can add your teammates to the handin. We recommend handing in directly from your github repository to the assignment.

Note: No autograding. Grading for this assignment will be entirely manual, and somewhat subjective (i.e., does the data make sense? How clear is the analysis and discussion?).

You may hand in more than once and we will take either the most recent or the one you designate as your submission, applying lateness penalties as appropriate (out-of-band).

Pprograms will be tested or measurements checked on gwion.cs.unc.edu. All programs, unless otherwise specified, should be written to execute in the current working directory. Your correctness grade will be based solely on your program's performance on gwion.cs.unc.edu. Make sure your programs work on gwion!

Note: We do not have an automated way to calculate late penalties. These will be applied manually at the end of the semester.

The program should be neatly formatted (i.e., easy to read) and well-documented. The Style Guide gives additional guidance on lab code style.

If you complete any challenge problems, please describe the solution and how to demonstrate it in challenge.txt. Note that we have a separate submission option in Gradescope for submitting challenge problems, in order to accommodate later submissions without charging late hours. Even if you are submitting on time, please submit your challenge problems a second time through the appropriate challenge assignment --- we will only manually grade assignments handed in through the challenge option.