CSE 306: Lab 4: Fast File System for xv6

Due on Friday, May 6, 2016, 11:59 PM
Note: You may use your remaining late hours on this lab, including after the deadline.

Introduction

In this lab, you will implement the core performance optimizations in the Unix Fast File System (FFS), including block groups and the large file exception, on the xv6 file system. The current xv6 file system is implemented using a simple layout that places all metadata at the front of the disk, followed by data blocks, followed by a journal log. This lab will implement a more performant variant of this design.

Hand-In Procedure

When you are ready to hand in your lab code and write-up, create a file called slack.txt noting how many late hours you have used both for this assignment and in total. (This is to help us agree on the number that you have used.) This file should contain a single line formatted as follows (where n is the number of late hours):

Then run make handin-lab4 in the xv6 directory. If you submit multiple times, we will take the latest submission and count late hours accordingly.

In this and all other labs, you may complete challenge problems for extra credit. If you do this, please create a file called challenge.txt, which includes a short (e.g., one or two paragraph) description of what you did to solve your chosen challenge problem and how to test it. If you implement more than one challenge problem, you must describe each one.

This lab does not include any questions for you to answer, but you should document your design in the README file.

Unix FFS Background

The absolute first place to start is to understand the Fast File System (FFS) design. If you have not already, please read Chapter 41 of the course textbook, and review the lecture slides (and/or echo recordings) from the FFS lecture. You will be implementing several features of this design in xv6.

xv6 File System Background

A second essential task before starting is to read Chapter 6 of the xv6 book. This chapter explains the basics of how the xv6 file system is implemented and has a number of useful code pointers and explanations that will be invaluable in completing the assignment.

The current xv6 file system is basic and functional, but will not get good performance on a real disk. Similar to the basic strawman in the lecture slides, xv6 places all of the metadata at the front of the disk, followed by the data. Thus, there is guaranteed to be a large seek between reading an inode and reading data. A technique such as block groups can reduce the likelihood of seeks by placing inodes and data blocks relatively close to each other on disk. Similarly, the xv6 file system simply picks the first free inode and block, rather than making an attempt to place contents of a file together.

Note: Free data blocks are tracked with a bitmap. Free inodes are tracked by setting the type field to zero (normally this field encodes the type, such as regular file or directory).
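To make the first-fit behavior concrete, here is a minimal sketch of scanning one bitmap block for the first free bit. It is modeled loosely on how xv6 tests bitmap bits, but the function name first_fit_alloc and the in-memory buffer are assumptions for illustration; the real kernel code works on buffer-cache blocks and logs its writes.

```c
#include <assert.h>

#define BSIZE 512          /* bytes per block, as in xv6 */
#define BPB   (BSIZE * 8)  /* bitmap bits per block */

/* Find the first clear bit in one bitmap block, set it, and return its
   index. Returns -1 if every bit is already set.
   first_fit_alloc is a hypothetical name, not an xv6 function. */
int first_fit_alloc(unsigned char *bitmap) {
  assert(bitmap != 0);
  for (int bi = 0; bi < BPB; bi++) {
    unsigned char m = (unsigned char)(1 << (bi % 8));
    if ((bitmap[bi / 8] & m) == 0) {
      bitmap[bi / 8] |= m;   /* mark the block in use */
      return bi;
    }
  }
  return -1;
}
```

Because the scan always starts at bit 0, consecutive allocations land wherever the lowest free bit happens to be, with no regard for which file or directory is asking.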

Block groups

Your first coding task will be to implement support in xv6 for block groups. The initial xv6 layout is defined in mkfs.c, with a hard-coded, but configurable size (FSSIZE) and number of inodes (NINODES). You should add a macro, called BLOCKGROUPS, that defines a number of block groups.

Exercise 1. (20 points) Your first task is to modify the file system to stripe the inodes, free data block bit map, and data blocks across multiple groups.

You should make the number of block groups a compile-time macro for mkfs.c and test that multiple sizes divide the disk space correctly (and reasonably if the disk space is not evenly divisible).

For the kernel file system, you should store the number of block groups in the super block, rather than hard-code this value, so that the kernel can correctly handle multiple file systems with different numbers of block groups.

For simplicity, you are welcome to add assertions to mkfs.c that the number of inodes and blocks must divide evenly by the number of block groups, and suggest alternative values to the user that would divide evenly, rather than deal with edge cases where the last block group is not completely full.
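One way to produce such suggestions is to round the requested count down or up to the nearest multiple of the group count. The sketch below is only an illustration; the helper names are hypothetical and are not part of mkfs.c.

```c
#include <assert.h>

/* Largest value <= total that is a multiple of groups.
   Hypothetical helper, not part of the stock mkfs.c. */
unsigned round_down_to_groups(unsigned total, unsigned groups) {
  assert(groups > 0);
  return total - total % groups;
}

/* Smallest value >= total that is a multiple of groups.
   Hypothetical helper, not part of the stock mkfs.c. */
unsigned round_up_to_groups(unsigned total, unsigned groups) {
  assert(groups > 0);
  unsigned rem = total % groups;
  return rem == 0 ? total : total + (groups - rem);
}
```

For example, with 200 inodes and 6 block groups, these helpers would suggest 198 or 204 inodes. mkfs.c could print both values before exiting when the check fails.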

Hint: one sector holds 4096 bits (512 bytes * 8 bits/byte), so a single bitmap sector can track 4096 blocks. It is also acceptable to set the smallest block group to 4096 blocks, to avoid wasting bitmap space.

Hint 2: The macros IBLOCK() and BBLOCK() in fs.h will be helpful in adjusting how inodes and data blocks are located on disk.
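As a rough illustration of the address arithmetic involved, the sketch below assumes one possible layout: equal-sized groups, each laid out as inode blocks, then bitmap blocks, then data blocks. The struct bg_super and every function name here are hypothetical; your superblock fields and your versions of IBLOCK() and BBLOCK() may look quite different.

```c
#include <assert.h>

#define BSIZE 512            /* xv6 block size in bytes */
#define IPB   8              /* inodes per block: 512 / 64-byte dinode */
#define BPB   (BSIZE * 8)    /* bitmap bits per block: 4096 */

/* Hypothetical superblock fields for a grouped layout. */
struct bg_super {
  unsigned first_group;   /* block number where group 0 begins */
  unsigned ngroups;       /* number of block groups */
  unsigned bg_blocks;     /* total blocks per group */
  unsigned bg_inodes;     /* inodes per group (a multiple of IPB) */
};

/* First disk block of group g. */
unsigned bg_start(const struct bg_super *sb, unsigned g) {
  assert(g < sb->ngroups);
  return sb->first_group + g * sb->bg_blocks;
}

/* Group that holds inode number inum. */
unsigned bg_of_inode(const struct bg_super *sb, unsigned inum) {
  return inum / sb->bg_inodes;
}

/* Disk block containing inode inum (inode blocks lead each group). */
unsigned bg_iblock(const struct bg_super *sb, unsigned inum) {
  unsigned g = bg_of_inode(sb, inum);
  return bg_start(sb, g) + (inum % sb->bg_inodes) / IPB;
}

/* Disk block containing the bitmap bit for disk block b. */
unsigned bg_bblock(const struct bg_super *sb, unsigned b) {
  assert(b >= sb->first_group);
  unsigned g = (b - sb->first_group) / sb->bg_blocks;
  unsigned off = (b - sb->first_group) % sb->bg_blocks;
  unsigned inode_blocks = sb->bg_inodes / IPB;
  return bg_start(sb, g) + inode_blocks + off / BPB;
}
```

In this sketch each group's bitmap covers only that group's blocks, so the bit index within the bitmap is the block's offset from the start of its group.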

Spreading and Packing Heuristics

FFS included a number of heuristics for maintaining locality, including (1) placing new directories in one of the least-utilized block groups, (2) placing files in the same directory in the same block group (when possible), and (3) chunking and spreading large files across multiple block groups. Currently, xv6 just places new files and data blocks on the first available inode or block.
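The third heuristic comes down to a mapping from a file's logical block number to a preferred block group. The sketch below is one assumed policy, not the FFS or xv6 implementation: direct blocks stay in the inode's home group, and each later chunk of CHUNK blocks moves to the next group, wrapping around. The CHUNK value and the function name are hypothetical.

```c
#include <assert.h>

#define NDIRECT 12    /* direct blocks per inode in xv6 */
#define CHUNK   128   /* hypothetical chunk size, in blocks */

/* Preferred block group for logical block bn of a file whose inode
   lives in group home. Hypothetical policy for illustration. */
unsigned group_for_file_block(unsigned home, unsigned bn, unsigned ngroups) {
  assert(ngroups > 0 && home < ngroups);
  if (bn < NDIRECT)
    return home;                 /* small files stay near their inode */
  /* Each chunk past the direct blocks goes one group further along. */
  return (home + 1 + (bn - NDIRECT) / CHUNK) % ngroups;
}
```

With the stock xv6 limit of 140 blocks per file, this particular CHUNK value puts all indirect blocks in a single neighboring group; a smaller chunk would spread them further.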

Exercise 2. (10 points) Implement each of these three heuristics in xv6. Note that this requires keeping some statistics about how full each block group is, so that load can be spread across block groups.
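As a sketch of what such statistics might look like, the code below keeps free inode and free block counts per group and picks a group for a new directory. The struct, the function name, and the choice of "most free blocks" as the measure of least utilized are all assumptions; other measures are equally defensible.

```c
#include <assert.h>

/* Hypothetical per-group counters, kept up to date by the allocators. */
struct bg_stats {
  unsigned free_inodes;
  unsigned free_blocks;
};

/* Choose a group for a new directory: among groups with at least one
   free inode, take the one with the most free blocks (lowest index wins
   ties). Returns -1 if no group has a free inode. */
int pick_dir_group(const struct bg_stats *st, unsigned ngroups) {
  assert(st != 0);
  int best = -1;
  for (unsigned g = 0; g < ngroups; g++) {
    if (st[g].free_inodes == 0)
      continue;                      /* cannot hold a new inode */
    if (best < 0 || st[g].free_blocks > st[best].free_blocks)
      best = (int)g;
  }
  return best;
}
```

The counters could be rebuilt by scanning the bitmaps and inode blocks at mount time, or stored on disk; either choice is worth documenting in your README.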

An important aspect of this is handling the cases where a heuristic doesn't work neatly, such as when a block group is full but more files would ideally be placed in it. For instance, if a directory is in a full block group, it would be best for all newly created files in that directory to go into the same second block group. Similarly, if space in the first block group becomes available later, new files in the directory should go in the first block group.
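One simple way to get this behavior, sketched below under assumed names, is a deterministic search order that starts at the parent directory's group and wraps around. Files overflow into the same next group for as long as it has room, and return to the parent's group as soon as it has space again.

```c
#include <assert.h>

/* Hypothetical per-group counters, kept up to date by the allocators. */
struct bg_stats {
  unsigned free_inodes;
  unsigned free_blocks;
};

/* Choose a group for a new file whose parent directory lives in group
   parent. Prefers the parent's group, then the following groups in a
   fixed wrap-around order. Returns -1 if nothing is available. */
int pick_file_group(const struct bg_stats *st, unsigned ngroups,
                    unsigned parent) {
  assert(st != 0 && parent < ngroups);
  for (unsigned i = 0; i < ngroups; i++) {
    unsigned g = (parent + i) % ngroups;
    if (st[g].free_inodes > 0 && st[g].free_blocks > 0)
      return (int)g;
  }
  return -1;
}
```

Because the search always begins at the parent's group, no extra state is needed to remember where earlier overflow files went.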

Exercise 3. (5 points) Write at least three test cases that exercise these heuristics (and block groups in general). These tests should be enough to show that related files and blocks are being placed in the same block group. You may need to add a debugging output mode that can log or print the mapping of the directory hierarchy to block groups. Be sure to document how these tests work and the expected output in a README file.

Challenge. (15 points) Spread the transaction log across block groups for faster commit. This raises a number of subtle design issues, and should be discussed with Prof. Porter if you are planning to tackle it. In short, you will need to handle recovery correctly, as well as reason about how to find the head of the log and when to checkpoint a "log group."

This completes the lab. Make sure you rigorously test your code, document the design well, and hand in your work with make handin-lab4.