COMP 530: Lab 3: Synchronization

Due 11:59 PM, Wednesday, Dec 5, 2018

Introduction

In this lab you will develop a multi-threaded simulation of a Least Recently Used (LRU) cache of keys.

The course staff have provided you with a simple, sequential implementation of the code. The sequential implementation will need synchronization to work properly with multiple threads. Your job will be to create several parallel versions of the code, with increasing sophistication.

As before, this assignment does not involve writing that many lines of code (you will probably copy/paste the base code several times). The changes you make will probably be in the tens of lines. The hard part is figuring out the few lines of delicate code to write, and being sure your synchronization code is correct. We strongly recommend starting early and writing many test cases.

LRU Model

In this assignment, we will assume that our cache needs to track the reference count of a set of keys (here, integers from 0..MAX_KEY). When a key is referenced, by calling reference(key), the cache is searched for the key: if the key is not present, it is added with a reference count of 1; if it is already present, its reference count is incremented by one. This cache is organized as a simple, singly-linked list.

Periodically, a thread will need to clean the cache (similar to the clock algorithm we saw in class). The cleaner thread iterates through the list and decrements each node's reference count by 1. Any node whose reference count reaches zero should be removed from the list and freed.
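
To make the model concrete, here is a minimal single-threaded sketch of the two operations on a singly-linked list. The struct and variable names are illustrative assumptions; the real definitions live in sequential-lru.c, which (judging by the sample output below) appears to keep the list in key order, whereas this sketch simply inserts new keys at the head.

    #include <stdlib.h>

    /* Hypothetical node layout; the actual struct in sequential-lru.c may differ. */
    struct node {
        int key;
        int refcount;
        struct node *next;
    };

    static struct node *head = NULL;
    static int count = 0;

    /* reference(): bump the count of an existing key, or insert it with count 1. */
    int reference(int key) {
        struct node *cur;
        for (cur = head; cur != NULL; cur = cur->next) {
            if (cur->key == key) {
                cur->refcount++;
                return 1;
            }
        }
        cur = malloc(sizeof(*cur));
        if (cur == NULL)
            return 0;
        cur->key = key;
        cur->refcount = 1;
        cur->next = head;            /* insert at the head for simplicity */
        head = cur;
        count++;
        return 1;
    }

    /* clean(): decrement every reference count, freeing nodes that reach zero. */
    void clean(int check_water_mark) {
        (void) check_water_mark;     /* the water mark only matters in Exercise 2 */
        struct node **prev = &head;
        struct node *cur = head;
        while (cur != NULL) {
            if (--cur->refcount == 0) {
                *prev = cur->next;
                free(cur);
                cur = *prev;
                count--;
            } else {
                prev = &cur->next;
                cur = cur->next;
            }
        }
    }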

Getting the starter code

You will need to click on this link to create a private repository for your code.

Once you have a repository, you will need to clone your private repository (see the URL under the green "Clone or Download" button, after selecting "Use SSH"). For instance, if your private repo is called lab3-team-don:

git clone git@github.com:comp530-f18/lab3-team-don.git

We have provided you with a single-threaded, sequential version of the code in sequential-lru.c. This C file implements the functions declared in lru.h:

int init(int numthreads) Initialize any global variables or synchronization primitives (hint: see pthread_mutex_init() and pthread_cond_init()). This may not need to do anything in every variant. Returns 0 on success, -errno on failure.
int reference(int key) Search the list, and, if found, increment the key's reference count by one. If not found, add to the list. Returns 1 on success, 0 on failure.
void clean(int check_water_mark) Iterate through the list and decrement each element's reference count. If a reference count drops to zero, delete that element. If check_water_mark is set, block until there are more than LOW_WATER_MARK elements in the list.
void shutdown_threads() Wake up any blocked threads. This may not need to do anything in every variant.
print() Print the contents of the list, primarily for debugging and testing.
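
Put together, the interface is small; lru.h declares roughly the following (the exact declarations, comments, and DEBUG machinery in the real header may differ, and print()'s signature is an assumption):

    int  init(int numthreads);
    int  reference(int key);
    void clean(int check_water_mark);
    void shutdown_threads(void);
    void print(void);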

We provide you with a test harness in main.c. You will complete LRU implementations in mutex-lru.c and fine-lru.c. You may copy any code from sequential-lru.c into these files that you find useful, but it will require modification.

Note that typing make generates three different executables. Currently, only lru-sequential will work, and only with one thread. There will probably be compiler warnings for the other variants.

The code comes with the ability to turn debugging prints on and off at compile time. You can edit lru.h to uncomment the line below, and you will get debug output printing after the initial selftests, and at the end of a period of time (specified on the command line):

//#define DEBUG 1

Note that heavy use of printf is not a good idea beyond initial testing because printf includes synchronization, which can inadvertently hide bugs. That said, this style of macro-enabled debug messages can help you understand what is happening, and easily compile out the messages once you believe the code is working.
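
The usual shape of such a macro is shown below; the actual name and definition in the starter code may differ, but the idea is that the messages vanish completely when DEBUG is not defined.

    #include <stdio.h>

    /* #define DEBUG 1 */
    #ifdef DEBUG
    #define debug_print(...) printf(__VA_ARGS__)
    #else
    #define debug_print(...) do { } while (0)   /* compiles away to nothing */
    #endif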

To build the code, and a very, very simple test case, type make. You can test the starter code as follows:

porter@classroom:~/lab3$ ./lru-sequential
=== Starting list print ===
=== Total count is 64 ===
Key 0, Ref Count 1
Key 1, Ref Count 1
Key 2, Ref Count 1
Key 3, Ref Count 1
Key 4, Ref Count 1
Key 5, Ref Count 1
Key 6, Ref Count 1
Key 7, Ref Count 1
Key 8, Ref Count 1
Key 9, Ref Count 1
Key 10, Ref Count 1
Key 11, Ref Count 1
Key 12, Ref Count 1
Key 13, Ref Count 1
Key 14, Ref Count 1
Key 15, Ref Count 1
Key 16, Ref Count 1
Key 17, Ref Count 1
Key 18, Ref Count 1
Key 19, Ref Count 1
Key 20, Ref Count 1
Key 21, Ref Count 1
Key 22, Ref Count 1
Key 23, Ref Count 1
Key 24, Ref Count 1
Key 25, Ref Count 1
Key 26, Ref Count 1
Key 27, Ref Count 1
Key 28, Ref Count 1
Key 29, Ref Count 1
Key 30, Ref Count 1
Key 31, Ref Count 1
Key 32, Ref Count 1
Key 33, Ref Count 1
Key 34, Ref Count 1
Key 35, Ref Count 1
Key 36, Ref Count 1
Key 37, Ref Count 1
Key 38, Ref Count 1
Key 39, Ref Count 1
Key 40, Ref Count 1
Key 41, Ref Count 1
Key 42, Ref Count 1
Key 43, Ref Count 1
Key 44, Ref Count 1
Key 45, Ref Count 1
Key 46, Ref Count 1
Key 47, Ref Count 1
Key 48, Ref Count 1
Key 49, Ref Count 1
Key 50, Ref Count 1
Key 51, Ref Count 1
Key 52, Ref Count 1
Key 53, Ref Count 1
Key 54, Ref Count 1
Key 55, Ref Count 1
Key 56, Ref Count 1
Key 57, Ref Count 1
Key 58, Ref Count 1
Key 59, Ref Count 1
Key 60, Ref Count 1
Key 61, Ref Count 1
Key 62, Ref Count 1
Key 63, Ref Count 1
=== Ending list print ===
=== Starting list print ===
=== Total count is 64 ===
Key 0, Ref Count 2
Key 1, Ref Count 1
Key 2, Ref Count 2
Key 3, Ref Count 1
Key 4, Ref Count 2
Key 5, Ref Count 1
Key 6, Ref Count 2
Key 7, Ref Count 1
Key 8, Ref Count 2
Key 9, Ref Count 1
Key 10, Ref Count 2
Key 11, Ref Count 1
Key 12, Ref Count 2
Key 13, Ref Count 1
Key 14, Ref Count 2
Key 15, Ref Count 1
Key 16, Ref Count 2
Key 17, Ref Count 1
Key 18, Ref Count 2
Key 19, Ref Count 1
Key 20, Ref Count 2
Key 21, Ref Count 1
Key 22, Ref Count 2
Key 23, Ref Count 1
Key 24, Ref Count 2
Key 25, Ref Count 1
Key 26, Ref Count 2
Key 27, Ref Count 1
Key 28, Ref Count 2
Key 29, Ref Count 1
Key 30, Ref Count 2
Key 31, Ref Count 1
Key 32, Ref Count 2
Key 33, Ref Count 1
Key 34, Ref Count 2
Key 35, Ref Count 1
Key 36, Ref Count 2
Key 37, Ref Count 1
Key 38, Ref Count 2
Key 39, Ref Count 1
Key 40, Ref Count 2
Key 41, Ref Count 1
Key 42, Ref Count 2
Key 43, Ref Count 1
Key 44, Ref Count 2
Key 45, Ref Count 1
Key 46, Ref Count 2
Key 47, Ref Count 1
Key 48, Ref Count 2
Key 49, Ref Count 1
Key 50, Ref Count 2
Key 51, Ref Count 1
Key 52, Ref Count 2
Key 53, Ref Count 1
Key 54, Ref Count 2
Key 55, Ref Count 1
Key 56, Ref Count 2
Key 57, Ref Count 1
Key 58, Ref Count 2
Key 59, Ref Count 1
Key 60, Ref Count 2
Key 61, Ref Count 1
Key 62, Ref Count 2
Key 63, Ref Count 1
=== Ending list print ===
=== Starting list print ===
=== Total count is 32 ===
Key 0, Ref Count 1
Key 2, Ref Count 1
Key 4, Ref Count 1
Key 6, Ref Count 1
Key 8, Ref Count 1
Key 10, Ref Count 1
Key 12, Ref Count 1
Key 14, Ref Count 1
Key 16, Ref Count 1
Key 18, Ref Count 1
Key 20, Ref Count 1
Key 22, Ref Count 1
Key 24, Ref Count 1
Key 26, Ref Count 1
Key 28, Ref Count 1
Key 30, Ref Count 1
Key 32, Ref Count 1
Key 34, Ref Count 1
Key 36, Ref Count 1
Key 38, Ref Count 1
Key 40, Ref Count 1
Key 42, Ref Count 1
Key 44, Ref Count 1
Key 46, Ref Count 1
Key 48, Ref Count 1
Key 50, Ref Count 1
Key 52, Ref Count 1
Key 54, Ref Count 1
Key 56, Ref Count 1
Key 58, Ref Count 1
Key 60, Ref Count 1
Key 62, Ref Count 1
=== Ending list print ===
=== Starting list print ===
=== Total count is 0 ===
=== Ending list print ===
Salt is 1542750928
=== Starting list print ===
=== Total count is 55 ===
Key 1, Ref Count 2
Key 3, Ref Count 1
Key 6, Ref Count 3
Key 9, Ref Count 1
Key 10, Ref Count 1
Key 11, Ref Count 1
Key 13, Ref Count 1
Key 14, Ref Count 1
Key 15, Ref Count 1
Key 16, Ref Count 1
Key 18, Ref Count 1
Key 22, Ref Count 1
Key 25, Ref Count 1
Key 26, Ref Count 1
Key 28, Ref Count 2
Key 31, Ref Count 1
Key 34, Ref Count 1
Key 35, Ref Count 1
Key 38, Ref Count 1
Key 39, Ref Count 1
Key 41, Ref Count 2
Key 44, Ref Count 1
Key 47, Ref Count 1
Key 50, Ref Count 1
Key 51, Ref Count 1
Key 52, Ref Count 1
Key 54, Ref Count 1
Key 58, Ref Count 1
Key 61, Ref Count 2
Key 65, Ref Count 2
Key 68, Ref Count 1
Key 69, Ref Count 1
Key 70, Ref Count 2
Key 71, Ref Count 2
Key 73, Ref Count 2
Key 79, Ref Count 7
Key 81, Ref Count 2
Key 82, Ref Count 2
Key 88, Ref Count 2
Key 90, Ref Count 1
Key 92, Ref Count 1
Key 93, Ref Count 2
Key 94, Ref Count 1
Key 99, Ref Count 1
Key 100, Ref Count 1
Key 101, Ref Count 3
Key 106, Ref Count 2
Key 109, Ref Count 1
Key 111, Ref Count 1
Key 118, Ref Count 1
Key 120, Ref Count 2
Key 122, Ref Count 1
Key 124, Ref Count 1
Key 125, Ref Count 1
Key 126, Ref Count 1
=== Ending list print ===

All but the last list print should be deterministic, as this is controlled by the code in self_tests() (you may add additional tests if you like). The last print will vary from run to run, and is the output of running the simulation for 30 seconds.

This test harness takes several options to control the length of time it runs, and the number of threads. To get this information, try:

porter@classroom:~/lab3$ ./lru-sequential -h
LRU Simulator.  Usage: ./lru-[variant] [options]

Options:
	-c numclients - Use numclients threads.
	-h - Print this help.
	-l length - Run clients for length seconds.


Do not use this implementation with more than one thread---it will break because it does not use any synchronization!

Thread Programming Guidelines

Before you begin the assignment, (re-)read Coding Standards for Programming with Threads. You are required to follow these standards for this project. Because it is impossible to determine the correctness of a multithreaded program via testing, grading on this project will primarily be based on reading your code, not on running tests. Your code must be clear and concise. If your code is not easy to understand, then your grade will be poor, even if the program seems to work. In the real world, unclear multi-threaded code is extremely dangerous: even if it "works" when you write it, how will the programmer who comes after you debug it, maintain it, or add new features? Feel free to sit down with the TA or instructor during office hours for code inspections before you turn in your project.

Writing multithreaded programs requires extra care and more discipline than writing conventional programs, because debugging multithreaded programs remains an art rather than a science, despite more than 30 years of research. Generally, avoiding errors is likely to be more effective than debugging them. Over the years, a culture of programming with threads has developed around the following guidelines. Adhering to these guidelines will ease the process of producing correct programs:

  1. All threads share global and heap data. This requires you to use proper synchronization primitives whenever two threads modify or read the shared data. Sometimes, it is obvious how to do so:
              char a[1000];
              void Modify(int m, int n)
              {
                    a[m + n] = a[m] + a[n];                        // ignore bound checks
              }
    
    If two threads execute this function, there is no guarantee that both will perceive consistent values of the elements of the array a. Therefore, such a statement has to be protected by a mutex that ensures proper execution, as follows (a pthread version of this example appears after this list):
              char a[1000];
              void Modify(int m, int n)
              {
                    Lock(p);
                    a[m + n] = a[m] + a[n];                        // ignore bound checks
                    Unlock(p);
              }
    
    where p is a synchronization variable.
  2. Beware of the hidden data structures! While your own variables on the heap must be protected, it is also necessary to protect data structures that belong to the libraries and the runtime system. These data structures are allocated on the heap, but you do not see them. Calling library functions can create situations where two threads corrupt those data structures because proper synchronization is lacking. For example, two threads calling the memory allocator simultaneously through the malloc() library call might create problems if the data structures of malloc() are not designed for concurrent access. In that case, the data structures used to track free space might be corrupted. The solution is to use a thread-safe version of libc.
    Linking with the -pthread flag (included in your Makefile) is sufficient for libc using gcc. Other compilers may require the -D_POSIX_PTHREAD_SEMANTICS flag to select thread-safe versions.
    Finally, not all libraries support thread-safe functions. If you include extra libraries in your code, it is up to you to figure out whether they are thread safe, or if calls must be protected by a lock.
  3. Simplicity of the code is also an important factor in ensuring correct operation. Complex pointer manipulations may lead to errors, and a runaway pointer may start corrupting the stacks of various threads, manifesting its presence through a set of incomprehensible bugs. Contrived logic and deep recursion may cause stacks to overflow. Modern languages such as Java eliminate pointers altogether and perform memory allocation and deallocation through automatic memory management and garbage collection. They simplify the process of writing programs in general, and multithreaded programs in particular. Still, without an understanding of all the pitfalls that come with programming multithreaded applications, even the most sophisticated programmer using the most sophisticated language may fall prey to some truly strange bugs.
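
In this lab, the synchronization variable will be a pthread mutex. A minimal concrete version of the Modify example from guideline 1, using the same primitives you will use in the exercises, might look like this:

    #include <pthread.h>

    char a[1000];
    pthread_mutex_t a_lock = PTHREAD_MUTEX_INITIALIZER;

    void Modify(int m, int n)
    {
        pthread_mutex_lock(&a_lock);
        a[m + n] = a[m] + a[n];      /* ignore bound checks */
        pthread_mutex_unlock(&a_lock);
    }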

Helpful Resources on Concurrent Lists

One very helpful reference for implementing concurrency on a linked list is the book The Art of Multiprocessor Programming by Herlihy and Shavit. In particular, look at Sections 9.4 and 9.5 for examples of how to do this sort of synchronization.

Using a Mutex

You are now ready to start implementing multi-threaded versions of the code.

Exercise 1. (7 points) Implement a thread-safe list in mutex-lru.c, using coarse-grained locking. In other words, for this exercise, it is sufficient to have one lock for the entire list. To complete this exercise, we recommend using pthread mutex functions, including pthread_mutex_init, pthread_mutex_lock, and pthread_mutex_unlock.
Be sure to test your code by running lru-mutex -c XXX (where XXX is an integer greater than 1) to check that the code works properly. Because multi-threaded code interleaves memory operations non-deterministically, it may work once and fail the next time---so test very thoroughly and add a lot of assertions to your code.

Don't be alarmed if you end up with a relatively big or relatively empty list at this point, as long as the list is correct otherwise.
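
A coarse-grained version can wrap every public entry point around the copied sequential code. The sketch below assumes a hypothetical helper, reference_unlocked(), holding the body you copy from sequential-lru.c; the lock name is also an assumption.

    #include <pthread.h>

    static pthread_mutex_t list_lock;

    int init(int numthreads) {
        (void) numthreads;
        int rc = pthread_mutex_init(&list_lock, NULL);
        return rc ? -rc : 0;                 /* 0 on success, -errno on failure */
    }

    int reference(int key) {
        pthread_mutex_lock(&list_lock);
        int rv = reference_unlocked(key);    /* hypothetical: the copied sequential body */
        pthread_mutex_unlock(&list_lock);
        return rv;
    }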

Testing on a multi-core machine

Classroom is a 16-core machine. Be sure to test your code regularly on classroom, as some bugs may only manifest with multiple CPUs.

Keeping a Target Cache Size

Ideally, we want the cache to hold a certain number of elements --- i.e., we want the count variable to stay between LOW_WATER_MARK and HIGH_WATER_MARK. In the second exercise, you will need to use condition variables to ensure three things:

  1. That a call to reference() blocks until count is below HIGH_WATER_MARK, as reference may add an element to the list.
  2. That a call to clean() blocks until count is above LOW_WATER_MARK, ensuring the list doesn't get too small.
  3. That any blocked threads will exit from their current routine when shutdown_threads() is called.

Although you could simply have the cleaner thread spin in a while loop checking the count, we can do better. In particular, we would like you to use condition variables (e.g., pthread_cond_wait() and pthread_cond_signal()), as well as the existing mutex synchronization, to allow the cleaner thread to sleep until there is work to do, and then to wake up once there is.

You will need to ensure that the resulting mutex LRU list implementation supports this behavior correctly. It is not necessary for this to work with the sequential version.

Exercise 2. (8 points) Add condition variable support to the mutex LRU list, using condition variables to block reference() and clean(), as described above.
You also need to write at least two additional unit tests (e.g., under self_tests() or another test function) that ensure the node count is being properly maintained. Be sure not to break the sequential version, and be sure this continues to work as you complete the next exercises.
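
One reasonable arrangement is sketched below, under the same assumptions as the coarse-grained sketch above (the helper names, the count bookkeeping, and the exact shutdown behavior are design choices, not the required solution). It uses one condition variable per direction:

    #include <pthread.h>

    static pthread_mutex_t list_lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  space_avail = PTHREAD_COND_INITIALIZER;  /* count < HIGH_WATER_MARK */
    static pthread_cond_t  work_avail  = PTHREAD_COND_INITIALIZER;  /* count > LOW_WATER_MARK  */
    static int count;
    static int shutting_down;

    int reference(int key) {
        pthread_mutex_lock(&list_lock);
        while (count >= HIGH_WATER_MARK && !shutting_down)
            pthread_cond_wait(&space_avail, &list_lock);        /* drops the lock while asleep */
        int rv = shutting_down ? 0 : reference_unlocked(key);   /* may increment count */
        if (count > LOW_WATER_MARK)
            pthread_cond_signal(&work_avail);                   /* the cleaner may have work now */
        pthread_mutex_unlock(&list_lock);
        return rv;
    }

    void clean(int check_water_mark) {
        pthread_mutex_lock(&list_lock);
        while (check_water_mark && count <= LOW_WATER_MARK && !shutting_down)
            pthread_cond_wait(&work_avail, &list_lock);
        if (!shutting_down)
            clean_unlocked();                                   /* may decrement count */
        pthread_cond_broadcast(&space_avail);                   /* blocked reference() calls recheck */
        pthread_mutex_unlock(&list_lock);
    }

    void shutdown_threads(void) {
        pthread_mutex_lock(&list_lock);
        shutting_down = 1;
        pthread_cond_broadcast(&space_avail);
        pthread_cond_broadcast(&work_avail);
        pthread_mutex_unlock(&list_lock);
    }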

Finer-Grained Locking

You can improve concurrency by supporting one lock per node, allowing threads to execute concurrently in different parts of the list.

Exercise 3. (10 points) Implement a list that uses fine-grained locking in fine-lru.c. In other words, every node in the list should have its own lock (note the change in the node definition in this file).
What makes fine-grained locking tricky is ensuring that you cannot deadlock while acquiring locks, and that a thread doesn't hold unnecessary locks. Be sure to document your locking protocol in your README.
Be sure to include support for condition variables checking the high and low water mark. Here, you can use one global lock to protect the count and to manage the condition variables, but do NOT hold this lock for the entire list traversal, or you will not get credit for this exercise. Further note that it is ok for count to be a bit stale --- e.g., you can drop the global lock, do an insert, and then reacquire the global lock to increment count and possibly signal other threads; this window where count is out of sync with the list is ok, as long as the end result works out.
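
The classic traversal pattern here is hand-over-hand locking (also called lock coupling): acquire the next node's lock before releasing the current one, and always traverse from head to tail so the lock order is fixed and deadlock is impossible. The sketch below shows a search under assumed names; the actual node definition is in fine-lru.c.

    #include <pthread.h>

    struct node {                        /* assumed layout; see fine-lru.c */
        int key;
        int refcount;
        pthread_mutex_t lock;
        struct node *next;
    };

    static struct node *head;
    static pthread_mutex_t head_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Returns the matching node with its lock held (caller unlocks), or NULL. */
    static struct node *find_locked(int key) {
        pthread_mutex_lock(&head_lock);
        struct node *cur = head;
        if (cur == NULL) {
            pthread_mutex_unlock(&head_lock);
            return NULL;
        }
        pthread_mutex_lock(&cur->lock);
        pthread_mutex_unlock(&head_lock);
        while (cur->key != key) {
            struct node *next = cur->next;
            if (next == NULL) {
                pthread_mutex_unlock(&cur->lock);
                return NULL;
            }
            pthread_mutex_lock(&next->lock);    /* grab the next lock...       */
            pthread_mutex_unlock(&cur->lock);   /* ...before dropping this one */
            cur = next;
        }
        return cur;
    }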

Challenge! (up to 5 points, depending on how elegant and correct your solution is) Replace the list with a concurrent skiplist. Read this article (or search the web) to learn more about skip lists.
To complete this challenge, implement a skiplist to replace the current next pointer in each list node.
Be careful to use the thread-safe pseudo-random number generator.
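
For example, rand() shares hidden state across threads, while rand_r() takes its state explicitly and is safe to call with a per-thread seed. A possible level-picker (MAX_LEVEL and the seeding are illustrative):

    #include <stdlib.h>

    #define MAX_LEVEL 8

    /* Flip a fair coin per level, using caller-supplied per-thread RNG state. */
    static int random_level(unsigned int *seedp) {
        int level = 1;
        while (level < MAX_LEVEL && (rand_r(seedp) & 1))
            level++;
        return level;
    }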

Challenge! (up to 40 points, depending on how elegant and correct your solution is) Important. This challenge is extremely hard, so do not start this before your lab is otherwise complete. Getting all 40 points will require substantial documentation demonstrating the correctness of your implementation.
Read-copy update is an alternative to reader/writer locking that can be more efficient for read-mostly data structures. The key idea is that readers follow a carefully designed path through the data structure without holding any locks, and writers are careful about how they update the data structure. Writers still use a lock to mutually exclude each other. Read enough from the link above, or other sources, to learn how RCU works.
Task: Create a list variant (rcu-lru.c) which uses RCU instead of a reader-writer lock. You may use a 3rd-party RCU implementation for the helper functions (e.g., rcu_read_lock()), or write your own. You should write the rcu-lru list code yourself. We highly recommend keeping this in a separate source file from your main assignment.
Note: RCU requires memory barriers in some cases, even for readers, so be sure you understand where these need to be placed.
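
As a rough illustration of the reader and publish idioms only (the names are assumptions, and this deliberately omits the hardest part of RCU: the grace period that decides when a removed node may safely be freed):

    #include <pthread.h>
    #include <stdlib.h>

    struct node { int key; int refcount; struct node *next; };

    static struct node *head;
    static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Reader: no lock at all; acquire loads ensure a newly published node is
     * seen fully initialized. */
    static struct node *find(int key) {
        struct node *n = __atomic_load_n(&head, __ATOMIC_ACQUIRE);
        while (n != NULL && n->key != key)
            n = __atomic_load_n(&n->next, __ATOMIC_ACQUIRE);
        return n;
    }

    /* Writer: fully initialize the new node, then make it reachable with one
     * release store; writers still exclude each other with a mutex.  Freeing
     * removed nodes requires waiting out a grace period (e.g., a library's
     * synchronize_rcu()), which is omitted here. */
    static void insert_front(int key) {
        struct node *n = malloc(sizeof(*n));
        if (n == NULL)
            return;
        n->key = key;
        n->refcount = 1;
        pthread_mutex_lock(&writer_lock);
        n->next = head;
        __atomic_store_n(&head, n, __ATOMIC_RELEASE);    /* publish */
        pthread_mutex_unlock(&writer_lock);
    }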

Hand-In Procedure

Type make handin in the lab directory. You may submit more than once; if you do, the most recent submission will be graded. Submission works by creating a tag in git, which is then pushed to GitHub. If you look at your lab page on GitHub, you will see a pulldown list labeled "Branch: master". If you drop this list down, you will see an option to view tags. If you choose the handin tag, you will be able to view your submitted code and confirm that it is correct.

Important: We will grade for correctness primarily by reading your code. Although we will also test it, your grade will be based on whether we are convinced that the code is correct under all possible thread interleavings. Thus, good comments and clear code that lay out the safety and liveness arguments are very much in your interest.

All programs will be tested on classroom.cs.unc.edu. All programs, unless otherwise specified, should be written to execute in the current working directory. Your correctness grade will be based solely on your program's performance on classroom.cs.unc.edu. Make sure your programs work on classroom!

Generally, unless the homework assignment specifies otherwise, you should compile your program using the provided Makefile (e.g., by just typing make on the console). Do not add any special command line arguments ("flags") or compiler options to the Makefile.

The program should be neatly formatted (i.e., easy to read) and well-documented. In general, 75% of your grade for a program will be for correctness, and 25% for "programming style" (appropriate use of language features [constants, loops, conditionals, etc.], including variable/procedure/class names) and documentation (descriptions of functions, general comments [problem description, solution approach], use of invariants, and pre- and post-conditions where appropriate).

The Style Guide gives additional guidance on lab code style.

Make sure you put your name(s) in a header comment in every file you submit. Any de-anonymizing information, such as your name, should be enclosed in redacted comments (wrapped in @* *@).

This completes the lab.

Acknowledgments

Portions of the thread programming guidelines are adapted from undergraduate OS course guidelines created by Mike Dahlin and Lorenzo Alvisi.


Last updated: 2018-12-04 16:47:35 -0500