COMP 530: Lab 3: Synchronization

In this lab you will develop a multi-threaded simulation of a Least Recently Used (LRU) cache of keys.

The course staff have provided you with a simple, sequential implementation of the code. The sequential implementation will need synchronization to work properly with multiple threads. Your job will be to create several parallel versions of the code, with increasing sophistication.

As before, this assignment does not involve writing that many lines of code (you will probably copy/paste the base code several times). The changes you make will probably be in the tens of lines. The hard part is figuring out the few lines of delicate code to write, and being sure your synchronization code is correct. We strongly recommend starting early and writing many test cases.

LRU Model

In this assignment, we will assume that our cache needs to track the reference count of a set of keys (here, integers from 0..MAX_KEY). When a key is referenced by calling reference(key), the cache is searched for the key: if the key is already present, its reference count is incremented by one; otherwise, the key is added with a reference count of 1. The cache is organized as a simple, singly-linked list.

Periodically, a thread will need to clean the cache (similar to the clock algorithm we saw in class). The cleaner thread iterates through the list and decrements each node's reference count by 1. Any node whose reference count reaches zero should be removed from the list and freed.
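To make the model concrete, here is a minimal single-threaded sketch of referencing a key and of one cleaning pass. The node layout, the names seq_reference and seq_clean, and the choice to insert new keys at the head of the list are assumptions for illustration; the actual code in sequential-lru.c may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout; the real one lives in sequential-lru.c. */
struct node {
    int key;
    int refcount;
    struct node *next;
};

static struct node *list_head = NULL;
static int count = 0;               /* number of nodes currently in the list */

/* Increment key's reference count, or insert it with a count of 1.
 * Returns 1 on success, 0 on failure (out of memory). */
static int seq_reference(int key) {
    for (struct node *n = list_head; n != NULL; n = n->next) {
        if (n->key == key) {
            n->refcount++;
            return 1;
        }
    }
    struct node *n = malloc(sizeof(*n));
    if (n == NULL)
        return 0;
    n->key = key;
    n->refcount = 1;
    n->next = list_head;            /* this sketch inserts new keys at the head */
    list_head = n;
    count++;
    return 1;
}

/* One cleaning pass: decrement every count; unlink and free nodes that hit zero. */
static void seq_clean(void) {
    struct node **link = &list_head;    /* the pointer that points at the current node */
    while (*link != NULL) {
        struct node *n = *link;
        if (--n->refcount == 0) {
            *link = n->next;            /* unlink n */
            free(n);
            count--;
        } else {
            link = &n->next;
        }
    }
}
```

Walking a pointer-to-pointer (`link`) avoids special-casing removal of the first node.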

Picking your group

You may do the lab alone, or in a group. You can complete this lab with a different team than the previous lab, if you prefer. If you work in a group, please submit one assignment to Gradescope, and list all group members in the code comments.

Getting the starter code

You will need to click on this link to create a private repository for your code.

Once you have a repository, you will need to clone your private repository (see the URL under the green "Clone or Download" button, after selecting "Use SSH"). For instance, if your private repo is called lab3-team-don:

We provided you with a single-threaded, sequential version of the code in sequential-lru.c. This C file implements the functions defined in lru.h:

We provide you with a test harness in main.c. You will complete LRU implementations in mutex-lru.c and fine-lru.c. You may copy any code from sequential-lru.c into these files that you find useful, but it will require modification.

Note that typing make generates three different executables. Currently, only sequential-lru will work, and only with one thread. There will probably be compiler warnings for the other variants.

The code comes with the ability to turn debugging prints on and off at compile time. You can edit lru.h to uncomment the line below, and you will get debug output printing after the initial selftests, and at the end of a period of time (specified on the command line):

Note that heavy use of printf is not a good idea beyond initial testing because printf includes synchronization, which can inadvertently hide bugs. That said, this style of macro-enabled debug messages can help you understand what is happening, and easily compile out the messages once you believe the code is working.

To build the code, along with a very, very simple test case, type make. You can test the starter code as follows:

All but the last list print should be deterministic, as this is controlled by the code in self_tests() (you may add additional tests if you like). The last print will vary from run to run, and is the output of running the simulation for 30 seconds.

This test harness takes several options to control the length of time it runs, and the number of threads. To get this information, try:

Do not use this implementation with more than one thread---it will break because it does not use any synchronization!

Thread Programming Guidelines

Before you begin the assignment, (re-)read Coding Standards for Programming with Threads. You are required to follow these standards for this project. Because it is impossible to determine the correctness of a multithreaded program via testing, grading on this project will be based primarily on reading your code, not on running tests. Your code must be clear and concise. If your code is not easy to understand, your grade will be poor, even if the program seems to work. In the real world, unclear multithreaded code is extremely dangerous: even if it "works" when you write it, how will the programmer who comes after you debug it, maintain it, or add new features? Feel free to sit down with the TA or instructor during office hours for code inspections before you turn in your project.

Writing multithreaded programs requires extra care and more discipline than writing conventional programs. The reason is that debugging multithreaded programs remains an art rather than a science, despite more than 30 years of research. Generally, avoiding errors is likely to be more effective than debugging them. Over the years, a culture has developed around the following guidelines for programming with threads. Adhering to these guidelines will ease the process of producing correct programs:

Helpful Resources on Concurrent Lists

One very helpful reference for implementing concurrency on a linked list is the book The Art of Multiprocessor Programming by Herlihy and Shavit. In particular, look at Sections 9.4 and 9.5 for examples of how to do this sort of synchronization.

Using a Mutex

Exercise 1. (7 points) Implement a thread-safe list in mutex-lru.c, using coarse-grained locking. In other words, for this exercise, it is sufficient to have one lock for the entire list. To complete this exercise, we recommend using pthread mutex functions, including pthread_mutex_init, pthread_mutex_lock, and pthread_mutex_unlock.
Be sure to test your code by running with lru-mutex -c XXX (where XXX is an integer greater than 1), in order to check that the code works properly. Because multi-threaded code interleaves memory operations non-deterministically, it may work once and fail the next time, so test very thoroughly and add a lot of assertions to your code.

Don't be alarmed if you end up with a relatively big or relatively empty list at this point, as long as the list is correct otherwise.
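A minimal sketch of coarse-grained locking, assuming the same hypothetical node layout as the sequential version: one mutex guards the head pointer, every node, and count, and each operation holds it from start to finish. PTHREAD_MUTEX_INITIALIZER is used for brevity; calling pthread_mutex_init() from init() achieves the same thing. The names mutex_reference and worker are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical node layout, mirroring the sequential version. */
struct node {
    int key;
    int refcount;
    struct node *next;
};

static struct node *list_head = NULL;
static int count = 0;
/* One lock guards list_head, every node, and count (coarse-grained locking). */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static int mutex_reference(int key) {
    int ok = 1;
    struct node *n;

    pthread_mutex_lock(&list_lock);
    for (n = list_head; n != NULL; n = n->next)
        if (n->key == key)
            break;
    if (n != NULL) {
        n->refcount++;                      /* already cached */
    } else if ((n = malloc(sizeof(*n))) != NULL) {
        n->key = key;                       /* new key: insert at head */
        n->refcount = 1;
        n->next = list_head;
        list_head = n;
        count++;
    } else {
        ok = 0;                             /* out of memory */
    }
    pthread_mutex_unlock(&list_lock);       /* single exit: never return with the lock held */
    return ok;
}

/* Usage: several threads referencing the same keys concurrently. */
static void *worker(void *arg) {
    (void)arg;
    for (int k = 0; k < 50; k++)
        mutex_reference(k);
    return NULL;
}
```

Funneling every path through one unlock makes it hard to return while still holding list_lock. A cleaning pass would follow the same pattern, holding list_lock around the whole traversal.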

Testing on a multi-core machine

comp530fa20 is a 16-core machine. Be sure to test your code regularly on the class system, as some bugs may only manifest with multiple CPUs.

Keeping a Target Cache Size

Ideally, we want the cache to hold a bounded number of elements; i.e., the count variable should stay between LOW_WATER_MARK and HIGH_WATER_MARK. In the second exercise, you will need to use condition variables to ensure two things:

Although you could simply have a thread spin in an infinite while loop, repeatedly checking count, we can do better. In particular, we would like you to use condition variables (e.g., pthread_cond_wait() and pthread_cond_signal()), as well as the existing mutex synchronization, to allow the cleaner thread to sleep until there is work to do, and then to wake up once there is.
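A sketch of the waiting logic, under stated assumptions: the water-mark values are placeholders for the ones in lru.h; referencers are assumed to block while count is at or above HIGH_WATER_MARK; the cleaner blocks while count is at or below LOW_WATER_MARK (matching the description of clean() in lru.h); and insert_one, clean_one, shutdown_all, and producer are hypothetical stand-ins that adjust count instead of a real list. Each wait sits in a while loop that rechecks its condition, and pthread_cond_wait() releases the mutex while the thread sleeps.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Placeholder values; use the real ones from lru.h. */
#define LOW_WATER_MARK  4
#define HIGH_WATER_MARK 8

static pthread_mutex_t list_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  need_clean = PTHREAD_COND_INITIALIZER; /* signalled when count exceeds LOW_WATER_MARK */
static pthread_cond_t  has_room   = PTHREAD_COND_INITIALIZER; /* broadcast after a node is freed */
static int  count = 0;
static bool shutting_down = false;

/* Stand-in for reference() inserting a new key. */
static void insert_one(void) {
    pthread_mutex_lock(&list_lock);
    while (count >= HIGH_WATER_MARK && !shutting_down)
        pthread_cond_wait(&has_room, &list_lock);  /* releases list_lock while asleep */
    if (!shutting_down) {
        count++;                                   /* ...link the new node here... */
        assert(count <= HIGH_WATER_MARK);
        if (count > LOW_WATER_MARK)
            pthread_cond_signal(&need_clean);      /* assumes a single cleaner thread */
    }
    pthread_mutex_unlock(&list_lock);
}

/* Stand-in for clean(1) freeing one node. */
static void clean_one(void) {
    pthread_mutex_lock(&list_lock);
    while (count <= LOW_WATER_MARK && !shutting_down)
        pthread_cond_wait(&need_clean, &list_lock);
    if (!shutting_down) {
        count--;                                   /* ...unlink and free nodes here... */
        pthread_cond_broadcast(&has_room);         /* several referencers may be waiting */
    }
    pthread_mutex_unlock(&list_lock);
}

/* What shutdown_threads() might do: release every blocked thread. */
static void shutdown_all(void) {
    pthread_mutex_lock(&list_lock);
    shutting_down = true;
    pthread_cond_broadcast(&has_room);
    pthread_cond_broadcast(&need_clean);
    pthread_mutex_unlock(&list_lock);
}

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < 20; i++)
        insert_one();
    return NULL;
}
```

The while loops are essential. pthread_cond_wait() may wake spuriously, and another thread may change count between the signal and the wakeup. A shutdown flag checked in every wait loop, combined with pthread_cond_broadcast(), is one way to let shutdown_threads() release blocked threads.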

You will need to ensure that the resulting mutex LRU list implementation supports this behavior correctly. It is not necessary for this to work with the sequential version.

Exercise 2. (8 points) Add condition variable support to the mutex lru list, using a condition variable to block reference and clean, as described above.
You also need to write at least two additional unit tests (e.g., in self_tests() or another test function) that check that the node count is being properly maintained. Be sure not to break the sequential version, and be sure these tests continue to work as you complete the next exercises.
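One possible building block for such a test, assuming the same hypothetical node layout: a helper that walks the list, asserts that no node lingers with a non-positive reference count, and returns the real length so a test can compare it against count. The name list_length_checked is hypothetical, and the helper should only be called when no other thread is modifying the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout. */
struct node {
    int key;
    int refcount;
    struct node *next;
};

/* Walk the list and return its actual length, checking each node on the way.
 * Call only while no other thread can modify the list (or with its lock held). */
static int list_length_checked(const struct node *head) {
    int len = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        assert(n->refcount > 0);    /* nodes that reached zero should have been freed */
        len++;
    }
    return len;
}
```

For example, after a known sequence of reference() and clean() calls, a test could assert that this length equals count, and that count lies between LOW_WATER_MARK and HIGH_WATER_MARK.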

Finer-Grained Locking

You can improve concurrency by using one lock per node, allowing threads to execute concurrently in different parts of the list.

Exercise 3. (10 points) Implement a list that uses fine-grained locking in fine-lru.c. In other words, every node in the list should have its own lock (note the change in the node definition in this file).
What makes fine-grained locking tricky is ensuring that you cannot deadlock while acquiring locks, and that a thread doesn't hold unnecessary locks. Be sure to document your locking protocol in your README.
Be sure to include support for condition variables checking the high and low water marks. Here, you can use one global lock to protect the count and to manage the condition variables, but do NOT hold this lock for the entire list traversal, or you will not get credit for this exercise. Further note that it is OK for count to be a bit stale. For example, you can drop the global lock, do an insert, and then reacquire the global lock to increment count and possibly signal other threads; this window where count is out of sync with the list is OK, as long as the end result works out.
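One common protocol for this exercise is hand-over-hand locking (lock coupling), as in the fine-grained list in Section 9.5 of Herlihy and Shavit. The sketch below illustrates the idea rather than a required design. Its assumptions are that the list is kept sorted by key, that a never-removed sentinel head node exists, that the per-node lock is a pthread_mutex_t field named lock, and that count is updated under a separate count_lock only after all node locks are released, which the handout permits. The names fine_reference, fine_clean, and fine_worker are hypothetical, and the water-mark waits are omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical node layout with a per-node lock (cf. fine-lru.c). */
struct node {
    int key;
    int refcount;
    struct node *next;
    pthread_mutex_t lock;
};

/* Sentinel head: never removed, so every real node has a lockable predecessor. */
static struct node head = { .key = -1, .next = NULL, .lock = PTHREAD_MUTEX_INITIALIZER };
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects count only */
static int count = 0;

/* Locks are always taken head-to-tail, and curr is locked before pred is
 * released, so no node can be unlinked under a thread stepping onto it. */
static int fine_reference(int key) {
    struct node *pred = &head;
    pthread_mutex_lock(&pred->lock);
    struct node *curr = pred->next;
    while (curr != NULL) {
        pthread_mutex_lock(&curr->lock);
        if (curr->key >= key)
            break;                              /* list is kept sorted by key */
        pthread_mutex_unlock(&pred->lock);
        pred = curr;
        curr = curr->next;
    }
    int ok = 1, inserted = 0;
    if (curr != NULL && curr->key == key) {
        curr->refcount++;
    } else {
        struct node *n = malloc(sizeof(*n));
        if (n == NULL) {
            ok = 0;
        } else {
            n->key = key;
            n->refcount = 1;
            n->next = curr;
            pthread_mutex_init(&n->lock, NULL);
            pred->next = n;                     /* pred is locked, so linking here is safe */
            inserted = 1;
        }
    }
    if (curr != NULL)
        pthread_mutex_unlock(&curr->lock);
    pthread_mutex_unlock(&pred->lock);
    if (inserted) {                             /* count may briefly lag the list */
        pthread_mutex_lock(&count_lock);
        count++;
        pthread_mutex_unlock(&count_lock);
    }
    return ok;
}

/* One cleaning pass using the same head-to-tail, two-lock protocol. */
static void fine_clean(void) {
    int removed = 0;
    struct node *pred = &head;
    pthread_mutex_lock(&pred->lock);
    struct node *curr = pred->next;
    while (curr != NULL) {
        pthread_mutex_lock(&curr->lock);
        if (--curr->refcount == 0) {
            pred->next = curr->next;            /* unlink while holding pred and curr */
            pthread_mutex_unlock(&curr->lock);
            pthread_mutex_destroy(&curr->lock);
            free(curr);
            removed++;
            curr = pred->next;
        } else {
            pthread_mutex_unlock(&pred->lock);
            pred = curr;
            curr = curr->next;
        }
    }
    pthread_mutex_unlock(&pred->lock);
    pthread_mutex_lock(&count_lock);
    count -= removed;
    pthread_mutex_unlock(&count_lock);
}

static void *fine_worker(void *arg) {
    (void)arg;
    for (int k = 49; k >= 0; k--)               /* descending keys exercise front inserts */
        fine_reference(k);
    return NULL;
}
```

Because every thread acquires node locks in list order and never locks a node that precedes one it already holds, no cycle of waiting threads can form. Freeing curr right after unlocking it is safe under this protocol: any other thread would have to lock pred before it could reach curr, and the cleaner still holds pred.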

Challenge! (up to 5 points, depending on how elegant and correct your solution is) Replace the list with a concurrent skiplist. Read this article (or search the web) to learn more about skip lists.
To complete this challenge, implement a skiplist to replace the current next pointer in each list node.
Be careful to use the thread-safe pseudo-random number generator.
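For choosing each node's height, here is a sketch that assumes rand_r() is the thread-safe generator intended (POSIX erand48() is a similar option). rand_r() keeps its state in a caller-supplied seed, so each thread can own its seed instead of racing on the hidden global state of rand(). MAX_LEVEL and random_level are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L   /* expose rand_r() under strict compiler modes */
#include <assert.h>
#include <stdlib.h>

#define MAX_LEVEL 16              /* hypothetical cap on skip-list height */

/* Choose a node height: each extra level is added with probability about 1/2,
 * using only the caller's seed, so concurrent callers never share state. */
static int random_level(unsigned int *seed) {
    int level = 1;
    while (level < MAX_LEVEL && rand_r(seed) <= RAND_MAX / 2)
        level++;
    return level;
}
```

Each thread would keep its own seed (for example, initialized from its thread index) in a local variable; giving every thread the same seed would make them all produce the same sequence of heights.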

Challenge! (up to 40 points, depending on how elegant and correct your solution is) Important. This challenge is extremely hard, so do not start this before your lab is otherwise complete. Getting all 40 points will require substantial documentation demonstrating the correctness of your implementation.
Read-copy update is an alternative to reader/writer locking that can be more efficient for read-mostly data structures. The key idea is that readers follow a carefully designed path through the data structure without holding any locks, and writers are careful about how they update the data structure. Writers still use a lock to mutually exclude each other. Read enough from the link above, or other sources, to learn how RCU works.
Task: Create a list variant (rcu-lru.c) that uses RCU instead of a reader-writer lock. You may use a 3rd-party RCU implementation for the helper functions (e.g., rcu_read_lock), or write your own. You should write the rcu-lru code yourself. We highly recommend keeping this in a separate source file from your main assignment.
Note: RCU requires memory barriers in some cases, even for readers, so be sure you understand where these need to be placed.
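The publish/subscribe ordering at the heart of RCU can be sketched with C11 atomics. This covers only pointer publication. A writer fully initializes a node and then publishes it with a release store, the role rcu_assign_pointer typically plays. Lock-free readers follow pointers with acquire loads, standing in for rcu_dereference. The sketch omits rcu_read_lock/rcu_read_unlock and grace periods entirely: a removed node must not be freed until every reader that might still hold a pointer to it has finished, and this sketch sidesteps that by never removing nodes. rcu_insert_head and rcu_lookup are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int key;
    struct node *_Atomic next;
};

static struct node *_Atomic list_head = NULL;
static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;  /* writers still exclude each other */

/* Writer: fill in the node completely, then publish it with a release store,
 * so any reader that sees the new pointer also sees key and next. */
static int rcu_insert_head(int key) {
    struct node *n = malloc(sizeof(*n));
    if (n == NULL)
        return 0;
    n->key = key;
    pthread_mutex_lock(&writer_lock);
    /* n is not yet visible to readers, so relaxed ordering suffices here. */
    atomic_store_explicit(&n->next,
                          atomic_load_explicit(&list_head, memory_order_relaxed),
                          memory_order_relaxed);
    atomic_store_explicit(&list_head, n, memory_order_release);    /* publish */
    pthread_mutex_unlock(&writer_lock);
    return 1;
}

/* Reader: takes no lock; each acquire load pairs with a writer's release store. */
static int rcu_lookup(int key) {
    for (struct node *n = atomic_load_explicit(&list_head, memory_order_acquire);
         n != NULL;
         n = atomic_load_explicit(&n->next, memory_order_acquire))
        if (n->key == key)
            return 1;
    return 0;
}
```

Without the release/acquire pairing, a reader on a weakly ordered CPU could see the new head pointer before the stores that initialized the node, and read a garbage key or next pointer.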

Style and More

To be sure your code is very clean, it must compile with make without any errors or warnings!

If the various sources you use require common definitions, then do not duplicate the definitions. Make use of C's code-sharing facilities.

You must include a README file with this and any assignment. The README file should describe what you did, what approach you took, results of any measurements you made, which files are included in your submission and what they are for, etc. Feel free to include any other information you think is helpful to us in this README; it can only help your grade.

Hand-In Procedure

You will be handing in the code via Gradescope. You should have been added to the class; if not, please contact the instructors as soon as possible. If you work in a team, you should only submit one copy of your code in Gradescope; you can add your teammates to the handin. We recommend handing in directly from your github repository to the assignment.

Note: No/limited autograding. Grading for this assignment will be primarily manual. We may add an autograder to the assignment for part of the points, as time allows. In general, because all interleavings must be correct for a multi-threaded program to be correct, this requires manual grading by more seasoned programmers (i.e., course staff).

You may hand in more than once and we will take either the most recent or the one you designate as your submission, applying lateness penalties as appropriate (out-of-band).

In the event of any discrepancies with the autograder environment, programs will be tested on comp530fa20.cs.unc.edu. All programs, unless otherwise specified, should be written to execute in the current working directory. Your correctness grade will be based solely on your program's performance on comp530fa20.cs.unc.edu. Make sure your programs work on comp530fa20!

Generally, unless the homework assignment specifies otherwise, you should compile your program using the provided Makefile (e.g., by just typing make on the console). Do not add any special command line arguments ("flags") or compiler options to the Makefile.

Note: We do not have an automated way to calculate late penalties. These will be applied manually at the end of the semester.

The program should be neatly formatted (i.e., easy to read) and well-documented. The Style Guide gives additional guidance on lab code style.

If you complete any challenge problems, please describe the solution and how to demonstrate it in challenge.txt. Note that we have a separate submission option in Gradescope for submitting challenge problems, in order to accommodate later submissions without charging late hours. Even if you are submitting on time, please submit your challenge problems a second time through the appropriate challenge assignment --- we will only manually grade assignments handed in through the challenge option.

Acknowledgments

Portions of the thread programming guidelines are adapted from undergraduate OS course guidelines created by Mike Dahlin and Lorenzo Alvisi.

Functions defined in lru.h:

`int init(int numthreads)`: Initialize any global variables or synchronization primitives (hint: see `pthread_mutex_init()` and `pthread_cond_init()`). May not need to do anything. Returns 0 on success, -errno on failure.
`int reference(int key)`: Search the list and, if the key is found, increment its reference count by one. If not found, add it to the list. Returns 1 on success, 0 on failure.
`void clean(int check_water_mark)`: Iterate through the list and decrement each element's reference count. If a reference count drops to zero, delete that element. If `check_water_mark` is set, block until there are more than `LOW_WATER_MARK` elements in the list.
`void shutdown_threads(void)`: Wake up any blocked threads. Some variants may not need to do anything here.
`void print(void)`: Print the contents of the list, primarily for debugging and testing.