COMP 530: Lab 3: Synchronization

Due 11:59 PM, Wednesday, Dec 5, 2018

Introduction

In this lab you will develop a multi-threaded simulation of a Least Recently Used (LRU) cache of keys.

The course staff have provided you with a simple, sequential implementation of the code. The sequential implementation will need synchronization to work properly with multiple threads. Your job will be to create several parallel versions of the code, with increasing sophistication.

As before, this assignment does not involve writing that many lines of code (you will probably copy/paste the base code several times). The changes you make will probably be in the tens of lines. The hard part is figuring out the few lines of delicate code to write, and being sure your synchronization code is correct. We strongly recommend starting early and writing many test cases.

LRU Model

In this assignment, we will assume that our cache needs to track the reference count of a set of keys (here, integers from 0..MAX_KEY). When a key is referenced, by calling reference(key), the cache is searched for the key: if the key is not present, it is added with a reference count of 1; if it is already present, its reference count is incremented by one. This cache is organized as a simple, singly-linked list.

Periodically, a thread will need to clean the cache (similar to the clock algorithm we saw in class). The cleaner thread iterates through the list and decrements each node's reference count by 1. Any node whose reference count reaches zero should be removed from the list and freed.
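
To make the model concrete, here is a minimal single-threaded sketch of the two operations on a singly-linked list. The struct and variable names are illustrative assumptions; the real definitions live in sequential-lru.c, which (judging by the sample output below) appears to keep the list in key order, whereas this sketch simply inserts new keys at the head.

    #include <stdlib.h>

    /* Hypothetical node layout; the actual struct in sequential-lru.c may differ. */
    struct node {
        int key;
        int refcount;
        struct node *next;
    };

    static struct node *head = NULL;
    static int count = 0;

    /* reference(): bump the count of an existing key, or insert it with count 1. */
    int reference(int key) {
        struct node *cur;
        for (cur = head; cur != NULL; cur = cur->next) {
            if (cur->key == key) {
                cur->refcount++;
                return 1;
            }
        }
        cur = malloc(sizeof(*cur));
        if (cur == NULL)
            return 0;
        cur->key = key;
        cur->refcount = 1;
        cur->next = head;            /* insert at the head for simplicity */
        head = cur;
        count++;
        return 1;
    }

    /* clean(): decrement every reference count, freeing nodes that reach zero. */
    void clean(int check_water_mark) {
        (void) check_water_mark;     /* the water mark only matters in Exercise 2 */
        struct node **prev = &head;
        struct node *cur = head;
        while (cur != NULL) {
            if (--cur->refcount == 0) {
                *prev = cur->next;
                free(cur);
                cur = *prev;
                count--;
            } else {
                prev = &cur->next;
                cur = cur->next;
            }
        }
    }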

Getting the starter code

You will need to click on this link to create a private repository for your code.

Once you have a repository, you will need to clone your private repository (see the URL under the green "Clone or Download" button, after selecting "Use SSH"). For instance, if your private repo is called lab3-team-don:

git clone git@github.com:comp530-f18/lab3-team-don.git

We have provided you with a single-threaded, sequential version of the code in sequential-lru.c. This C file implements the functions declared in lru.h:

int init(int numthreads) Initialize any global variables or synchronization primitives (hint: see pthread_mutex_init() and pthread_cond_init()). This may not need to do anything in every variant. Returns 0 on success, -errno on failure.
int reference(int key) Search the list, and, if found, increment the key's reference count by one. If not found, add to the list. Returns 1 on success, 0 on failure.
void clean(int check_water_mark) Iterate through the list and decrement each element's reference count. If a reference count drops to zero, delete that element. If check_water_mark is set, block until there are more than LOW_WATER_MARK elements in the list.
void shutdown_threads() Wake up any blocked threads. This may not need to do anything in every variant.
print() Print the contents of the list, primarily for debugging and testing.
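
Put together, the interface is small; lru.h declares roughly the following (the exact declarations, comments, and DEBUG machinery in the real header may differ, and print()'s signature is an assumption):

    int  init(int numthreads);
    int  reference(int key);
    void clean(int check_water_mark);
    void shutdown_threads(void);
    void print(void);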

We provide you with a test harness in main.c. You will complete LRU implementations in mutex-lru.c and fine-lru.c. You may copy any code from sequential-lru.c into these files that you find useful, but it will require modification.

Note that typing make generates three different executables. Currently, only lru-sequential will work, and only with one thread. There will probably be compiler warnings for the other variants.

The code comes with the ability to turn debugging prints on and off at compile time. You can edit lru.h to uncomment the line below, and you will get debug output printing after the initial selftests, and at the end of a period of time (specified on the command line):

//#define DEBUG 1

Note that heavy use of printf is not a good idea beyond initial testing because printf includes synchronization, which can inadvertently hide bugs. That said, this style of macro-enabled debug messages can help you understand what is happening, and easily compile out the messages once you believe the code is working.
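
The usual shape of such a macro is shown below; the actual name and definition in the starter code may differ, but the idea is that the messages vanish completely when DEBUG is not defined.

    #include <stdio.h>

    /* #define DEBUG 1 */
    #ifdef DEBUG
    #define debug_print(...) printf(__VA_ARGS__)
    #else
    #define debug_print(...) do { } while (0)   /* compiles away to nothing */
    #endif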

To build the code, and a very, very simple test case, type make. You can test the starter code as follows:

porter@classroom:~/lab3$ ./lru-sequential
=== Starting list print ===
=== Total count is 64 ===
Key 0, Ref Count 1
Key 1, Ref Count 1
Key 2, Ref Count 1
Key 3, Ref Count 1
Key 4, Ref Count 1
Key 5, Ref Count 1
Key 6, Ref Count 1
Key 7, Ref Count 1
Key 8, Ref Count 1
Key 9, Ref Count 1
Key 10, Ref Count 1
Key 11, Ref Count 1
Key 12, Ref Count 1
Key 13, Ref Count 1
Key 14, Ref Count 1
Key 15, Ref Count 1
Key 16, Ref Count 1
Key 17, Ref Count 1
Key 18, Ref Count 1
Key 19, Ref Count 1
Key 20, Ref Count 1
Key 21, Ref Count 1
Key 22, Ref Count 1
Key 23, Ref Count 1
Key 24, Ref Count 1
Key 25, Ref Count 1
Key 26, Ref Count 1
Key 27, Ref Count 1
Key 28, Ref Count 1
Key 29, Ref Count 1
Key 30, Ref Count 1
Key 31, Ref Count 1
Key 32, Ref Count 1
Key 33, Ref Count 1
Key 34, Ref Count 1
Key 35, Ref Count 1
Key 36, Ref Count 1
Key 37, Ref Count 1
Key 38, Ref Count 1
Key 39, Ref Count 1
Key 40, Ref Count 1
Key 41, Ref Count 1
Key 42, Ref Count 1
Key 43, Ref Count 1
Key 44, Ref Count 1
Key 45, Ref Count 1
Key 46, Ref Count 1
Key 47, Ref Count 1
Key 48, Ref Count 1
Key 49, Ref Count 1
Key 50, Ref Count 1
Key 51, Ref Count 1
Key 52, Ref Count 1
Key 53, Ref Count 1
Key 54, Ref Count 1
Key 55, Ref Count 1
Key 56, Ref Count 1
Key 57, Ref Count 1
Key 58, Ref Count 1
Key 59, Ref Count 1
Key 60, Ref Count 1
Key 61, Ref Count 1
Key 62, Ref Count 1
Key 63, Ref Count 1
=== Ending list print ===
=== Starting list print ===
=== Total count is 64 ===
Key 0, Ref Count 2
Key 1, Ref Count 1
Key 2, Ref Count 2
Key 3, Ref Count 1
Key 4, Ref Count 2
Key 5, Ref Count 1
Key 6, Ref Count 2
Key 7, Ref Count 1
Key 8, Ref Count 2
Key 9, Ref Count 1
Key 10, Ref Count 2
Key 11, Ref Count 1
Key 12, Ref Count 2
Key 13, Ref Count 1
Key 14, Ref Count 2
Key 15, Ref Count 1
Key 16, Ref Count 2
Key 17, Ref Count 1
Key 18, Ref Count 2
Key 19, Ref Count 1
Key 20, Ref Count 2
Key 21, Ref Count 1
Key 22, Ref Count 2
Key 23, Ref Count 1
Key 24, Ref Count 2
Key 25, Ref Count 1
Key 26, Ref Count 2
Key 27, Ref Count 1
Key 28, Ref Count 2
Key 29, Ref Count 1
Key 30, Ref Count 2
Key 31, Ref Count 1
Key 32, Ref Count 2
Key 33, Ref Count 1
Key 34, Ref Count 2
Key 35, Ref Count 1
Key 36, Ref Count 2
Key 37, Ref Count 1
Key 38, Ref Count 2
Key 39, Ref Count 1
Key 40, Ref Count 2
Key 41, Ref Count 1
Key 42, Ref Count 2
Key 43, Ref Count 1
Key 44, Ref Count 2
Key 45, Ref Count 1
Key 46, Ref Count 2
Key 47, Ref Count 1
Key 48, Ref Count 2
Key 49, Ref Count 1
Key 50, Ref Count 2
Key 51, Ref Count 1
Key 52, Ref Count 2
Key 53, Ref Count 1
Key 54, Ref Count 2
Key 55, Ref Count 1
Key 56, Ref Count 2
Key 57, Ref Count 1
Key 58, Ref Count 2
Key 59, Ref Count 1
Key 60, Ref Count 2
Key 61, Ref Count 1
Key 62, Ref Count 2
Key 63, Ref Count 1
=== Ending list print ===
=== Starting list print ===
=== Total count is 32 ===
Key 0, Ref Count 1
Key 2, Ref Count 1
Key 4, Ref Count 1
Key 6, Ref Count 1
Key 8, Ref Count 1
Key 10, Ref Count 1
Key 12, Ref Count 1
Key 14, Ref Count 1
Key 16, Ref Count 1
Key 18, Ref Count 1
Key 20, Ref Count 1
Key 22, Ref Count 1
Key 24, Ref Count 1
Key 26, Ref Count 1
Key 28, Ref Count 1
Key 30, Ref Count 1
Key 32, Ref Count 1
Key 34, Ref Count 1
Key 36, Ref Count 1
Key 38, Ref Count 1
Key 40, Ref Count 1
Key 42, Ref Count 1
Key 44, Ref Count 1
Key 46, Ref Count 1
Key 48, Ref Count 1
Key 50, Ref Count 1
Key 52, Ref Count 1
Key 54, Ref Count 1
Key 56, Ref Count 1
Key 58, Ref Count 1
Key 60, Ref Count 1
Key 62, Ref Count 1
=== Ending list print ===
=== Starting list print ===
=== Total count is 0 ===
=== Ending list print ===
Salt is 1542750928
=== Starting list print ===
=== Total count is 55 ===
Key 1, Ref Count 2
Key 3, Ref Count 1
Key 6, Ref Count 3
Key 9, Ref Count 1
Key 10, Ref Count 1
Key 11, Ref Count 1
Key 13, Ref Count 1
Key 14, Ref Count 1
Key 15, Ref Count 1
Key 16, Ref Count 1
Key 18, Ref Count 1
Key 22, Ref Count 1
Key 25, Ref Count 1
Key 26, Ref Count 1
Key 28, Ref Count 2
Key 31, Ref Count 1
Key 34, Ref Count 1
Key 35, Ref Count 1
Key 38, Ref Count 1
Key 39, Ref Count 1
Key 41, Ref Count 2
Key 44, Ref Count 1
Key 47, Ref Count 1
Key 50, Ref Count 1
Key 51, Ref Count 1
Key 52, Ref Count 1
Key 54, Ref Count 1
Key 58, Ref Count 1
Key 61, Ref Count 2
Key 65, Ref Count 2
Key 68, Ref Count 1
Key 69, Ref Count 1
Key 70, Ref Count 2
Key 71, Ref Count 2
Key 73, Ref Count 2
Key 79, Ref Count 7
Key 81, Ref Count 2
Key 82, Ref Count 2
Key 88, Ref Count 2
Key 90, Ref Count 1
Key 92, Ref Count 1
Key 93, Ref Count 2
Key 94, Ref Count 1
Key 99, Ref Count 1
Key 100, Ref Count 1
Key 101, Ref Count 3
Key 106, Ref Count 2
Key 109, Ref Count 1
Key 111, Ref Count 1
Key 118, Ref Count 1
Key 120, Ref Count 2
Key 122, Ref Count 1
Key 124, Ref Count 1
Key 125, Ref Count 1
Key 126, Ref Count 1
=== Ending list print ===

All but the last list print should be deterministic, as this is controlled by the code in self_tests() (you may add additional tests if you like). The last print will vary from run to run, and is the output of running the simulation for 30 seconds.

This test harness takes several options to control the length of time it runs, and the number of threads. To get this information, try:

porter@classroom:~/lab3$ ./lru-sequential -h
LRU Simulator.  Usage: ./lru-[variant] [options]

Options:
	-c numclients - Use numclients threads.
	-h - Print this help.
	-l length - Run clients for length seconds.


Do not use this implementation with more than one thread---it will break because it does not use any synchronization!

Thread Programming Guidelines

Before you begin the assignment, (re-)read Coding Standards for Programming with Threads. You are required to follow these standards for this project. Because it is impossible to determine the correctness of a multithreaded program via testing, grading on this project will primarily be based on reading your code, not on running tests. Your code must be clear and concise. If your code is not easy to understand, then your grade will be poor, even if the program seems to work. In the real world, unclear multi-threaded code is extremely dangerous: even if it "works" when you write it, how will the programmer who comes after you debug it, maintain it, or add new features? Feel free to sit down with the TA or instructor during office hours for code inspections before you turn in your project.

Writing multithreaded programs requires extra care and more discipline than writing conventional programs, because debugging multithreaded programs remains an art rather than a science, despite more than 30 years of research. Generally, avoiding errors is likely to be more effective than debugging them. Over the years, a culture of programming with threads has developed around the following guidelines. Adhering to these guidelines will ease the process of producing correct programs:

  1. All threads share global and heap data. This requires you to use proper synchronization primitives whenever two threads modify or read the shared data. Sometimes, it is obvious how to do so:
              char a[1000];
              void Modify(int m, int n)
              {
                    a[m + n] = a[m] + a[n];                        // ignore bound checks
              }
    
    If two threads execute this function, there is no guarantee that both will perceive consistent values of the elements of the array a. Therefore, such a statement has to be protected by a mutex that ensures proper execution, as follows (a pthread version of this example appears after this list):
              char a[1000];
              void Modify(int m, int n)
              {
                    Lock(p);
                    a[m + n] = a[m] + a[n];                        // ignore bound checks
                    Unlock(p);
              }
    
    where p is a synchronization variable.
  2. Beware of the hidden data structures! While your own variables on the heap must be protected, it is also necessary to protect data structures that belong to the libraries and the runtime system. These data structures are allocated on the heap, but you do not see them. Calling library functions can create situations where two threads corrupt those data structures because proper synchronization is lacking. For example, two threads calling the memory allocator simultaneously through the malloc() library call might create problems if the data structures of malloc() are not designed for concurrent access. In that case, the data structures used to track free space might be corrupted. The solution is to use a thread-safe version of libc.
    Linking with the -pthread flag (included in your Makefile) is sufficient for libc using gcc. Other compilers may require the -D_POSIX_PTHREAD_SEMANTICS flag to select thread-safe versions.
    Finally, not all libraries support thread-safe functions. If you include extra libraries in your code, it is up to you to figure out whether they are thread safe, or if calls must be protected by a lock.
  3. Simplicity of the code is also an important factor in ensuring correct operation. Complex pointer manipulations may lead to errors, and a runaway pointer may start corrupting the stacks of various threads, manifesting its presence through a set of incomprehensible bugs. Contrived logic and deep recursion may cause stacks to overflow. Modern languages such as Java eliminate pointers altogether and perform memory allocation and deallocation through automatic memory management and garbage collection. They simplify the process of writing programs in general, and multithreaded programs in particular. Still, without an understanding of all the pitfalls that come with programming multithreaded applications, even the most sophisticated programmer using the most sophisticated language may fall prey to some truly strange bugs.
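
In this lab, the synchronization variable will be a pthread mutex. A minimal concrete version of the Modify example from guideline 1, using the same primitives you will use in the exercises, might look like this:

    #include <pthread.h>

    char a[1000];
    pthread_mutex_t a_lock = PTHREAD_MUTEX_INITIALIZER;

    void Modify(int m, int n)
    {
        pthread_mutex_lock(&a_lock);
        a[m + n] = a[m] + a[n];      /* ignore bound checks */
        pthread_mutex_unlock(&a_lock);
    }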

Helpful Resources on Concurrent Lists

One very helpful reference for implementing concurrency on a linked list is the book The Art of Multiprocessor Programming by Herlihy and Shavit. In particular, look at Sections 9.4 and 9.5 for examples of how to do this sort of synchronization.

Using a Mutex

You are now ready to start implementing multi-threaded versions of the code.

Exercise 1. (7 points) Implement a thread-safe list in mutex-lru.c, using coarse-grained locking. In other words, for this exercise, it is sufficient to have one lock for the entire list. To complete this exercise, we recommend using pthread mutex functions, including pthread_mutex_init, pthread_mutex_lock, and pthread_mutex_unlock.
Be sure to test your code by running lru-mutex -c XXX (where XXX is an integer greater than 1) to check that the code works properly. Because multi-threaded code interleaves memory operations non-deterministically, it may work once and fail the next time---so test very thoroughly and add a lot of assertions to your code.

Don't be alarmed if you end up with a relatively big or relatively empty list at this point, as long as the list is correct otherwise.
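
A coarse-grained version can wrap every public entry point around the copied sequential code. The sketch below assumes a hypothetical helper, reference_unlocked(), holding the body you copy from sequential-lru.c; the lock name is also an assumption.

    #include <pthread.h>

    static pthread_mutex_t list_lock;

    int init(int numthreads) {
        (void) numthreads;
        int rc = pthread_mutex_init(&list_lock, NULL);
        return rc ? -rc : 0;                 /* 0 on success, -errno on failure */
    }

    int reference(int key) {
        pthread_mutex_lock(&list_lock);
        int rv = reference_unlocked(key);    /* hypothetical: the copied sequential body */
        pthread_mutex_unlock(&list_lock);
        return rv;
    }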

Testing on a multi-core machine

Classroom is a 16-core machine. Be sure to test your code regularly on classroom, as some bugs may only manifest with multiple CPUs.

Keeping a Target Cache Size

Ideally, we want the cache to hold a certain number of elements --- i.e., we want the count variable to stay between LOW_WATER_MARK and HIGH_WATER_MARK. In the second exercise, you will need to use condition variables to ensure three things:

  1. That a call to reference() blocks until count is below HIGH_WATER_MARK, as reference may add an element to the list.
  2. That a call to clean() blocks until count is above LOW_WATER_MARK, ensuring the list doesn't get too small.
  3. That any blocked threads will exit from their current routine when shutdown_threads() is called.

Although you could simply have the cleaner thread spin in a while loop checking the count, we can do better. In particular, we would like you to use condition variables (e.g., pthread_cond_wait() and pthread_cond_signal()), as well as the existing mutex synchronization, to allow the cleaner thread to sleep until there is work to do, and then to wake up once there is.

You will need to ensure that the resulting mutex LRU list implementation supports this behavior correctly. It is not necessary for this to work with the sequential version.

Exercise 2. (8 points) Add condition variable support to the mutex LRU list, using condition variables to block reference() and clean(), as described above.
You also need to write at least two additional unit tests (e.g., under self_tests() or another test function) that ensure the node count is being properly maintained. Be sure not to break the sequential version, and be sure this continues to work as you complete the next exercises.
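
One reasonable arrangement is sketched below, under the same assumptions as the coarse-grained sketch above (the helper names, the count bookkeeping, and the exact shutdown behavior are design choices, not the required solution). It uses one condition variable per direction:

    #include <pthread.h>

    static pthread_mutex_t list_lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  space_avail = PTHREAD_COND_INITIALIZER;  /* count < HIGH_WATER_MARK */
    static pthread_cond_t  work_avail  = PTHREAD_COND_INITIALIZER;  /* count > LOW_WATER_MARK  */
    static int count;
    static int shutting_down;

    int reference(int key) {
        pthread_mutex_lock(&list_lock);
        while (count >= HIGH_WATER_MARK && !shutting_down)
            pthread_cond_wait(&space_avail, &list_lock);        /* drops the lock while asleep */
        int rv = shutting_down ? 0 : reference_unlocked(key);   /* may increment count */
        if (count > LOW_WATER_MARK)
            pthread_cond_signal(&work_avail);                   /* the cleaner may have work now */
        pthread_mutex_unlock(&list_lock);
        return rv;
    }

    void clean(int check_water_mark) {
        pthread_mutex_lock(&list_lock);
        while (check_water_mark && count <= LOW_WATER_MARK && !shutting_down)
            pthread_cond_wait(&work_avail, &list_lock);
        if (!shutting_down)
            clean_unlocked();                                   /* may decrement count */
        pthread_cond_broadcast(&space_avail);                   /* blocked reference() calls recheck */
        pthread_mutex_unlock(&list_lock);
    }

    void shutdown_threads(void) {
        pthread_mutex_lock(&list_lock);
        shutting_down = 1;
        pthread_cond_broadcast(&space_avail);
        pthread_cond_broadcast(&work_avail);
        pthread_mutex_unlock(&list_lock);
    }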

Finer-Grained Locking

You can improve concurrency by supporting one lock per node, allowing threads to execute concurrently in different parts of the list.

Exercise 3. (10 points) Implement a list that uses fine-grained locking in fine-lru.c. In other words, every node in the list should have its own lock (note the change in the node definition in this file).
What makes fine-grained locking tricky is ensuring that you cannot deadlock while acquiring locks, and that a thread doesn't hold unnecessary locks. Be sure to document your locking protocol in your README.
Be sure to include support for condition variables checking the high and low water mark. Here, you can use one global lock to protect the count and to manage the condition variables, but do NOT hold this lock for the entire list traversal, or you will not get credit for this exercise. Further note that it is ok for count to be a bit stale --- e.g., you can drop the global lock, do an insert, and then reacquire the global lock to increment count and possibly signal other threads; this window where count is out of sync with the list is ok, as long as the end result works out.
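
The classic traversal pattern here is hand-over-hand locking (also called lock coupling): acquire the next node's lock before releasing the current one, and always traverse from head to tail so the lock order is fixed and deadlock is impossible. The sketch below shows a search under assumed names; the actual node definition is in fine-lru.c.

    #include <pthread.h>

    struct node {                        /* assumed layout; see fine-lru.c */
        int key;
        int refcount;
        pthread_mutex_t lock;
        struct node *next;
    };

    static struct node *head;
    static pthread_mutex_t head_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Returns the matching node with its lock held (caller unlocks), or NULL. */
    static struct node *find_locked(int key) {
        pthread_mutex_lock(&head_lock);
        struct node *cur = head;
        if (cur == NULL) {
            pthread_mutex_unlock(&head_lock);
            return NULL;
        }
        pthread_mutex_lock(&cur->lock);
        pthread_mutex_unlock(&head_lock);
        while (cur->key != key) {
            struct node *next = cur->next;
            if (next == NULL) {
                pthread_mutex_unlock(&cur->lock);
                return NULL;
            }
            pthread_mutex_lock(&next->lock);    /* grab the next lock...       */
            pthread_mutex_unlock(&cur->lock);   /* ...before dropping this one */
            cur = next;
        }
        return cur;
    }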

Challenge! (up to 5 points, depending on how elegant and correct your solution is) Replace the list with a concurrent skiplist. Read this article (or search the web) to learn more about skip lists.
To complete this challenge, implement a skiplist to replace the current next pointer in each list node.
Be careful to use the thread-safe pseudo-random number generator.
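
For example, rand() shares hidden state across threads, while rand_r() takes its state explicitly and is safe to call with a per-thread seed. A possible level-picker (MAX_LEVEL and the seeding are illustrative):

    #include <stdlib.h>

    #define MAX_LEVEL 8

    /* Flip a fair coin per level, using caller-supplied per-thread RNG state. */
    static int random_level(unsigned int *seedp) {
        int level = 1;
        while (level < MAX_LEVEL && (rand_r(seedp) & 1))
            level++;
        return level;
    }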

Challenge! (up to 40 points, depending on how elegant and correct your solution is) Important. This challenge is extremely hard, so do not start this before your lab is otherwise complete. Getting all 40 points will require substantial documentation demonstrating the correctness of your implementation.
Read-copy update is an alternative to reader/writer locking that can be more efficient for read-mostly data structures. The key idea is that readers follow a carefully designed path through the data structure without holding any locks, and writers are careful about how they update the data structure. Writers still use a lock to mutually exclude each other. Read enough from the link above, or other sources, to learn how RCU works.
Task: Create a list variant (rcu-lru.c) which uses RCU instead of a reader-writer lock. You may use a 3rd-party RCU implementation for the helper functions (e.g., rcu_read_lock()), or write your own. You should write the rcu-lru list code yourself. We highly recommend keeping this in a separate source file from your main assignment.
Note: RCU requires memory barriers in some cases, even for readers, so be sure you understand where these need to be placed.
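
As a rough illustration of the reader and publish idioms only (the names are assumptions, and this deliberately omits the hardest part of RCU: the grace period that decides when a removed node may safely be freed):

    #include <pthread.h>
    #include <stdlib.h>

    struct node { int key; int refcount; struct node *next; };

    static struct node *head;
    static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Reader: no lock at all; acquire loads ensure a newly published node is
     * seen fully initialized. */
    static struct node *find(int key) {
        struct node *n = __atomic_load_n(&head, __ATOMIC_ACQUIRE);
        while (n != NULL && n->key != key)
            n = __atomic_load_n(&n->next, __ATOMIC_ACQUIRE);
        return n;
    }

    /* Writer: fully initialize the new node, then make it reachable with one
     * release store; writers still exclude each other with a mutex.  Freeing
     * removed nodes requires waiting out a grace period (e.g., a library's
     * synchronize_rcu()), which is omitted here. */
    static void insert_front(int key) {
        struct node *n = malloc(sizeof(*n));
        if (n == NULL)
            return;
        n->key = key;
        n->refcount = 1;
        pthread_mutex_lock(&writer_lock);
        n->next = head;
        __atomic_store_n(&head, n, __ATOMIC_RELEASE);    /* publish */
        pthread_mutex_unlock(&writer_lock);
    }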

Hand-In Procedure

Type make handin in the lab directory. You may submit more than once; if you do, the most recent submission will be graded. Submission works by creating a tag in git, which is then pushed to GitHub. If you look at your lab page on GitHub, you will see a pulldown list labeled "Branch: master". If you drop this list down, you will see an option to view tags. If you choose the handin tag, you will be able to view your submitted code and confirm that it is correct.

Important: We will grade for correctness primarily by reading your code. Although we will also test it, your grade will be based on whether we are convinced that the code is correct under all possible thread interleavings. Thus, good comments and clear code that lay out the safety and liveness arguments are very much in your interest.

All programs will be tested on classroom.cs.unc.edu. All programs, unless otherwise specified, should be written to execute in the current working directory. Your correctness grade will be based solely on your program's performance on classroom.cs.unc.edu. Make sure your programs work on classroom!

Generally, unless the homework assignment specifies otherwise, you should compile your program using the provided Makefile (e.g., by just typing make on the console). Do not add any special command line arguments ("flags") or compiler options to the Makefile.

The program should be neatly formatted (i.e., easy to read) and well-documented. In general, 75% of your grade for a program will be for correctness, and 25% for "programming style" (appropriate use of language features [constants, loops, conditionals, etc.], including variable/procedure/class names) and documentation (descriptions of functions, general comments [problem description, solution approach], use of invariants, and pre- and post-conditions where appropriate).

The Style Guide gives additional guidance on lab code style.

Make sure you put your name(s) in a header comment in every file you submit. Any de-anonymizing information, such as your name, should be enclosed in redacted comments (wrapped in @* *@).

This completes the lab.

Acknowledgments

Portions of the thread programming guidelines are adapted from undergraduate OS course guidelines created by Mike Dahlin and Lorenzo Alvisi.


Last updated: 2018-12-04 16:47:35 -0500