CSE 306: Lab 3: Synchronization

Due 11:59 PM, Monday, April 15, 2013

Introduction

In this lab you will develop a multi-threaded simulation of the Internet's Domain Name System (DNS), which maps human-readable host names onto IP addresses.

We are not going to build a true DNS server. Instead, for simplicity, we are writing a simulator of the typical requests a DNS server would see (plus one request that is unique to our system).

The course staff have provided you with a simple, sequential implementation. The sequential implementation will need synchronization to work properly with multiple threads. Your job will be to create several parallel versions of increasing sophistication.

Getting started

We will provide you with some initial source code to start from. To fetch that source, use Git to commit your Lab 2 source, fetch the latest version of the course repository, and then create a local branch called lab3 based on our lab3 branch, origin/lab3:

kermit% cd ~/CSE306/lab
kermit% git commit -am 'my solution to lab2'
Created commit 254dac5: my solution to lab2
 3 files changed, 31 insertions(+), 6 deletions(-)
kermit% git pull

Already up-to-date.
kermit% git checkout -b lab3 origin/lab3
Branch lab3 set up to track remote branch refs/remotes/origin/lab3.
Switched to a new branch "lab3"
kermit% 

The git checkout -b command shown above actually does two things: it first creates a local branch lab3 that is based on the origin/lab3 branch provided by the course staff, and second, it changes the contents of your lab directory to reflect the files stored on the lab3 branch. Git allows switching between existing branches using git checkout branch-name, though you should commit any outstanding changes on one branch before switching to a different one.

You will now need to merge the changes you made in your lab2 branch into the lab3 branch, with the git merge lab2 command.

In some cases, Git may not be able to figure out how to merge your changes with the new lab assignment (e.g. if you modified some of the code that is changed in the second lab assignment). In that case, the git merge command will tell you which files are conflicted, and you should first resolve the conflict (by editing the relevant files) and then commit the resulting files with git commit -a.

Lab 3 contains the new source files in the lab3 directory.

Sharing code with a partner

Unless we hear otherwise from you, we will assume you are working with the same partner as lab 2. You are welcome to change partners if you like; if you do, please email the course staff immediately to change permissions on your repositories.

We will set up group permission to one partner's git repository on scm. Suppose Partner A is the one handing in the code. Partner A should follow the instructions above to merge the lab3 code. After Partner A has pushed this change to scm, Partner B should simply clone Partner A's repository and use it. For example:

kermit% git clone ssh://PartnerB@scm.cs.stonybrook.edu:130/scm/cse306git-s13/hw-PartnerA lab3

Note that after you let the course staff know your partner selection, it may take a few days for the tech staff to apply these permission changes. Again, you are not required to use git to coordinate changes, only to hand in the assignment, but we recommend you learn to use git. You may use any means you like to share code with your partner.

Hand-In Procedure

When you are ready to hand in your lab code and write-up, create a file called slack.txt noting how many late hours you have used both for this assignment and in total. (This is to help us agree on the number that you have used.) This file should contain a single line formatted as follows (where n is the number of late hours):

late hours taken: n

Then run make handin in the labs directory. If you submit multiple times, we will take the latest submission and count late hours accordingly.

In this and all other labs, you may complete challenge problems for extra credit. If you do this, please create a file called challenge.txt, which includes a short (e.g., one or two paragraph) description of what you did to solve your chosen challenge problem and how to test it. If you implement more than one challenge problem, you must describe each one.

This lab does not include any questions for you to answer, but you should document your design in the README file.

DNS Background

DNS maps human-readable host names onto IP addresses. For instance, it maps the name www.cs.stonybrook.edu to the IP address 130.245.27.2.

Each computer on the internet is configured to use one or more DNS servers for name resolution. On a Linux system, the servers used are often listed in /etc/resolv.conf. Note that the servers are listed by IP address; otherwise the system would have an infinite recursion!

By convention, DNS servers typically run on port 53.

In resolving hostnames, resolution is actually done backwards. For example, in resolving www.cs.stonybrook.edu, a server will start by figuring out which server is responsible for the .edu domain, then query this server to find out the authoritative DNS server for stonybrook.edu, then query that server to find the authoritative DNS server for cs.stonybrook.edu, which then responds with the address of the server named www. In general, servers cache previously resolved addresses, so that a subsequent request for the same host name can be serviced more quickly.
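The right-to-left order described above can be sketched in C. This is only an illustration, not part of the provided lab code: next_suffix is a hypothetical helper that walks a host name one label at a time, starting from the top-level domain, and it assumes a well-formed name with no trailing dot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the next longer suffix of host, walking
 * labels right to left. Pass cur == NULL to get the top-level label;
 * returns NULL once the full name has already been returned. */
const char *next_suffix(const char *host, const char *cur)
{
    assert(host != NULL);
    if (cur == host)
        return NULL;                      /* full name already visited */

    /* Start at the end of the string, or just left of cur (on the dot). */
    const char *p = (cur == NULL) ? host + strlen(host) : cur - 1;

    /* Back up until the character before p is a dot or we hit the start. */
    while (p > host && p[-1] != '.')
        p--;
    return p;
}
```

Called repeatedly on www.cs.stonybrook.edu, this yields edu, then stonybrook.edu, then cs.stonybrook.edu, and finally the full name, which mirrors the order in which the authoritative servers are consulted.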

If you are curious to learn more about DNS, the Wikipedia article is a good place to start.

Tries

In our DNS simulation, we will use a trie to store the mappings of host names to IP addresses. A trie is a space-optimized search tree. The key difference between a trie and a typical search tree is that part of the search key is encoded by the position in the tree.

Consider the accompanying illustration [Figure: Trie Illustration]. The root simply contains a list of top-level domains (.com, .edu, ...). Each node has a list of children. This is a simple example, so under .com, there is only a 'k' character, which then has children 'as' and 'faceboo', which encode 'ask.com' and 'facebook.com'. Similarly, the '.edu' sub-tree encodes 'saybrook.edu' and 'stonybrook.edu'.

The key space-saving property of a trie is that common substrings can be coalesced into a single interior node. Technically, we are building a "reverse" trie, since most tries that store a string would compare from left to right, not right to left, as we are doing. Finally, note that a DNS server doesn't necessarily have to use a trie; many DNS servers use red-black trees or other trees for various reasons.
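To make the right-to-left comparison concrete, here is a minimal sketch in C. The function common_suffix_len is hypothetical and is not part of trie.h; it only measures how much of two host names a reverse trie could store in shared nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the longest common suffix of a and b,
 * comparing characters from the right end of each string. */
size_t common_suffix_len(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b), n = 0;

    /* Stop at the first mismatch or when either string is exhausted. */
    while (n < la && n < lb && a[la - 1 - n] == b[lb - 1 - n])
        n++;
    return n;
}
```

For 'ask.com' and 'facebook.com' the shared suffix is 'k.com' (5 characters), which corresponds to the '.com' and 'k' nodes in the example; the remaining prefixes 'as' and 'faceboo' become separate children.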

We have provided you with a simple interface in trie.h and a sequential implementation in sequential-trie.c. Do not use this implementation with more than one thread. It will break because it does not use any synchronization!

As before, if you are curious to learn more about tries, the Wikipedia article is a good place to start.

Thread Programming Guidelines

Before you begin the assignment, (re-)read Coding Standards for Programming with Threads. You are required to follow these standards for this project. Because it is impossible to determine the correctness of a multithreaded program via testing, grading on this project will primarily be based on reading your code, not on running tests. Your code must be clear and concise. If your code is not easy to understand, then your grade will be poor, even if the program seems to work. In the real world, unclear multi-threaded code is extremely dangerous: even if it "works" when you write it, how will the programmer who comes after you debug it, maintain it, or add new features? Feel free to sit down with the TA or instructor during office hours for code inspections before you turn in your project.

Programming multithreaded programs requires extra care and more discipline than programming conventional programs. The reason is that debugging multithreaded programs remains an art rather than a science, despite more than 30 years of research. Generally, avoiding errors is likely to be more effective than debugging them. Over the years a culture has developed with the following guidelines in programming with threads. Adhering to these guidelines will ease the process of producing correct programs:

  1. All threads share heap data. This requires you to use proper synchronization primitives whenever two threads modify or read the shared data. Sometimes, it is obvious how to do so:
              char a[1000];
              void Modify(int m, int n)
              {
                    a[m + n] = a[m] + a[n];                        // ignore bound checks
              }
    
    If two threads execute this function concurrently, there is no guarantee that both will perceive consistent values of the members of the array a. Therefore, such a statement has to be protected by a mutex that ensures proper execution, as follows:
              #include <pthread.h>

              char a[1000];
              pthread_mutex_t p = PTHREAD_MUTEX_INITIALIZER;

              void Modify(int m, int n)
              {
                    pthread_mutex_lock(&p);
                    a[m + n] = a[m] + a[n];                        // ignore bound checks
                    pthread_mutex_unlock(&p);
              }
    
    where p is a synchronization variable (here, a pthread mutex).
  2. Beware of the hidden data structures! While your own variables on the heap must be protected, it is also necessary to ensure that data structures belonging to the libraries and runtime system are protected. These data structures are allocated on the heap, but you do not see them. Access to library functions can create situations where two threads may corrupt the data structures because proper synchronization is lacking. For example, two threads calling the memory allocator simultaneously through the malloc() library call might create problems if the data structures of malloc() are not designed for concurrent access. In that case, data structures used to track the free space might be corrupted. The solution is to use a thread-safe version of libc.
    Linking with the -pthread flag (included in your Makefile) is sufficient for libc using gcc. Other compilers may require the -D_POSIX_PTHREAD_SEMANTICS flag to select thread-safe versions.
    Finally, not all libraries support thread-safe functions. If you include extra libraries in your code, it is up to you to figure out whether they are thread safe, or if calls must be protected by a lock.
  3. Simplicity of the code is also an important factor in ensuring correct operation. Complex pointer manipulations may lead to errors, and a runaway pointer may start corrupting the stacks of various threads, manifesting its presence through a set of incomprehensible bugs. Contrived logic and multiple recursions may cause the stacks to overflow. Modern computer languages such as Java eliminate pointers altogether, and perform the memory allocation and deallocation by automatic memory management and garbage collection techniques. They simplify the process of writing programs in general, and multithreaded programs in particular. Still, without an understanding of all the pitfalls that come with programming multithreaded applications, even the most sophisticated programmer using the most sophisticated language may fall prey to some truly strange bugs.

Getting started with the code

We have provided you with a sequential (single-threaded only) implementation of a reverse trie, and a testing framework. Take a few minutes to read and understand the source files:

main.c Testing framework, options and start-up code. You shouldn't need to modify this file.
trie.h Function declarations for your trie(s).
sequential-trie.c A single-threaded trie implementation. You may copy from this file as a starting point in the exercises below, or write your own trie.

Note that typing make generates four different executables. Currently, only dns-sequential will work, and only with one thread. You will create concurrent variations of the trie, which all share the code in main.c, and interface definition in trie.h.

Exercise 1. (15 points) Implement a thread-safe trie in mutex-trie.c, using coarse-grained locking. In other words, for this exercise, it is sufficient to have one lock for the entire trie. To complete this exercise, we recommend using pthread mutex functions, including pthread_mutex_init, pthread_mutex_lock, and pthread_mutex_unlock.
Be sure to test your code by running with dns-mutex -c XXX (where XXX is an integer greater than 1), in order to check that the code works properly. Because multi-threaded code interleaves memory operations non-deterministically, it may work once and fail a second time---so test very thoroughly and add a lot of assertions to your code.
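As a rough illustration of coarse-grained locking with the pthread mutex calls named above, consider the following sketch. It is not a trie and not the expected solution: struct locked_set, set_init, set_insert, and set_size are hypothetical names for a small integer set in which a single mutex guards every field.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SET_CAP 64

/* Hypothetical structure: a small set guarded by one coarse lock. */
struct locked_set {
    pthread_mutex_t lock;     /* protects items and count */
    int items[SET_CAP];
    int count;
};

void set_init(struct locked_set *s)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    assert(rc == 0);
    (void)rc;                 /* silence unused warning under NDEBUG */
    s->count = 0;
}

/* Insert v; returns false if v is already present or the set is full. */
bool set_insert(struct locked_set *s, int v)
{
    bool ok = false;
    int i;

    pthread_mutex_lock(&s->lock);
    for (i = 0; i < s->count; i++)
        if (s->items[i] == v)
            break;
    if (i == s->count && s->count < SET_CAP) {
        s->items[s->count++] = v;
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);   /* single exit path always unlocks */
    return ok;
}

int set_size(struct locked_set *s)
{
    pthread_mutex_lock(&s->lock);     /* reads need the lock too */
    int n = s->count;
    pthread_mutex_unlock(&s->lock);
    return n;
}
```

Funneling every function through one unlock at the end is a design choice that makes it harder to return while still holding the lock, a common bug when a function has several early exits.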

Testing on a multi-core machine

You will be given access to a machine with more than 1 CPU so that you can test your code. We highly recommend you take advantage of this machine, as some bugs may only manifest with multiple CPUs.

Specifically, use your CS user id to ssh to sbrocks.cewit.stonybrook.edu, where you can build and test your code with more than one CPU.

Exercise 2. (15 points) Implement a trie in rw-trie.c that allows concurrent readers but mutually excludes writers, using coarse-grained locking. In other words, for this exercise, it is still sufficient to have one lock for the entire trie. To complete this exercise, we recommend using pthread reader-writer lock functions, including pthread_rwlock_init, pthread_rwlock_rdlock, etc.
Be sure to test your code by running with dns-rw -c XXX (where XXX is an integer greater than 1), in order to check that the code works properly. Because multi-threaded code interleaves memory operations non-deterministically, it may work once and fail a second time---so test very thoroughly and add a lot of assertions to your code.

Squatting

Web sites register a domain name for a period of time, generally in years. Before the name expires, the owner must renew the registration. If the owner forgets, another user can quickly register the name to a different site, and charge the original owner a fee to buy back their own domain. This practice, as well as the practice of buying domain names on speculation that a business may want them later, is called squatting.

In practice, squatters monitor the expiration time of a registration, and then race with the owner to register the name once it expires. For our simple simulation, we will do something simpler.

In our current trie implementations, if someone tries to insert a name that already exists, the insertion simply fails. If a DNS simulation is started with the -q option, this will set a global variable allow_squatting. If squatting is allowed, the behavior of insert should change to block if the name already exists, waking the thread up once the name is deleted (we don't expire nodes in our simulation, only explicitly delete them).

You will need to modify each concurrent trie implementation to support this behavior; sequential-trie.c is exempt, since a single thread that blocked would never be woken.
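Blocking until a name is deleted is a natural fit for a condition variable. The sketch below shows the general wait-and-wake pattern on a hypothetical struct slot that stands in for a single name; slot_claim and slot_release are illustrative names, not functions from trie.h, and a real trie would need to decide which names share a condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for one name: either registered or free. */
struct slot {
    pthread_mutex_t lock;
    pthread_cond_t  freed;    /* signaled whenever taken becomes false */
    bool taken;
    int owner;
};

void slot_init(struct slot *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->freed, NULL);
    s->taken = false;
    s->owner = -1;
}

/* Block until the slot is free, then claim it for owner. */
void slot_claim(struct slot *s, int owner)
{
    pthread_mutex_lock(&s->lock);
    while (s->taken)                            /* re-check after every wakeup */
        pthread_cond_wait(&s->freed, &s->lock); /* drops lock while asleep */
    s->taken = true;
    s->owner = owner;
    pthread_mutex_unlock(&s->lock);
}

/* Free the slot and wake all waiting claimers; one of them will win. */
void slot_release(struct slot *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->taken);
    s->taken = false;
    pthread_cond_broadcast(&s->freed);
    pthread_mutex_unlock(&s->lock);
}
```

The while loop around pthread_cond_wait matters: a woken thread may find that another squatter claimed the name first, or may have woken spuriously, so it must test the condition again before proceeding.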

Exercise 3. (30 points) Add squatting support to each of the tries you have implemented so far.
In addition to the normal tests, we have also created squatting stress tests, which are invoked with the -t option. Be sure that this test doesn't simply hang all of your threads.
Note: You do not need to support squatting in the sequential trie. It is OK if all threads end up blocked while squatting; just be sure that if your implementation hangs, it is because of squatting.

Exercise 4. (40 points) Implement a trie that uses fine-grained locking in fine-trie.c. In other words, every node in the trie should have its own lock (a mutex is fine).
What makes fine-grained locking tricky is ensuring that you cannot deadlock while acquiring locks, and that a thread doesn't hold unnecessary locks. Be sure to document your locking protocol in your README.
Be sure to include support for squatting in this trie as well (extend exercise 3 to the fine-grained trie).
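One widely used protocol for per-node locks is hand-over-hand locking (also called lock coupling). The sketch below applies it to a lookup in a sorted singly-linked list with a sentinel head; it is one possible approach under simplifying assumptions, not the required design. struct node, node_init, and list_contains are hypothetical names, and the sketch assumes every thread acquires locks in head-to-tail order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sorted-list node that carries its own lock. */
struct node {
    pthread_mutex_t lock;   /* protects key and next of this node */
    int key;
    struct node *next;
};

void node_init(struct node *n, int key, struct node *next)
{
    int rc = pthread_mutex_init(&n->lock, NULL);
    assert(rc == 0);
    (void)rc;               /* silence unused warning under NDEBUG */
    n->key = key;
    n->next = next;
}

/* Hand-over-hand search. head is a sentinel whose key is ignored.
 * At most two locks are held at a time, always in list order. */
bool list_contains(struct node *head, int key)
{
    assert(head != NULL);
    struct node *cur = head;
    pthread_mutex_lock(&cur->lock);
    while (cur->next != NULL) {
        struct node *nxt = cur->next;
        pthread_mutex_lock(&nxt->lock);       /* take the next lock first */
        if (nxt->key > key) {                 /* passed where key would be */
            pthread_mutex_unlock(&nxt->lock);
            break;
        }
        pthread_mutex_unlock(&cur->lock);     /* then drop the previous one */
        cur = nxt;
    }
    bool found = (cur != head && cur->key == key);
    pthread_mutex_unlock(&cur->lock);
    return found;
}
```

Because every thread locks nodes in the same direction, no cycle of waiting threads can form along the list. A trie adds a second dimension (children as well as siblings), so a real protocol must also fix an order between a parent and its children.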

Challenge! (up to 5 points, depending on solution quality) The provided print function is a bit terse, and the user has to work to decode the tree structure. Create a print function that clearly shows the levels of the tree and better conveys the visual intuition of how the tree is organized.

Challenge! (up to 10 points, depending on how elegant and correct your solution is) Each level of the trie is organized as a singly-linked list, sorted by key. Within a level, we could search faster using a skiplist. Read this article (or search the web) to learn more about skip lists.
To complete this challenge, implement a skiplist to replace the current next pointer in each trie node.
Be careful to use the thread-safe pseudo-random number generator.
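Skip lists choose a random height for each new node. The sketch below assumes rand_r() from POSIX as the thread-safe generator, with one seed per thread; the lab may intend a different generator, and random_level and MAX_LEVEL are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L   /* expose rand_r() under strict modes */
#include <assert.h>
#include <stdlib.h>

#define MAX_LEVEL 16              /* hypothetical cap on node height */

/* Pick a level in [1, MAX_LEVEL]; each additional level is taken with
 * probability of about 1/2. seed is caller-owned state, one per thread,
 * so no lock is needed around the generator. */
int random_level(unsigned int *seed)
{
    assert(seed != NULL);
    int level = 1;
    while (level < MAX_LEVEL && rand_r(seed) < RAND_MAX / 2)
        level++;
    return level;
}
```

Plain rand() keeps hidden global state, which is exactly the kind of hidden data structure the thread programming guidelines warn about; passing the state explicitly avoids sharing it between threads.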

Challenge! (up to 40 points, depending on how elegant and correct your solution is) Important. This challenge is extremely hard, so do not start this before your lab is otherwise complete. Getting all 40 points will require substantial documentation demonstrating the correctness of your implementation.
Read-copy update is an alternative to reader/writer locking that can be more efficient for read-mostly data structures. The key idea is that readers follow a carefully designed path through the data structure without holding any locks, and writers are careful about how they update the data structure. Writers still use a lock to mutually exclude each other. Read enough from the link above, or other sources, to learn how RCU works.
Task: Create a trie variant (rcu-trie.c) which uses RCU instead of a reader-writer lock. You may use a third-party RCU implementation for the helper functions (e.g., rcu_read_lock), or write your own. You should write the rcu-trie yourself. We highly recommend keeping this in a separate source file from your main assignment.
Note: RCU requires memory barriers in some cases, even for readers, so be sure you understand where these need to be placed.
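The ordering requirement can be illustrated with C11 atomics. The sketch below shows only the publish and read halves of the idea: it is not a complete RCU implementation, since it has no grace-period tracking and never frees anything. struct record, publish, and read_addr are hypothetical names, and real RCU libraries provide their own primitives for these steps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload that readers reach through a shared pointer. */
struct record {
    int addr;
};

/* Shared pointer that readers follow without taking any lock. */
static _Atomic(struct record *) current_rec = NULL;

/* Writer side: rec must be fully initialized before this call. The
 * release ordering makes that initialization visible to any reader
 * that observes the new pointer. Returns the previous record, which
 * must not be reclaimed while a reader might still be using it. */
struct record *publish(struct record *rec)
{
    return atomic_exchange_explicit(&current_rec, rec,
                                    memory_order_acq_rel);
}

/* Reader side: the acquire load pairs with the writer's release, so
 * the fields of the record are read only after the pointer itself. */
int read_addr(int fallback)
{
    struct record *r = atomic_load_explicit(&current_rec,
                                            memory_order_acquire);
    return r ? r->addr : fallback;
}
```

The hard part of the challenge is what this sketch leaves out: deciding when the record returned by publish can safely be freed, which is what an RCU grace period is for.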

This completes the lab. Make sure you hand in your work with make handin.

Acknowledgements

Portions of the thread programming guidelines are adapted from undergraduate OS course guidelines created by Mike Dahlin and Lorenzo Alvisi.


Last updated: Fri May 03 13:45:18 -0400 2013