Comp 242 Spring 2001

Time Management & Transparent Distributed IPC (Due Tue Mar 5)

In this project, you will add time management and distributed IPC to your Xinu-like kernel.

I have written some sample code that illustrates both the use of sockets and Unix signals/timers. It can be accessed from the course home page (URL: http://www.cs.unc.edu/~dewan/242/f97/code/assignment3/).

I would recommend implementing time management first and distributed IPC next.

Time Management

The goal of this part of your project is to implement preemptive scheduling and alarms. Like Xinu, your kernel should preempt a process after its time quantum has expired. In addition, you should implement the sleep10 call. You do not have to implement tick rate or deferred processing, but you do have to make sure that interrupts are blocked during execution of critical regions. You can get clock interrupts from Unix by using the Unix calls setitimer and signal (the SIGVTALRM signal), and you can block interrupts using sigsetmask. The Disable and Enable routines in the sample code illustrate enabling and disabling of signal interrupts.
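
For concreteness, here is a minimal sketch of how these Unix calls might fit together. The routine names (other than the Unix calls themselves), the resched hook, and the 10 ms quantum are my own assumptions, not necessarily what the sample code does:

#include <signal.h>
#include <sys/time.h>

extern void resched(void);   /* your kernel's scheduler; name assumed */

/* Block SIGVTALRM; return the old mask so the caller can Restore it. */
int Disable(void)
{
    return sigsetmask(sigmask(SIGVTALRM));
}

void Restore(int oldmask)
{
    sigsetmask(oldmask);
}

/* Clock interrupt handler: preempt the currently running process. */
void clock_handler(int sig)
{
    (void)sig;
    resched();
}

/* Ask Unix for a virtual-time clock tick every 10 ms (quantum assumed). */
void start_clock(void)
{
    struct itimerval it;
    it.it_interval.tv_sec  = 0;
    it.it_interval.tv_usec = 10000;   /* re-arm every 10 ms */
    it.it_value = it.it_interval;     /* first tick after 10 ms */
    signal(SIGVTALRM, clock_handler);
    setitimer(ITIMER_VIRTUAL, &it, NULL);
}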

One problem we face with signals is that there is no way to atomically change the signal mask and context switch to a new process. Interrupts have to be disabled during context switching (since shared data are being changed), and the signal mask value has to be restored when the new process starts. So restoring the new mask and jumping to the start address have to be done atomically. This can be done in Xinu through the rtt instruction, but we do not have such a facility in our OS. Therefore, you should surround the call to ctxsw with Disable and Restore. To handle the case when ctxsw switches to a new process, you have two choices. The elegant solution is to set up the stack of a newly created process to jump to a routine that enables interrupts before jumping to the new process; a sketch of this option appears below. The easy solution is to require each process to manually call Restore as its first statement. You can choose either solution, but do document which one you used.
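
Here is a rough sketch of the elegant option. The trampoline name is hypothetical, and how the body pointer reaches it depends on exactly how your create and ctxsw lay out the new stack:

#include <signal.h>

/* Sketch: create() builds the new process's stack so that the first
   ctxsw to it "returns" here instead of directly into the process body. */
static void start_shim(void (*body)(void))
{
    sigsetmask(0);   /* ctxsw ran with interrupts disabled; enable them */
    body();          /* now run the process's real code */
    /* if body() returns, terminate the process via your kill primitive */
}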

When an interrupt occurs, registers are saved on the stack of the process currently executing. So be sure to allocate enough space on the stack to service multiple interrupts; otherwise you might get strange core-dump errors such as unaligned access.

Distributed IPC

You should extend the semantics of the IPC primitives of the previous assignment to support communication among Xinu threads executing in different Unix processes. The Unix processes represent different Xinu kernels, and thus your IPC essentially allows threads executing on ``distributed'' Xinu kernels to talk to each other. The Xinu kernels may execute on the same or different machines. The distributed IPC you support should be transparent, that is, it should have the same syntax and semantics as the local IPC you have implemented. The only difference should be in performance. You can assume that there will be no failures.

In order to implement transparent IPC, your Xinu kernels need to know the identities and locations of all Xinu kernels in the distributed system. This information will be given as a sequence of arguments to each Unix process representing a Xinu kernel. Each of these arguments consists of a unique kernel id followed by a host name. The -l flag before an argument indicates the local kernel, while the -r flag indicates a remote kernel. The kernels are assigned consecutive integer ids starting from 1. Thus, the following sequence of commands:

/* executed on jeeves */
xinu -l 1:jeeves -r 2:wooster -r 3:jeeves 
xinu -r 1:jeeves -r 2:wooster -l 3:jeeves

/* executed on wooster */
xinu -r 1:jeeves -l 2:wooster -r 3:jeeves
starts two Xinu kernels (ids 1 and 3) executing the Unix program xinu on machine jeeves, and one Xinu kernel (id 2) on machine wooster. The order of arguments should not matter. Each Xinu kernel will process these arguments to figure out where its peers are. It is the invoker's responsibility to give consistent values for these arguments.
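
One way to process these arguments is sketched below. The structure layout, the MAXKERNELS bound (the assignment lets you assume such a bound), and the function name are assumptions of mine, not requirements:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXKERNELS 16   /* assumed bound on the number of kernels */

struct kernel { int id; char host[64]; int local; };
struct kernel kernels[MAXKERNELS];
int nkernels = 0;
int localid  = -1;

/* Parse "-l id:host" / "-r id:host" pairs from the command line. */
void parse_args(int argc, char *argv[])
{
    int i;
    for (i = 1; i + 1 < argc && nkernels < MAXKERNELS; i += 2) {
        struct kernel *k = &kernels[nkernels++];
        k->local = (strcmp(argv[i], "-l") == 0);
        if (sscanf(argv[i + 1], "%d:%63s", &k->id, k->host) != 2) {
            fprintf(stderr, "bad kernel argument: %s\n", argv[i + 1]);
            exit(1);
        }
        if (k->local)
            localid = k->id;   /* remember which kernel we are */
    }
}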

Since IPC is transparent, user programs do not have to name hosts - they simply name ports, without worrying about the location of the processes sending messages to and receiving messages from these ports. Thus, a single global port name space is used in all kernels - for instance, port 2 means the same port on all machines. To make your task easier, you might be tempted to partition the port name space among the various hosts - for instance, ports 1..8 are allocated by processes on host jeeves and ports 9..16 by processes on host wooster. Given a port number, you could then use the partitioning to figure out which host receives messages on that port.

However, this approach is not allowed since it violates the transparency principle. A process can allocate any unallocated port, and is not restricted to ports in some range.

Your distributed Xinu IPC mechanism will be implemented on top of a distributed Unix IPC mechanism. You are free to use any Unix IPC mechanism, such as sockets or Sun RPC, that allows non-blocking I/O. To use sockets, you will need to learn several Unix calls: socket, bind, listen, accept, connect, select, read, write, gethostbyname, ntohs, ntohl, htons, and htonl. To learn more about these calls, you can look at the Unix man pages.
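
To give a flavour of these calls, here is a sketch of how a kernel might create the socket on which it accepts connections from its peers. The function name is mine, and error checking is omitted for brevity here, although your actual code must check every return value as required below:

#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

/* Create a TCP socket that listens for peer-kernel connections.
   Passing port 0 lets Unix pick an unreserved port for you. */
int make_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    addr.sin_port        = htons(port);        /* network byte order! */

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 8);   /* allow several pending peer connections */
    return fd;
}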

Also, as I mentioned in class, never do a read without first doing a select that confirms there is data to be read. To use Sun RPC, do a man on callrpc() and svc_run(), which will describe most if not all of the Sun RPC calls. In particular, look at poll(), svc_pollfd, and svc_getreq_poll(), which seem to allow non-blocking IPC, though I have not tested them myself.
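
The select-before-read rule might look like the following sketch; the wrapper name is hypothetical, and the surrounding file-descriptor bookkeeping is up to you:

#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Poll a socket: select with a zero timeout reports whether read()
   would find data, so the kernel never blocks waiting for a message. */
int try_read(int fd, char *buf, int len)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* zero timeout => pure poll */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) > 0 && FD_ISSET(fd, &rfds))
        return read(fd, buf, len);  /* select says this will not block */
    return 0;                       /* no data ready right now */
}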

You can assume that the maximum number of Xinu kernels in the distributed system is bounded - that is, you know this number when you write the program. Also, you can assume that you will always be able to get an unreserved Unix port on a machine.

In a distributed program, many things you are not aware of can go wrong. Therefore, your program must check the return values of all system calls for errors and, if these calls change the errno variable, print out its value when an error occurs. This should make the debugging task easier.
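
A small wrapper makes this discipline painless; the helper name below is my own convention, not part of the assignment:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* Report any failed system call together with its errno value. */
int checked(int result, const char *what)
{
    if (result < 0) {
        fprintf(stderr, "%s failed: errno=%d (%s)\n",
                what, errno, strerror(errno));
        exit(1);
    }
    return result;
}

/* usage: int fd = checked(socket(AF_INET, SOCK_STREAM, 0), "socket"); */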

Issues

You should explain the issues you faced in the design of the implementation and how you resolved them. In particular, you should consider and answer the following questions:

Can you use the Unix sleep call to implement the Xinu sleep10 call?

How did you handle the problem of restoring interrupts when a context switch occurs to a newly created process?

Why not use blocking reads to get information from sockets?

How frequently should you poll for socket input? Give the advantages and disadvantages of polling frequently versus rarely, and justify the frequency you picked.

Why does partitioning the port name space violate the transparency principle?

How can you make the Xinu IPC calls (alloc_port, req_port, asend, ssend, mrecv) efficient - in particular, how can you reduce the number of Unix messages a Xinu IPC call sends? (Hint: think caching or replication of data structures.) It may not be possible to make all Xinu IPC calls efficient in a single implementation, in which case, which ones have you favoured and why? For each of these calls, explain how many Unix messages may be sent.

The answer to the above question depends on whether you use a centralized or replicated implementation. Experience with past classes has convinced me that a replicated implementation is too complex for this course. Therefore, I advise you to choose the simpler centralized implementation.

Instead of enclosing ctxsw within Disable and Restore, you could have ctxsw call Restore before returning. What could go wrong under this approach?

Can enclosing ctxsw within Disable and Restore interfere with context switching to a newly created process?








Prasun Dewan
Tue Feb 12 11:39:09 EST 2002