One problem we face with signals is that there is no way to atomically change the signal mask and context switch to a new process. Interrupts have to be disabled during context switching (since shared data are being changed), and the signal mask value has to be restored when the new process starts, so restoring the new mask and jumping to the start address must be done atomically. Xinu can do this with the rtt instruction, but we have no such facility in our OS. Therefore, you should surround the call to ctxsw with Disable and Restore. To handle the case where ctxsw switches to a new process, you have two choices. The elegant solution is to set up the stack of a newly created process to jump to a routine that enables interrupts before jumping to the new process's code. The easy solution is to require each process to manually call Restore as its first statement. You may choose either solution, but document which one you used.
When an interrupt occurs, registers are saved on the stack of the currently executing process.
So be sure to allocate enough space on each stack to service multiple interrupts;
otherwise you may get strange core-dump errors such as unaligned access.
Distributed IPC
You should extend the semantics of the IPC primitives of the previous
assignment to support communication among Xinu threads
executing in different Unix processes.
The Unix processes
represent different Xinu kernels,
and thus your IPC essentially allows threads executing on
"distributed" Xinu kernels to talk to each other.
The Xinu kernels may execute on the same or different machines.
The distributed IPC you support should be transparent,
that is,
it should have the same syntax and semantics
as the local IPC you have
implemented.
The only difference should be in performance.
You can assume that there will be no failures.
In order to implement transparent IPC, your Xinu kernels need to know the identities and locations of all Xinu kernels in the distributed system. This information will be given as a sequence of arguments to each Unix process representing a Xinu kernel. Each of these arguments consists of a unique kernel id followed by a host name. The -l flag before an argument indicates the local kernel, while the -r flag indicates a remote kernel. The kernels are assigned consecutive integer ids starting from 1. Thus, the following sequence of commands:
    /* executed on jeeves */
    xinu -l 1:jeeves -r 2:wooster -r 3:jeeves
    xinu -r 1:jeeves -r 2:wooster -l 3:jeeves

    /* executed on wooster */
    xinu -r 1:jeeves -l 2:wooster -r 3:jeeves

starts two Xinu kernels, with ids 1 and 3, executing the Unix program xinu on machine jeeves, and one Xinu kernel, with id 2, on machine wooster. The order of the arguments should not matter. Each Xinu kernel will process these arguments to figure out where its peers are. It is the invoker's responsibility to give consistent values for these arguments.
Since IPC is transparent, user programs do not have to name hosts - they simply name ports, without worrying about the location of the processes sending messages to and receiving messages from these ports. Thus, a single global port name space is used in all kernels - for instance, port 2 refers to the same port on all machines. To make your task easier, you may be tempted to partition the port name space among the various hosts - for instance, ports 1..8 are allocated by processes on host jeeves and ports 9..16 by processes on host wooster. Given a port number, you could then use the partitioning to figure out which host receives messages on that port.
However,
this approach is not allowed since it violates the transparency principle.
A process can allocate any unallocated port,
and is not restricted to ports in some range.
Unix Calls
Your distributed Xinu IPC mechanism will be implemented on top of
a distributed Unix IPC mechanism.
You are free to use any Unix IPC mechanism such as sockets or Sun RPC that
allows non-blocking I/O.
To use sockets,
you will need to learn several Unix calls:
socket,
bind,
listen,
accept,
connect,
select,
read,
write,
gethostbyname,
ntohs,
ntohl,
htons,
and
htonl.
To learn more about these calls,
you can look at the Unix man pages.
Also,
as I mentioned in class,
never do a read without
doing a select before it that confirms that there is data to be read.
To use Sun RPC, do a man on callrpc() and svc_run(), which will describe most
if not all of the Sun RPC calls.
In particular, look at poll(), the svc_pollfd variable, and svc_getreq_poll(), which seem to
allow non-blocking IPC, though I have not tested them myself.
To do time management and poll for messages, you should look at the man pages for:
signal,
siggetmask,
sigsetmask,
sigmask,
and
setitimer.
I have written some sample code that illustrates both the use of sockets and Unix signals/timers. It can be accessed from the course home page (URL: http://www.cs.unc.edu/ dewan/242/f97/code/assignment3/). The TA for the spring 06 version of the course, Stephen Zota, tweaked some of this code to improve it. The modified code is at http://www.cs.unc.edu/ dewan/242/s07/code/zota_sockets. On the client side, his code uses inet_ntop and Inet_pton. These functions are recommended by Stevens, and they elegantly solve the host-address problems. On the server side, when programming with sockets in Linux, he ran into issues with SIGPIPE; his code includes a simple handler to ignore this signal.
You can assume that the maximum number of Xinu kernels in the distributed system is bounded - that is, you know this number when writing the program. Also, you can assume that you will always be able to get an unreserved Unix port on a machine.
In writing a distributed program,
many things you are not aware of can go wrong.
Therefore,
your program must check for error return values from all system calls
and, if these calls change the
errno
variable,
print out its value in case of error.
This should make the debugging task easier.
Issues
You should explain the issues you faced in the design of the implementation
and how you resolved them.
In particular,
you should consider and answer the following questions:
Can you use the Unix sleep call to implement the Xinu sleep10 call?
How did you handle the problem of restoring interrupts when a context switch occurs to a newly created process?
Instead of enclosing ctxsw within Disable and Restore, you could have ctxsw call Restore before returning. Can anything go wrong under this approach?
Why not use blocking reads to get information from sockets?
How frequently should you poll for socket input? Give the advantages/disadvantages of polling frequently/rarely, and justify the frequency you picked.
Why does partitioning the port name space violate the transparency principle?
How can you make the Xinu IPC calls (alloc_port, req_port, asend, ssend, mrecv) efficient - in particular, how can you reduce the number of Unix messages a Xinu IPC call sends? (Hint: think caching or replication of data structures.) It may not be possible to make all Xinu IPC calls efficient in a single implementation, in which case, which ones have you favoured and why? For each of these calls, explain how many Unix messages may be sent.
The answer to the above question depends on whether you use a centralized or
replicated implementation.
In a replicated architecture, the existence of a well-known central process cannot be assumed; that is, it is a pure peer-to-peer implementation.
Experience with past classes has convinced me that a
replicated implementation is too complex for this course. Therefore, I advise you to
choose the simpler centralized implementation (which may have caching).