Kinds of Processes

A multiprocessor system can execute application processes simultaneously. What kind of processes should it support? One simple approach, in the style of Unix or Xinu, is to support a single kind of process, scheduled by the OS. In the Xinu case it would be a lightweight process (lwp), while in the Unix case it would be a heavyweight process (hwp). The kernel would now simply schedule as many processes as there are processors.

A problem with this approach is that any practical system, unlike Xinu, must support hwps. If we support only one kind of process, then the cost of context switching the hwps is high (it requires entering the kernel and changing page tables). Thus, this approach discourages fine-grained parallelism because of the high cost of concurrency.

Therefore the solution is to support both lwps and hwps. For a process to be truly an lwp, not only must it not require loading of translation tables, but it must also not require kernel code to do the context switching, because of the high cost of entering the kernel, which involves not only the kernel trap itself but also copying and checking of parameters. Threads supported by the kernel have been found to be ten times as costly to schedule as threads supported by user-level code. (The cost of scheduling the latter has been found to be within an order of magnitude of the cost of a procedure call.) Kernel-scheduled threads also incur the space cost of a kernel stack per thread. Another issue is fairness: with kernel-scheduled threads, the more threads an application has, the more CPU time it gets. Therefore, the most efficient solution seems to be to support lwps in user code within a hwp, as in Java or 730-xinu.
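To see why user-level context switching is so cheap, consider a minimal sketch of a cooperative user-level scheduler. Here each lwp is modeled as a Python generator, and a "context switch" is just an ordinary function call (`next`) with no kernel trap or page-table change. This is an illustration of the idea, not the Xinu or Java implementation:

```python
from collections import deque

def counter(name, n, log):
    # Each lightweight "thread" is a generator; yield is the context switch.
    for i in range(n):
        log.append((name, i))
        yield  # voluntarily yield to the user-level scheduler

def run(threads):
    # A minimal round-robin user-level scheduler: switching threads is just
    # a next() call -- no kernel trap, no translation-table reload.
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)
            ready.append(t)  # still runnable: requeue at the tail
        except StopIteration:
            pass  # thread finished

log = []
run([counter("A", 2, log), counter("B", 2, log)])
# round-robin interleaving: A0, B0, A1, B1
```

The entire scheduler lives in user code, which is exactly why its switch cost is comparable to a procedure call.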

The problem with this solution of course is that user-level code does not have the rights to bind processes to processors and thus cannot schedule lwps on multiple processors. The kernel has the rights to do so, but it does not know about the lwps, since they are implemented entirely in user-level code. So we need a more elaborate scheme to give us fine-grained concurrency at low cost. One solution is a mixed scheduling scheme for lwps wherein the kernel and user-level code work together to schedule threads. This is what happens with p-threads in the BSD OS: the thread data structures (context blocks) are known to both user-level and kernel code. When a p-thread makes a blocking call, the kernel invokes a callback in the user-level scheduler informing it that a blocking call has been made. The scheduler can then schedule some other thread, and reschedule the original thread when it gets another notification saying that the blocking call has completed. As a result, a blocking system call by one thread does not block the entire application. We can imagine similar cooperation between the kernel and the user-level scheduler to allow threads to be scheduled on multiple processors. This solution is intended to retain compatibility with existing operating systems in which address spaces and processes are coupled together.
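The callback protocol above can be sketched in miniature. The class and method names below (`block_upcall`, `unblock_upcall`) are illustrative, not the actual BSD p-thread interface; the point is only the division of labor: the kernel reports blocking events, and the user-level scheduler reacts by picking another runnable thread.

```python
from collections import deque

class UserScheduler:
    # Hypothetical sketch of the mixed scheme: the kernel upcalls
    # block_upcall/unblock_upcall so the user-level scheduler can run
    # another thread instead of stalling the whole application.
    def __init__(self):
        self.ready = deque()
        self.blocked = set()

    def add(self, tid):
        self.ready.append(tid)

    def block_upcall(self, tid):
        # kernel notification: tid has made a blocking system call
        self.blocked.add(tid)

    def unblock_upcall(self, tid):
        # kernel notification: tid's blocking call has completed
        self.blocked.discard(tid)
        self.ready.append(tid)

    def pick_next(self):
        # choose the next runnable thread, skipping blocked ones
        while self.ready:
            tid = self.ready.popleft()
            if tid not in self.blocked:
                return tid
        return None  # nothing runnable

sched = UserScheduler()
sched.add(1)
sched.add(2)
sched.block_upcall(1)   # thread 1 blocks in the kernel...
sched.pick_next()       # ...so the scheduler runs thread 2 instead
sched.unblock_upcall(1) # later, the kernel says the call completed
```

Because the scheduler learns about the block via the upcall, the application as a whole never loses its processor.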

To understand how we might support thread concurrency in a more fundamental way, we must ask ourselves what the role of hwps is in a mixed system that has both lwps and hwps. The real computation is done by lwps - hwps simply schedule them, and are essentially virtual processors that are scheduled on the real processors. In Java and 730-xinu, the threads of an application are bound to a single virtual processor. Thus, all of these threads share a single physical processor, as the kernel does not understand threads. The key to solving this problem is to schedule the threads on multiple virtual processors, which can then be bound to multiple physical processors by the OS.

This solution is described in McCann et al. It requires three kinds of entities. First, applications or jobs, which like Unix's hwps define address spaces but unlike Unix do not define threads. These are known to the kernel. Second, virtual processors, which are also known to the kernel, and are executed in the context (address space) of some application. Each application creates a certain number of vps based on the concurrency of the application (that is, the number of threads that can be active simultaneously) and other factors we shall see later. As far as the kernel is concerned, these are the units of scheduling: it divides the physical processors among the virtual processors created by the different applications. Third, threads or tasks, which are lwps supported in user-level code and scheduled by the virtual processors. Like a Xinu kernel, a virtual processor schedules lwps; the difference is that multiple virtual processors share a common pool of lwps. Thus, an application thread is not bound to a virtual processor - all ready application threads are queued in a ready queue serviced by the multiple virtual processors.
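The three-entity structure can be sketched as follows. Here one job (one address space, i.e. one Python process) creates two virtual processors, modeled as OS threads, and both service a single shared ready queue of lightweight tasks. The names and counts are illustrative, not taken from McCann et al.:

```python
import threading
import queue

NUM_VPS = 2              # virtual processors created by this job
ready_q = queue.Queue()  # the common pool of ready lwps (tasks)
results = []
results_lock = threading.Lock()

def virtual_processor():
    # Each vp plays the role of a Xinu-like scheduler loop, except the
    # ready queue is shared among all vps rather than private to one.
    while True:
        task = ready_q.get()
        if task is None:          # sentinel: shut this vp down
            break
        with results_lock:
            results.append(task())

# The job's lightweight tasks: they are not bound to any particular vp.
tasks = [lambda i=i: i * i for i in range(4)]
for t in tasks:
    ready_q.put(t)
for _ in range(NUM_VPS):
    ready_q.put(None)             # one shutdown sentinel per vp

vps = [threading.Thread(target=virtual_processor) for _ in range(NUM_VPS)]
for vp in vps:
    vp.start()
for vp in vps:
    vp.join()
# all four tasks complete, in whatever order the vps pick them up
```

Note that which vp runs which task is decided dynamically by the shared queue; the kernel only sees and schedules the vps.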

Now we have a scheme that provides the benefits we wanted. Fine-grained concurrency is provided in user-level code by a virtual processor, which switches among the threads. Large-grained concurrency is provided by kernel-level code, which switches among the virtual processors. The net result is that multiple threads of an application can be executing at the same time (on different virtual processors) and the cost of switching among threads is low!

Prasun Dewan 2007-04-19