
Kernel Support for Implementing the 2-Level Process Model

The threads executed by virtual processors are much like Xinu processes in that they share a single virtual address space. An important difference is that Xinu processes are context switched by the kernel. User-level threads are thus more like the processes implemented in your project, which are switched in user space. The user-level threads were meant to be simulations of real kernel-supported threads so that you don't have to work on the bare machine. As we saw in the discussion above, they are useful in the presence of kernel-supported (heavyweight or lightweight) processes. Switching among kernel-supported threads (lwps) requires trapping to the kernel and copying and checking of parameters. Switching among hwps also requires switching address spaces. Moreover, operations executed by kernel-supported hwps and lwps, such as resume(), signal() and wait(), require trapping to the kernel and checking and copying of parameters by the kernel for security reasons. In the case of user-level threads, these costs are not incurred. Anderson et al. ran experiments in 1991 to quantify these costs. They measured the cost of (a) forking a process that executes a null procedure, and (b) executing a signal followed immediately by a wait. For user-level threads, kernel-supported lwps, and hwps, the (fork, signal-wait) costs were (34, 37), (948, 441) and (11300, 1840) microseconds, respectively.
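
To make the cost difference concrete, the following is a minimal sketch, in C, of a user-level context switch built on the POSIX ucontext primitives: the switch is essentially a register save/restore in user space, with none of the trapping and parameter checking described above. (This API is chosen only for illustration; it is not the mechanism of the project or of Anderson et al., and glibc's swapcontext does make one system call to save the signal mask, which real thread packages avoid with a hand-written switch.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, thread_ctx;

    static void thread_body(void) {
        puts("in user-level thread");        /* runs on its own stack */
        swapcontext(&thread_ctx, &main_ctx); /* switch back: no trap needed */
    }

    int main(void) {
        char *stack = malloc(64 * 1024);     /* stack for the thread */
        getcontext(&thread_ctx);
        thread_ctx.uc_stack.ss_sp = stack;
        thread_ctx.uc_stack.ss_size = 64 * 1024;
        thread_ctx.uc_link = &main_ctx;
        makecontext(&thread_ctx, thread_body, 0);
        swapcontext(&main_ctx, &thread_ctx); /* user-space switch to thread */
        puts("back in main");
        free(stack);
        return 0;
    }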

Performance is not the only reason for supporting user-level threads. Different applications can use different types of user-level threads. For example, a real-time application can use threads with priorities, while a matrix multiplier can use threads without priorities, which, as we have seen, are simpler to implement and more efficient, since they do not involve maintaining and checking a priority queue. Kernel threads, on the other hand, are shared by all applications, and thus must satisfy the requirements of all applications. The result is that an application with simple needs, such as the matrix multiplier, must pay the cost of features it does not need.

On the other hand, user-level threads raise some conceptual and implementation issues. As we saw above, one of them is what happens when a user-level thread performs a blocking I/O call. In the project implementation, such a call blocks the whole hwp (virtual processor). One "solution" to this problem is to use asynchronous calls instead of synchronous ones. However, as we have seen before, such calls are harder to program. Moreover, if such calls return results, a mechanism must be provided to retrieve the results. If the kernel does not know about the thread, how does it deliver the results to it? One solution has been to have the thread that made the asynchronous call poll for its results.
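
One way to implement such polling is sketched below in C: a wrapper puts the descriptor in non-blocking mode and, whenever the kernel reports that no data is ready, yields to other user-level threads and retries on its next turn. The thread_yield() routine is a hypothetical entry point of the user-level thread manager, not part of any real API discussed here.

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    extern void thread_yield(void);  /* hypothetical user-level scheduler call */

    /* A "jacket" around read() that keeps the whole vp from blocking:
     * if no data is ready, the calling thread yields and polls again
     * later, so other user-level threads keep running. */
    ssize_t threaded_read(int fd, void *buf, size_t n) {
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK); /* make the call asynchronous */
        for (;;) {
            ssize_t r = read(fd, buf, n);
            if (r >= 0 || errno != EAGAIN)
                return r;                       /* result, or a real error */
            thread_yield();                     /* poll again on the next turn */
        }
    }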

Another issue, which we saw above, is what happens when a virtual processor executing a user-level thread is preempted. This may happen in space scheduling when a new application enters the system. Ideally, we would like to schedule the thread on some other virtual processor, but if the kernel does not know about the thread, it cannot do so. Process control and equipartition ensure that a vp cannot be preempted if it is not at a safe point, but we saw this has negative consequences: in process control, the number of vps can exceed the number of processors, while in equipartition, the new application must wait for a processor. The dynamic policy addresses this problem by preempting a currently running vp, ideally one not in a critical section. However, this means the preempter must know whether a process is in a critical section. Moreover, as mentioned above, the thread executed by the vp does not get a chance to execute until its vp is reassigned to the application.

Yet another issue is how a vp is preempted, assigned to a processor, and resumed. In process control, a scheduling server kept track of the scheduling policy, calculating quotas. A vp preempted itself by suspending itself, waiting on a signal. The root vp of an application took the responsibility of resuming the vp by sending the signal. This scheme does not work for the dynamic policy, which requires immediate, forced preemption.
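
The signal-based scheme can be sketched as follows in C, assuming vps are implemented as POSIX threads; the use of SIGUSR1 and the routine names are illustrative choices, not taken from the process-control work.

    #include <pthread.h>
    #include <signal.h>

    static sigset_t vp_sigs;

    void vp_init(void) {                 /* run once in each vp */
        sigemptyset(&vp_sigs);
        sigaddset(&vp_sigs, SIGUSR1);
        pthread_sigmask(SIG_BLOCK, &vp_sigs, NULL); /* reserve it for sigwait */
    }

    void vp_preempt_self(void) {         /* voluntary, only at safe points */
        int sig;
        sigwait(&vp_sigs, &sig);         /* suspends until the root vp signals */
    }

    void vp_resume(pthread_t vp) {       /* executed by the root vp */
        pthread_kill(vp, SIGUSR1);       /* wakes the suspended vp */
    }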

One integrated way to address all of these problems is to provide kernel support for user-level threads. This may seem like a contradiction in terms: how can a thread be user-level if it is supported by the kernel? The idea is that a user-level thread is known to both a user-level manager and the kernel, and its context block is kept in memory shared between the two. Some operations are implemented by the manager and some by the kernel. Moreover, the manager informs the kernel about some of the operations it implements, and, vice versa, the kernel informs the user-level manager about some of the operations it implements. This is a variation of the coordination we discussed in space scheduling: here the coordination happens between the kernel and the user-level manager rather than between vps and the global scheduler. It is also a variation of the idea of a global user-level scheduler communicating with the kernel to bind a vp to a physical processor. As long as kernel operations are executed rarely and communication between the kernel and the user-level manager happens rarely, we can have the benefits of user-level threads without the drawbacks.
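
A context block shared in this fashion might look like the following C sketch; the field names are invented for illustration, and a real system would place such blocks in pages mapped into both the kernel and the manager.

    /* Sketch of a thread context block visible to both the user-level
     * manager and the kernel; all names here are illustrative. */
    struct shared_tcb {
        int   tid;          /* thread id known to both sides              */
        int   state;        /* e.g. READY, RUNNING, BLOCKED_IN_KERNEL     */
        void *stack_base;   /* user-level stack                           */
        void *saved_sp;     /* stack pointer saved on a user-level switch */
        int   blocked_vp;   /* vp on which the thread blocked, or -1      */
    };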

We would, of course, like to have as few operations as possible executed by the kernel. Operations that block on I/O, however, must involve the kernel, as it is the kernel that processes the I/O. If the kernel knows about the user-level thread that executed the call, it can inform the user-level manager that the thread is blocked, allowing the manager to use the processor on which the thread was executing for another thread. This is particularly important when the associated application has only one vp allocated, as in the project. Some coordination mechanism must therefore be used to inform the user-level manager that the vp can be reassigned. Similarly, when the I/O operation completes, some upcall mechanism must be provided to inform the manager that the thread that executed it is now ready to be scheduled; and when a vp is preempted, an upcall must inform the user-level manager so it can schedule the thread bound to the vp on another vp. This idea was used in an operating system called Psyche, developed at the University of Rochester.

So far, we have assumed that the virtual processors of an application are simply kernel threads sharing an address space. In other words, as far as the kernel is concerned, they are ordinary kernel threads. If the kernel knows that these are virtual processors, it can provide many optimizations and features to solve the problems mentioned above. In particular, it can provide a simple and uniform mechanism to notify the user-level manager about kernel events.

To understand the nature of this mechanism, consider the differences between general kernel threads and vps. There is no need to share a processor among different vps as long as space scheduling is used, with one vp assigned to each processor. In other words, since round-robin scheduling is provided at the user-thread level, there is no need for it at the kernel level as long as the kernel is doing space multiplexing. Moreover, the only kernel-level blocking call made by a vp is an I/O call: synchronization, IPC, and suspend/resume operations apply to user-level threads, not vps. Furthermore, the I/O call a vp makes is on behalf of a user-level thread. When the call completes, the vp can be destroyed, as long as the user-level manager can be informed about the readiness of the user-level thread and the thread can be assigned to another vp. Finally, a regular (kernel or user-level) thread is explicitly invoked by a user program to perform some user task. A vp, on the other hand, is simply a vehicle for executing a user-level thread. As a result, it can be created automatically by the kernel rather than by the application program. This works if the kernel is the one determining how many processors are assigned to an application.

These properties have been used in a system designed by Anderson et al., in which vps are called scheduler activations. A scheduler activation is like a kernel thread in that it has both a kernel stack and a user-level stack, used in kernel and user mode, respectively. Moreover, it is associated with a context block and an id, called an activation id. It is called an activation rather than a thread/process because once it blocks, it resumes only to complete the operation for which it blocked. An activation is created by the kernel rather than the application program. At any point, the number of activations is equal to the number of processors in the system. These activations, of course, are shared among the various applications. Like a Xinu thread, an activation invokes a function asynchronously. As the activation is created by the kernel rather than the application, it upcalls callback procedures at well-known locations in user space. (Invocation of a callback procedure in a client by a system component/server is called an upcall in systems research.) These procedures do not directly execute application tasks. Instead, they execute user-level thread-management code.

When an application is created, at least one virtual processor must be assigned to it. Therefore, the kernel creates an activation/virtual processor for it and invokes the callback processorAllocated(processorNumber) in the application. This procedure keeps track of whether it has been invoked before. As this is its first invocation, it binds the main application (user-level) thread to the activation. This thread can subsequently ask the (user-level) thread manager to create additional threads. In response to such requests, the manager creates the threads, puts them in the ready queue, and invokes the addMoreProcessors(numberOfProcessors) downcall in the kernel to request new processors to execute the ready threads.
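
The following C sketch shows what the user side of this step might look like. The upcall and downcall names are from the discussion above; main_thread, bind_to_current_activation(), ready_enqueue(), ready_dequeue(), and ready_count() are hypothetical internals of the thread manager.

    typedef struct thread thread_t;
    extern thread_t *main_thread;
    extern void bind_to_current_activation(thread_t *t); /* run t on this vp */
    extern thread_t *ready_dequeue(void);                /* NULL if empty */
    extern void ready_enqueue(thread_t *t);
    extern int  ready_count(void);
    extern void addMoreProcessors(int numberOfProcessors); /* downcall */

    /* Upcall: the kernel created an activation for this application
     * and runs it at this well-known entry point. */
    void processorAllocated(int processorNumber) {
        static int invoked_before = 0;   /* locking omitted in this sketch */
        if (!invoked_before) {
            invoked_before = 1;
            bind_to_current_activation(main_thread); /* first processor */
        } else {
            thread_t *t = ready_dequeue();
            if (t != NULL)
                bind_to_current_activation(t);  /* later grants run ready work */
        }
    }

    /* Manager routine serving thread-creation requests: queue the new
     * thread and ask the kernel for processors to run the backlog. */
    void manager_thread_created(thread_t *t) {
        ready_enqueue(t);
        addMoreProcessors(ready_count());
    }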

The kernel notes the processor need of the application and, based on its equipartition quota, assigns some number of processors (at most numberOfProcessors) to the application. For each of these processors, it invokes the addThisProcessor(processorNumber) upcall in the application, which can then execute the next thread in the ready queue. The processor number is passed to the upcall so that the thread manager can use it to reschedule a thread executing on that processor (because its time quantum has expired or some higher-priority thread has become ready).
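
On the user side, this upcall might be handled as in the sketch below; again, the manager internals are hypothetical names.

    typedef struct thread thread_t;
    extern thread_t *ready_dequeue(void);                /* NULL if empty */
    extern void bind_to_current_activation(thread_t *t);
    extern void record_processor_of(thread_t *t, int processorNumber);

    /* Upcall: the kernel granted another processor, running a fresh
     * activation at this entry point. */
    void addThisProcessor(int processorNumber) {
        thread_t *t = ready_dequeue();
        if (t != NULL) {
            record_processor_of(t, processorNumber); /* for later rescheduling */
            bind_to_current_activation(t);           /* runs t; does not return */
        }
    }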

So now we have some number of threads executing on processors and some waiting in the ready queue. The thread manager can use a kernel-supported timer to context switch among the threads using a scheduling policy of its choice. When a thread performs synchronization, IPC, or another kind of blocking call supported by the thread manager, the manager can block it and assign the associated activation to some other ready thread. If no such thread exists, it can declare to the kernel that it is willing to yield the associated processor by invoking the thisProcessorIsIdle() downcall in the kernel. The processor number need not be passed to the kernel, as the kernel keeps track of the processor on which an activation was created. The kernel now assigns the processor to some application whose processor need has not been met and that has enough credit (as defined by the dynamic or some other policy), and invokes the processorAllocated(processorNumber) upcall in that application as before.
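
The manager-level blocking path can be sketched as follows; mark_blocked(), switch_to(), and ready_dequeue() are hypothetical manager internals, while thisProcessorIsIdle() is the downcall named above.

    typedef struct thread thread_t;
    extern thread_t *ready_dequeue(void);        /* NULL if queue empty */
    extern void mark_blocked(thread_t *t);       /* e.g. waiting on a lock */
    extern void switch_to(thread_t *next);       /* user-level context switch */
    extern void thisProcessorIsIdle(void);       /* downcall to the kernel */

    /* Called when the running thread blocks on a manager-level
     * operation (lock, condition variable, ipc).  The kernel is
     * involved only if there is no ready thread to run. */
    void block_current_thread(thread_t *cur) {
        mark_blocked(cur);
        thread_t *next = ready_dequeue();
        if (next != NULL)
            switch_to(next);         /* reuse this activation */
        else
            thisProcessorIsIdle();   /* offer the processor back */
    }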

Let us now consider an I/O blocking call made by a thread. Such a call is processed not by the thread manager but by the kernel. When the call is made, the kernel unbinds the calling activation from the processor on which it was executing. To reassign this processor to the application, it creates a new scheduler activation and binds it to the processor. At this point the application needs to know that: (a) the thread has blocked on an I/O call and thus is not eligible to run, and (b) a new activation is available on the processor on which the thread was running. The kernel does not directly know the thread that was blocked, because a thread is a user-level abstraction. On the other hand, it knows the activation on which the thread was running. Let us assume that the thread manager can determine the association between activations and processors by invoking the call getActivationNumber(processorNumber). (The paper by Anderson et al. does not directly mention this call.) The manager knows the association between processor numbers and threads. Therefore, the kernel provides the thread manager with the two pieces of information by making the new activation bound to the processor invoke the following upcall: activationHasBeenBlocked(blockedActivationNumber). The thread manager can now change the state of the blocked thread to blockedOnI/O and schedule some other thread on the activation.
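
A sketch of the manager's handler for this upcall appears below: thread_on_activation() stands for the manager's mapping from activation numbers to threads (built using the assumed getActivationNumber() call), and the other internals are hypothetical.

    typedef struct thread thread_t;
    extern thread_t *thread_on_activation(int activationNumber);
    extern void mark_blocked_on_io(thread_t *t);
    extern thread_t *ready_dequeue(void);
    extern void bind_to_current_activation(thread_t *t);

    /* Upcall: executed by the NEW activation that the kernel bound to
     * the processor after the old activation blocked on I/O. */
    void activationHasBeenBlocked(int blockedActivationNumber) {
        thread_t *t = thread_on_activation(blockedActivationNumber);
        mark_blocked_on_io(t);                  /* not eligible to run */
        thread_t *next = ready_dequeue();
        if (next != NULL)
            bind_to_current_activation(next);   /* keep the processor busy */
    }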

When the I/O operation of the blocked thread completes, the thread may need to perform further steps of the operation in kernel mode. After that, the thread needs to resume execution in user mode, that is, return from the resched operation. The additional steps are executed by the original activation or by a new activation assigned to a new processor (the paper is silent on this issue), and this activation can then execute the upcall activationHasUnblocked(unblockedActivationNumber, registerState) in user code. The thread manager can now store this state in the context block of the blocked thread and put the thread in the ready queue. It can then use the new activation to schedule the next thread. If there are no idle processors, the kernel preempts some processor (of the same or a different application). It informs that application by invoking the upcall processorHasBeenPreempted(preemptedActivationNumber, machineState) as part of the old or a new activation. The thread manager now knows that the processor attached to the thread is no longer available, and puts the thread in the ready queue after storing its register state. This call is also used when the quota of an application is to be reduced because, for instance, a new application has entered the system.
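
These last two upcalls might be handled as sketched below; the register state is passed as a pointer to an opaque, machine-dependent structure here, and all manager internals are hypothetical names.

    typedef struct thread thread_t;
    typedef struct machine_state machine_state_t;  /* saved registers, opaque */
    extern thread_t *thread_on_activation(int activationNumber);
    extern void save_state(thread_t *t, const machine_state_t *s);
    extern void ready_enqueue(thread_t *t);
    extern thread_t *ready_dequeue(void);
    extern void bind_to_current_activation(thread_t *t);

    /* Upcall: the blocked thread's I/O has completed; its user-level
     * register state is handed back so the manager can requeue it. */
    void activationHasUnblocked(int unblockedActivationNumber,
                                const machine_state_t *state) {
        thread_t *t = thread_on_activation(unblockedActivationNumber);
        save_state(t, state);
        ready_enqueue(t);                       /* runnable again */
        thread_t *next = ready_dequeue();
        if (next != NULL)
            bind_to_current_activation(next);   /* schedule the next thread */
    }

    /* Upcall: the kernel took away a processor; the thread running
     * there is requeued with its saved state. */
    void processorHasBeenPreempted(int preemptedActivationNumber,
                                   const machine_state_t *state) {
        thread_t *t = thread_on_activation(preemptedActivationNumber);
        save_state(t, state);
        ready_enqueue(t);
    }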


Prasun Dewan 2007-04-19