Scheduling and Locking in
Multiprocessor Real-Time Operating Systems

Björn B. Brandenburg

Dissertation Companion Page

This page collects information relevant to my dissertation, which I prepared under the supervision of Jim Anderson and successfully defended in August 2011. In particular, you can find the source code for the tools used in the experiments, and additional data and graphs not shown in the dissertation.

Document

Scheduling and Locking in Multiprocessor Real-Time Operating Systems (PDF, 5.6 MiB)

@phdthesis{brandenburg11,
    Author = {Bj\"{o}rn B. Brandenburg},
    School = {The University of North Carolina at Chapel Hill},
    Title =  {Scheduling and Locking in
              Multiprocessor Real-Time Operating Systems},
    Year =   {2011}
}

For a quick overview, you can also download my defense slides (PDF, 17.8 MiB).

Abstract

With the widespread adoption of multicore architectures, multiprocessors are now a standard deployment platform for (soft) real-time applications. This dissertation addresses two questions fundamental to the design of multicore-ready real-time operating systems: (1) which scheduling policies offer the greatest flexibility in satisfying temporal constraints, and (2) which locking algorithms should be used to avoid unpredictable delays?

With regard to Question 1, LITMUS^RT, a real-time extension of the Linux kernel, is presented and its design is discussed in detail. Notably, LITMUS^RT implements link-based scheduling, a novel approach to controlling blocking due to non-preemptive sections. Each implemented scheduler (22 configurations in total) is evaluated, with overheads taken into account, on a 24-core Intel Xeon platform. The experiments show that partitioned earliest-deadline-first (EDF) scheduling is generally preferable in a hard real-time setting, whereas global and clustered EDF scheduling are effective in a soft real-time setting.

With regard to Question 2, real-time locking protocols are required to ensure that the maximum delay due to priority inversion can be bounded a priori. Several spinlock- and semaphore-based multiprocessor real-time locking protocols for mutual exclusion (mutex), reader-writer (RW) exclusion, and k-exclusion are proposed and analyzed. A new category of RW locks suited to worst-case analysis, termed phase-fair locks, is proposed and three efficient phase-fair spinlock implementations are provided (one with few atomic operations, one with low space requirements, and one with constant remote-memory-reference (RMR) complexity).
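
As an aside (not part of the dissertation abstract), the flavor of these locks is easy to convey in code. The following is a minimal sketch of the ticket-based phase-fair RW lock, written from memory using C11 atomics; the dissertation contains the authoritative algorithm, its correctness argument, and the byte-write optimization that avoids the final atomic AND.

#include <stdatomic.h>

/* Sketch of a ticket-based phase-fair RW lock (from memory, C11 atomics).
 * Readers count in the upper bits of rin/rout; the low byte of rin holds
 * the writer-present (PRES) and writer-phase (PHID) bits. */
#define RINC  0x100u   /* reader increment, skips the low byte */
#define WBITS 0x3u     /* writer bits: PRES | PHID */
#define PRES  0x2u     /* a writer holds or awaits the lock */
#define PHID  0x1u     /* parity of the current writer's ticket */

typedef struct {
    atomic_uint rin;   /* reader entries (upper bits) + writer bits */
    atomic_uint rout;  /* reader exits */
    atomic_uint win;   /* writer tickets issued */
    atomic_uint wout;  /* writer tickets retired */
} pft_lock_t;

static void pft_read_lock(pft_lock_t *l)
{
    /* announce presence and snapshot the writer bits */
    unsigned int w = atomic_fetch_add(&l->rin, RINC) & WBITS;
    /* spin only while the same writer phase remains in progress */
    while (w != 0 && (atomic_load(&l->rin) & WBITS) == w)
        ; /* spin */
}

static void pft_read_unlock(pft_lock_t *l)
{
    atomic_fetch_add(&l->rout, RINC);
}

static void pft_write_lock(pft_lock_t *l)
{
    /* first: mutual exclusion among writers via a ticket lock */
    unsigned int ticket = atomic_fetch_add(&l->win, 1);
    while (atomic_load(&l->wout) != ticket)
        ; /* spin */
    /* then: block newly arriving readers and drain in-flight ones */
    unsigned int readers =
        atomic_fetch_add(&l->rin, PRES | (ticket & PHID)) & ~WBITS;
    while (atomic_load(&l->rout) != readers)
        ; /* spin */
}

static void pft_write_unlock(pft_lock_t *l)
{
    /* admit waiting readers first, then the next writer */
    atomic_fetch_and(&l->rin, ~WBITS);
    atomic_fetch_add(&l->wout, 1);
}

The point of phase-fairness is that reader and writer phases alternate under contention, so a reader spins for at most one writer phase regardless of the number of writers, which is what enables tight worst-case blocking bounds.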

Maximum priority-inversion blocking is proposed as a natural complexity measure for semaphore protocols. It is shown that there are two classes of schedulability analysis, namely suspension-oblivious and suspension-aware analysis, that yield two different lower bounds on blocking. Five asymptotically optimal locking protocols are designed and analyzed: a family of mutex, RW, and k-exclusion protocols for global, partitioned, and clustered scheduling that are asymptotically optimal in the suspension-oblivious case, and a mutex protocol for partitioned scheduling that is asymptotically optimal in the suspension-aware case. A LITMUS^RT-based empirical evaluation is presented that shows these protocols to be practical.
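
Also as an aside, the asymptotic gap between the two analysis setups can be summarized as follows, where m denotes the number of processors, n the number of tasks, and b_i the maximum priority-inversion blocking incurred by any job of task T_i (a hedged paraphrase from memory, not a verbatim statement of the results):

% suspension-oblivious analysis: blocking scales with the processor count
\max_i b_i = \Omega(m), \quad \text{matched by the proposed } O(m) \text{ protocols}
% suspension-aware analysis: blocking scales with the task count
\max_i b_i = \Omega(n), \quad \text{matched by the proposed } O(n) \text{ mutex protocol}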

Source Code

The following projects and tools were developed to support the experiments presented in Chapters 4 and 7. While documentation is sparse at best, the code is reasonably clean and generally well-structured. Each project should be compiled and run under Linux (or, more specifically, LITMUS^RT).

While I try to respond to email as best I can, please note that I cannot offer support for any of the software. This is research code; a certain level of skill and inquisitiveness on the part of the user is expected. Nonetheless, if you feel that something is missing or if you have specific questions, please feel free to contact me.

The LITMUS^RT Kernel

LITMUS^RT is a version of the Linux kernel that has been extended to support various multiprocessor real-time scheduling and locking policies. The main LITMUS^RT project page can be found at www.litmus-rt.org. Two versions of LITMUS^RT were used in the dissertation experiments, one with priority donation and one without.

Two separate versions were used because the prototype implementation of priority donation is somewhat intrusive, as Linux's design is not well suited to supporting priority donation. (The Linux scheduler is reactive by design; that is, it is not possible to suspend or resume arbitrary tasks while holding scheduler locks. As a workaround, the prototype implementation of priority donation re-implements suspension support within the C-EDF plugin.)

Both versions of the kernel are based on LITMUS^RT 2011.1. The regular LITMUS^RT kernel is made available as a patch against the underlying Linux version 2.6.36; a second patch against the regular LITMUS^RT kernel adds priority donation.

User Space Interface (liblitmus)

The user space API is encapsulated in liblitmus, a static library. The liblitmus project also contains various helper utilities that are required for switching schedulers, triggering synchronous task system releases, etc. The version of liblitmus used in the experiments is based on liblitmus 2011.1 and is made available here as a tarball. See www.litmus-rt.org for further information.
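
For illustration, a minimal periodic task using liblitmus looks roughly as follows. This is written from memory against the 2011-era API, so field and function names may differ slightly from the tarball; the shipped headers and example programs are authoritative.

#include <string.h>
#include <litmus.h>  /* liblitmus: struct rt_task, task_mode(), ... */

int main(void)
{
    struct rt_task param;
    int i;

    init_litmus();  /* initialize the library's real-time facilities */

    /* Task parameters in nanoseconds: a 10 ms budget every 100 ms.
     * (Under a partitioned plugin, a CPU assignment is also needed.) */
    memset(&param, 0, sizeof(param));
    param.exec_cost = 10000000ULL;   /* 10 ms  */
    param.period    = 100000000ULL;  /* 100 ms */
    set_rt_task_param(gettid(), &param);

    task_mode(LITMUS_RT_TASK);  /* become a real-time task */
    wait_for_ts_release();      /* await the synchronous release */

    for (i = 0; i < 1000; i++) {
        /* ... one job's worth of work ... */
        sleep_next_period();    /* complete the job, await the next one */
    }

    task_mode(BACKGROUND_TASK); /* return to best-effort execution */
    return 0;
}

The active scheduler plugin is selected beforehand and the synchronous release is triggered with the helper utilities mentioned above (setsched and release_ts, respectively, in the versions I recall).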

Feather-Trace Tools

Feather-Trace is the tracing toolkit built into LITMUS^RT that was used to collect overhead samples (as described in Chapter 4). The ft_tools project contains user space tools to record samples, to post-process traces (to filter and sort by event ID), and to convert traces into CSV text files or 32-bit float binary files. The version used in the experiments is based on ft_tools 2011.1 and is made available here as a tarball. See www.litmus-rt.org for further information.
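
As a trivial example of consuming the latter format, the following standalone sketch (my own, not part of ft_tools) reports the count, maximum, and average of a file of native-endian 32-bit float samples:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    FILE *f;
    float sample, max = 0.0f;
    double sum = 0.0;
    unsigned long count = 0;

    if (argc != 2 || !(f = fopen(argv[1], "rb"))) {
        fprintf(stderr, "usage: %s <samples.float32>\n", argv[0]);
        return EXIT_FAILURE;
    }
    /* one 32-bit float per overhead sample */
    while (fread(&sample, sizeof(sample), 1, f) == 1) {
        if (sample > max)
            max = sample;
        sum += sample;
        count++;
    }
    fclose(f);
    printf("%lu samples, max %.2f, avg %.2f\n",
           count, max, count ? sum / count : 0.0);
    return EXIT_SUCCESS;
}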

Overhead Tracing Workload

Overheads were collected using a tracing framework that automates task setup, tracing setup, task set launching, and overhead recording. This archive also contains the task sets that were executed during overhead tracing.

Overhead Analysis Scripts

Recorded overhead traces were post-processed (shuffled and truncated to obtain a consistent number of samples) and distilled into models, as discussed in Chapter 4. The scripts used to do so are collected in the following tarball.

Spinlock Efficiency Experiments

As discussed in Chapter 7, several spinlock implementations were compared using a micro-benchmark to determine whether queue or ticket locks are preferable on the experimental platform. The relevant code is available in the following archive.
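
For reference, the ticket lock baseline is the textbook algorithm and fits in a few lines of C11 (a generic sketch, not necessarily the exact benchmarked code):

#include <stdatomic.h>

/* Textbook ticket spinlock: FIFO ordering bounds the worst-case
 * acquisition delay by the number of earlier-arriving contenders. */
typedef struct {
    atomic_uint next_ticket;  /* tickets issued */
    atomic_uint now_serving;  /* ticket currently served */
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *l)
{
    unsigned int ticket = atomic_fetch_add(&l->next_ticket, 1);
    while (atomic_load(&l->now_serving) != ticket)
        ; /* spin */
}

static void ticket_unlock(ticket_lock_t *l)
{
    atomic_fetch_add(&l->now_serving, 1);
}

Queue locks (in the style of MCS) instead let each contender spin on its own node, trading additional space and bookkeeping for reduced contention on the lock word; the micro-benchmark quantifies this trade-off on the experimental platform.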

Schedulability Experiments

The schedulability experiments were carried out on UNC's now-retired TOPSAIL cluster. For reference, the source code and overhead models that were used are provided in the tarball below.

Unless noted otherwise, the source code made available on this page should be considered licensed under the GPL. This applies in particular to the kernel patches (which are obviously derived from Linux) and to the user space library liblitmus, which contains code derived from the kernel.

Data

Large amounts of data were generated over the course of the scheduler- and locking-related experiments. The raw overhead samples alone consume half a terabyte of storage. Since the raw data is not very informative by itself, it is not made available for download. Instead, processed overhead statistics, the overhead model used during schedulability experiments, and the schedulability data corresponding to the graphs are available for download. Please contact me if you seek access to the raw data itself.

Overhead Data

The processed overhead data contains statistics for each evaluated task set size and each measured overhead source, including maximum, minimum, median, and average cost, as well as the number of samples used, the standard deviation, and the variance of each experiment. The file format is documented in the accompanying README file.

Overhead Model

The overhead model is a compressed form of the recorded overheads that is used during the schedulability experiments to approximate worst- and average-case overheads for a given task set. See Chapter 4 for an in-depth discussion of the overhead model.
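
Conceptually, the model can be queried as a function of the task count. The following lookup is purely illustrative (my own sketch, not the actual model code or file format), approximating overheads by linear interpolation between measured data points:

/* Illustrative only: interpolate a cost for task count n from
 * measured (task count, cost) data points sorted by task count. */
struct overhead_point {
    unsigned int tasks;  /* task set size */
    double cost;         /* measured cost (e.g., in microseconds) */
};

static double overhead_for(const struct overhead_point *pts, int npts,
                           unsigned int n)
{
    int i;
    if (n <= pts[0].tasks)
        return pts[0].cost;
    for (i = 1; i < npts; i++) {
        if (n <= pts[i].tasks) {
            double span = pts[i].tasks - pts[i - 1].tasks;
            double frac = (n - pts[i - 1].tasks) / span;
            return pts[i - 1].cost + frac * (pts[i].cost - pts[i - 1].cost);
        }
    }
    return pts[npts - 1].cost;  /* clamp beyond the last data point */
}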

Schedulability Data

During schedulability experiments, both regular schedulability and weighted schedulability scores were collected. For both metrics, values for each scheduler (or locking protocol) range from zero to one, where a value of zero implies that none of the generated task sets were schedulable, and a value of one implies that all task sets were schedulable. Please see Chapter 4 for a precise definition of schedulability and weighted schedulability. The data is stored in self-documenting CSV files.
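
For orientation, weighted schedulability condenses an entire schedulability curve into a single score by weighting each sampled utilization cap U by its value, roughly as follows, where S(U) denotes the fraction of generated task sets with utilization U that were schedulable (a sketch of the usual definition; Chapter 4 gives the exact one used):

% weighted schedulability over the set \mathcal{U} of sampled utilization caps
W = \frac{\sum_{U \in \mathcal{U}} U \cdot S(U)}{\sum_{U \in \mathcal{U}} U}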

Graphs

Due to space constraints, only a few select graphs could be included in the dissertation itself. For the interested reader, all graphs are provided for download.

Scheduler Comparison

As part of the overhead-aware schedulability study presented in Chapter 4, a total of 65,556 graphs were generated. Each graph compares a subset of the 22 evaluated scheduler configurations in terms of schedulability or weighted schedulability score.

Locking Protocol Comparison

For the overhead-aware evaluation of real-time locking protocols presented in Chapter 7, a total of 179,820 schedulability graphs were generated.

Scheduling Overheads

The overhead samples collected under each of the evaluated scheduler configurations were visualized as bar charts, and the maximum and average measured overheads were plotted as a function of task count, yielding 2,781 graphs.