Overload Tolerance in Safety-Critical Systems
The Challenge. A primary objective
in scheduling safety-critical real-time systems is that all deadlines
be met. To achieve this goal, system architects typically attempt to
anticipate every eventuality and design the system to handle each of
them. Such a system would, under ideal circumstances, never miss a
deadline and would behave as its designers expect.
In reality, however, unanticipated emergency conditions may occur
wherein the processing required to handle the emergency exceeds the
system capacity, thereby resulting in missed deadlines. The system is
then said to be in overload. If this
happens, it is important that the performance of the system degrade
gracefully (if at all). A system that panics and suffers a drastic
fall in performance in an emergency is likely to contribute to the
emergency, rather than help solve it. This research project is
investigating approaches for dealing with the performance degradation
that results from transient overloads in time-critical applications.
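To make the notion of overload concrete, the following minimal sketch assumes a simple model that this page does not itself specify: periodic tasks on a single preemptive processor, each described by a worst-case execution time and a period. Under that assumption, the processing demand exceeds the system capacity whenever the total utilization exceeds 1. The names Task, utilization, and is_overloaded are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Task:
    # Assumed model (not from the original page): a periodic task.
    wcet: int    # worst-case execution time of each job
    period: int  # time between successive job releases


def utilization(tasks):
    """Fraction of one processor demanded by the task set (exact arithmetic)."""
    return sum((Fraction(t.wcet, t.period) for t in tasks), Fraction(0))


def is_overloaded(tasks):
    """True when the demand exceeds the capacity of a single processor."""
    return utilization(tasks) > 1


# Hypothetical workloads: normal operation, then an added emergency handler.
normal = [Task(wcet=1, period=4), Task(wcet=2, period=8)]
emergency = normal + [Task(wcet=3, period=5)]

print(utilization(normal), is_overloaded(normal))
print(utilization(emergency), is_overloaded(emergency))
```

Exact fractions are used instead of floating point so that a task set sitting exactly at full utilization is not misclassified by rounding.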
The Approach. Our approach towards
understanding the behavior of resource allocation algorithms under
overload conditions has focused upon addressing the following issues:
- Proposing appropriate performance metrics
for characterizing systems under overload conditions.
- Identifying the inherent limitations
faced by any scheduling algorithm in the presence of overload
conditions.
- Designing overload-tolerant scheduling
algorithms that perform optimally (with respect to the inherent
limitations identified).
- Studying the applicability of various alternative computational paradigms (e.g.,
randomized, parallel) to facilitate the design of overload-tolerant
systems.
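As an illustration of the first two items, the sketch below uses one possible metric, which is an assumption here and not necessarily the metric adopted by this project: the total value of the jobs that complete by their deadlines. It compares what a plain earliest-deadline-first (EDF) scheduler earns on a small overloaded job set with the best value a clairvoyant scheduler could earn, found by brute force. The Job fields, the function names, and the example jobs are hypothetical; time is assumed to advance in unit slots and job values are assumed positive.

```python
from itertools import combinations
from typing import NamedTuple


class Job(NamedTuple):
    release: int   # earliest slot in which the job may run
    work: int      # number of unit slots the job needs
    deadline: int  # absolute deadline
    value: int     # credit earned only if the job finishes on time (> 0)


def edf_value(jobs):
    """Simulate preemptive EDF on one processor; return value of on-time jobs."""
    remaining = {i: j.work for i, j in enumerate(jobs)}
    horizon = max((j.deadline for j in jobs), default=0)
    earned = 0
    for t in range(horizon):
        # Released, unfinished jobs whose deadline has not yet passed.
        ready = [i for i, j in enumerate(jobs)
                 if j.release <= t < j.deadline and remaining[i] > 0]
        if not ready:
            continue
        i = min(ready, key=lambda k: jobs[k].deadline)
        remaining[i] -= 1  # run the chosen job during slot [t, t + 1)
        if remaining[i] == 0:
            earned += jobs[i].value
    return earned


def clairvoyant_value(jobs):
    """Best total value over every subset that can be scheduled with no misses."""
    best = 0
    for r in range(1, len(jobs) + 1):
        for subset in combinations(jobs, r):
            total = sum(j.value for j in subset)
            # With positive values, EDF earns the full total only when
            # every job in the subset meets its deadline.
            if edf_value(subset) == total:
                best = max(best, total)
    return best


# Hypothetical overloaded instance: both jobs cannot be finished on time.
jobs = [
    Job(release=0, work=4, deadline=4, value=4),
    Job(release=1, work=2, deadline=3, value=2),
]

print(edf_value(jobs))          # prints 2
print(clairvoyant_value(jobs))  # prints 4
```

On this instance EDF abandons the long job in favor of the shorter, more urgent one and earns half of what was achievable. The brute-force search is exponential in the number of jobs and is meant only to make the comparison concrete on small inputs.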
Last updated on 2001/11/28 by SkB.