Cong Liu

Ph.D. Candidate
Department of Computer Science
The University of North Carolina at Chapel Hill

Sitterson Hall 147, UNC-CH,
Chapel Hill, NC 27514
                         Office phone: (919) 962-1924




Overview: Multicore Real-Time Embedded Cyber-Physical Systems

Embedded systems are computer systems designed to perform specific control functions within larger systems, often under real-time constraints (e.g., deadline requirements). Real-time embedded systems are pervasive, often without being noticed. Such systems range from portable devices such as smartphones, to large stationary installations like traffic lights, to systems controlling nuclear power plants. The demand for embedded systems is expected to expand rapidly in the near future. According to the International Data Corporation, by 2015, more than 4 billion embedded devices requiring 14.5 billion microprocessor cores will be shipped, creating $2 trillion in revenue. Much of this growth is being propelled by the development of more sophisticated cyber-physical systems (CPS) such as autonomous vehicles and smart power grids, which integrate and coordinate multiple networked computational and physical elements. Since many CPS require significant computational power, uniprocessor chips are no longer adequate. Instead, with the recent advent of multicore technologies, multicore-based designs are being advocated for CPS; such platforms offer greater computational capacity with less energy consumption and cost.

Overview

The primary goal of my research is to establish principles and mechanisms for efficiently building multicore real-time embedded CPS. Many application-specific CPS may far exceed today's systems in complexity and scale. They are expected to deliver higher performance in a more intelligent way while consuming less energy with a minimum number of hardware components. For example, an unmanned aerial vehicle (UAV) is expected to autonomously and accurately compute flight instructions in real time to accomplish mission-critical military tasks, yet at the same time may face stringent space and energy constraints. To accelerate the realization of future CPS across a wide range of application domains, my research focuses on efficiently supporting several common characteristics that arise in the design and implementation of many CPS, including dependency, energy efficiency, and scalability. Below I briefly summarize the contribution and impact of my research in each aspect.


Dependency

Many CPS have multiple dependent networked components, each implementing a specific functionality. Moreover, CPS are expected to actively interact with the environment, human beings, and other external devices. These properties often result in complex inter- and/or intra-task dependencies. For example, consider two common tasks in driving an autonomous vehicle: environmental sensing and navigation control. These two tasks are dependent: the navigation control task can execute only after receiving needed data from the environmental sensing task. Unfortunately, existing research on multicore real-time embedded systems has been mostly limited to systems supporting simple tasks for which dependencies do not exist; this limits the practical applicability of such research. In practice, dependencies are currently dealt with by over-provisioning systems, which is an economically wasteful practice. With the rising complexities of networked real-time embedded systems, over-provisioning has led to an increasingly unmanageable proliferation of such systems, to the effect that some modern cars contain over one hundred processors.
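The sensing/navigation example above is an instance of a precedence constraint: a consumer task becomes eligible to run only once its producer has finished. A minimal sketch of how such a constraint restricts execution order is below (the task names and the simple topological-ordering routine are illustrative only, not the scheduling machinery from the publications that follow):

```python
# Minimal sketch of a producer/consumer precedence constraint, as in the
# sensing -> navigation example. Task names are illustrative only.
from collections import deque

class Task:
    def __init__(self, name, predecessors=()):
        self.name = name
        self.predecessors = set(predecessors)  # tasks whose output we need
        self.done = False

def topological_run_order(tasks):
    """Return an execution order that respects all precedence edges."""
    order = []
    ready = deque(t for t in tasks if not t.predecessors)
    remaining = {t.name: set(t.predecessors) for t in tasks}
    while ready:
        t = ready.popleft()
        order.append(t.name)
        for other in tasks:
            if t.name in remaining[other.name]:
                remaining[other.name].discard(t.name)
                if not remaining[other.name]:
                    ready.append(other)
    if len(order) != len(tasks):
        raise ValueError("cycle in task graph")
    return order

sensing = Task("environmental_sensing")
navigation = Task("navigation_control",
                  predecessors={"environmental_sensing"})
print(topological_run_order([sensing, navigation]))
# sensing is ordered before navigation, honoring the data dependency
```

A real-time scheduler must additionally guarantee deadlines under such ordering constraints, which is precisely what makes the multiprocessor case hard.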

My research is thus aimed at bridging this gap between practice and theory in the design of CPS. Specifically, the goal is to enable sophisticated multicore-based CPS that contain common types of dependencies to be efficiently designed and built. By deriving predictable multicore real-time embedded system design, analysis, and implementation methods, my research seeks to avoid over-provisioning systems and to reduce the number of needed hardware components to the extent possible while providing timing correctness guarantees. With each removed component, the overall system complexity as well as cost and error rate can be reduced quadratically. Moreover, size, weight, and power (SWaP) can be dramatically decreased, since every component requires wiring and cooling, adds weight, occupies space, and drains power. Motivated by these observations, my research specifically addresses the problem of efficiently supporting several common types of dependencies, reflected in the publications below: self-suspensions, pipelines, graph-based precedence constraints, and intra-task parallelism.


For many years, the general problem of supporting real-time embedded systems containing common types of dependencies has been known to be notoriously hard and remained largely unsolved. The impact of my work is that it provides the first set of practically efficient solutions that fundamentally address this problem. For example, I closed the problem of supporting graph-based dependencies in multicore real-time embedded systems (which stood open for 12 years) by proposing an optimal solution [RTSS10] that does not require the system to be over-provisioned to any extent. (This work has also resulted in a research proposal and secured funding from the U.S. Air Force.) All my proposed solutions are both theoretically tractable and practically efficient, as demonstrated by rigorous analysis and extensive experiments with real implementations.

Related Publications:

[RTSS12] An O(m) Analysis Technique for Supporting Real-Time Self-Suspending Task Systems [pdf]
Cong Liu and James H. Anderson, Proceedings of the 33rd IEEE Real-Time Systems Symposium, to appear, 2012.

[RTCSA12 Best Papers] Supporting Soft Real-Time Parallel Applications on Multicore Processors [pdf]
Cong Liu and James H. Anderson, Proceedings of the 18th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2012.

[RTSS10] Supporting Soft Real-Time DAG-based Systems on Multiprocessors with No Utilization Loss [pdf]
Cong Liu and James H. Anderson, Proceedings of the 31st IEEE Real-Time Systems Symposium, pp. 3-13, 2010.

[RTAS10] Scheduling Suspendable, Pipelined Tasks with Non-Preemptive Sections in Soft Real-Time Multiprocessor Systems [pdf]
Cong Liu and James H. Anderson, Proceedings of the 16th IEEE Real-Time and Embedded Technology and Applications Symposium, pp. 23-32, 2010.

[RTSS09 Best Student Paper Award] Task Scheduling with Self-Suspensions in Soft Real-Time Multiprocessor Systems [pdf]
Cong Liu and James H. Anderson, Proceedings of the 30th IEEE Real-Time Systems Symposium, pp. 425-436, 2009.

[ECRTS09] Supporting Pipelines in Soft Real-Time Multiprocessor Systems [pdf]
Cong Liu and James H. Anderson, Proceedings of the 21st Euromicro Conference on Real-Time Systems, pp. 269-278, 2009.

Multiprocessor Schedulability Analysis for Self-Suspending Task Systems
Cong Liu and James H. Anderson, in submission, 2012.


Energy Efficiency

Overview

Reducing power and energy consumption has become a pressing issue across a wide range of computing systems, from isolated mobile embedded devices to large-scale CPS. I designed and implemented a set of energy-efficient computation and data mapping algorithms on different system platforms, including CPU/GPU heterogeneous multiprocessors [PACT12], distributed systems [IPCCC08], and clusters [ICCCN08]. These algorithms effectively reduce power and energy consumption, often by a significant margin, without sacrificing performance (e.g., response time). For example, my power-aware mapping techniques [PACT12] for CPU/GPU heterogeneous systems meet applications' timing requirements while reducing power and energy consumption by applying DVFS on both CPUs and GPUs. I implemented the proposed techniques in a real CPU/GPU heterogeneous system. Experimental results with several data analytics workloads show that, compared to performance-driven mapping, the power-efficient mapping techniques can often achieve a reduction of more than 20% in power and energy consumption.

I also studied how to improve energy efficiency in distributed systems. One interesting finding is that although data replication may improve the performance of data-intensive applications on data clusters, a large number of data replicas inevitably increases energy dissipation in storage resources. To implement a data cluster with high energy efficiency, I propose reducing the amount of data replication and transfer to the maximum extent possible without degrading the performance of data-intensive workloads. Experimental results based on a real-world trace demonstrate that, with respect to energy consumption, my proposed approach conserves over 35% more energy than existing approaches.
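The core intuition behind deadline-aware DVFS can be sketched as follows: pick the lowest available frequency that still finishes the work before its deadline, since dynamic power grows roughly with the cube of frequency while execution time shrinks only linearly. The frequency steps and workload numbers below are illustrative assumptions, not values from [PACT12]:

```python
# Sketch of deadline-aware DVFS: choose the lowest frequency that still
# meets the deadline, under the common approximation that dynamic power
# scales roughly with f^3 and execution time with 1/f. All numbers are
# illustrative, not from the PACT12 paper.

def pick_frequency(cycles, deadline, freqs):
    """Return the lowest frequency (Hz) that meets `deadline` seconds,
    or None if the job is infeasible even at the highest frequency."""
    for f in sorted(freqs):
        if cycles / f <= deadline:
            return f
    return None

freqs = [0.8e9, 1.2e9, 1.6e9, 2.0e9]     # hypothetical DVFS steps
f = pick_frequency(cycles=3.0e9, deadline=2.0, freqs=freqs)
assert f == 1.6e9                        # 3e9 / 1.6e9 = 1.875 s <= 2 s
relative_power = (f / max(freqs)) ** 3   # ~0.51x the power at top speed
```

The same slack-driven reasoning extends to GPUs, where mapping decisions additionally determine on which device each function's cycles are spent.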

Related Publications:

[PACT12] Power-Efficient Time-Sensitive Mapping in CPU/GPU Heterogeneous Systems [pdf]
Cong Liu, Jian Li, Wei Huang, Juan Rubio, Evan Speight, and Xiaozhu Lin, Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 23-32, 2012.

[IPCCC08] Distributed Energy Efficient Scheduling for Real-Time Data-intensive Applications [pdf]
Cong Liu, Xiao Qin, Santosh Kulkarni, Chengjun Wang, Adam Manzanares, Shuang Li, and Sanjeev Baskiyar, Proceedings of the 27th IEEE International Performance Computing and Communications Conference, pp. 26-33, 2008.

[ICCCN08] PASS: Power-Aware Scheduling of Mixed Applications with Deadline Constraints on Clusters [pdf]
Cong Liu, Xiao Qin, and Shuang Li, Proceedings of the 17th IEEE International Conference on Computer Communications and Networks, 2008.


Scalability

Overview

Many application-specific CPS, such as the smart power grid, are expected to contain a large number of computing and physical elements. Software methods deployed in such systems must be scalable. Motivated by this, I proposed a set of distributed workload scheduling and dispatching algorithms [JPDC09, ICPADS08, HiPC07] that can handle workloads with mixed types of timing requirements in a heterogeneous grid. To the best of my knowledge, these approaches are the first to prioritize workloads according to their types while considering both deadlines and dispatch times. Workloads are allocated and dispatched to appropriate computational and link (bandwidth) resources. Furthermore, the algorithms are highly scalable because, unlike many existing algorithms, they do not require full knowledge of the state of all nodes in the grid: thanks to a peer-to-peer dispatcher, knowledge of peer site capacities is sufficient. Obtaining full knowledge of the state of the grid is difficult and can be prohibitively time-consuming. Extensive simulation results demonstrate that the proposed algorithms often outperform existing approaches, sometimes by a significant margin.
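The dispatch decision described above can be sketched with only locally known information: a site picks, among its peers, one whose capacity lets the job finish before its deadline once transfer time is accounted for. The peer names, capacity units, and tie-breaking rule below are illustrative assumptions, not the actual JPDC09/HiPC07 algorithms:

```python
# Sketch of peer-to-peer deadline-aware dispatch: decide using only the
# known capacities of peer sites, with no global grid state. Names and
# units are hypothetical.

def dispatch(job_load, deadline, transfer_time, peers):
    """peers: {site_name: capacity in load-units/sec}. Return the peer
    with the earliest feasible finish time, or None if no peer can
    meet the deadline."""
    feasible = []
    for name, capacity in peers.items():
        finish = transfer_time + job_load / capacity
        if finish <= deadline:
            feasible.append((finish, name))
    return min(feasible)[1] if feasible else None

peers = {"siteA": 4.0, "siteB": 10.0}    # capacities known via peer exchange
best = dispatch(job_load=20.0, deadline=5.0, transfer_time=1.0, peers=peers)
# siteB finishes at 1 + 20/10 = 3 <= 5; siteA at 1 + 20/4 = 6 > 5
assert best == "siteB"
```

Because each site only exchanges capacity summaries with its peers, the per-decision cost stays constant as the grid grows, which is the scalability property argued for above.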

Related Publications:

[JPDC09] A General Distributed Scalable Grid Scheduler for Independent Tasks [pdf]
Cong Liu and Sanjeev Baskiyar, Journal of Parallel and Distributed Computing, vol. 69, no. 3, pp. 307-314, 2009.

[ICPADS08] Scheduling Mixed Tasks with Deadlines in Grids using Bin Packing [pdf]
Cong Liu and Sanjeev Baskiyar, Proceedings of the 14th IEEE International Conference on Parallel and Distributed Systems, pp. 229-236, 2008.

[HiPC07] A General Distributed Scalable Peer to Peer Scheduler for Mixed Tasks in Grids [pdf]
Cong Liu, Sanjeev Baskiyar, and Shuang Li, Proceedings of the 14th IEEE International Conference on High Performance Computing, pp. 320-330, 2007.


Ongoing Research


Scalable Communication and Scheduling for Many-Core Systems

A recent multicore-related development is the emergence of many-core architectures, in which hundreds of processors are placed on the same chip. Many-core architectures differ from traditional multicore architectures in that they rely on message passing rather than shared memory for inter-core communication, to ensure better scalability. This different inter-core communication method brings new challenges to supporting real-time graph-based applications on many-core platforms, because delays due to inter-core communication may depend on the locality of the processors to which each part of the application graph is allocated. Thus, instead of scheduling only computations, we must consider data locality and judiciously schedule computations and communications at the same time. When application graphs are assigned to processing cores, the communication requirements as well as the means of communication should be considered to avoid excessive communication overheads. For example, several emerging many-core processors include a router mesh that facilitates communication between cores via message passing. Tasks can be assigned to processors so as to reduce contention on the router mesh, and therefore the delays caused by other tasks' communication. I am currently investigating this problem by designing an efficient horizontal dissemination-based mapping algorithm that allocates real-time application graphs to processors so as to reduce communication delays on the underlying router mesh of the Intel Single-Chip Cloud Computer (SCC). Preliminary experimental results demonstrate the effectiveness of the proposed algorithm.
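The locality concern can be made concrete with a toy greedy heuristic: place each task next to its heaviest communication partner on the 2-D mesh, so that high-volume edges traverse few router hops. The task names, edge weights, and the greedy rule itself are illustrative assumptions, not the dissemination-based algorithm described above:

```python
# Toy sketch of locality-aware task-to-core mapping on a 2-D router mesh
# (in the spirit of the SCC's grid): greedily place each task near its
# heaviest communication partner to cut hop counts. Illustrative only.
from itertools import product

def hops(a, b):
    """Manhattan distance between two (x, y) cores on the mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_map(edges, mesh_w, mesh_h):
    """edges: {(task1, task2): message_volume}. Return task -> (x, y)."""
    placement = {}
    free = set(product(range(mesh_w), range(mesh_h)))
    # consider heaviest-communicating pairs first
    for (a, b), _vol in sorted(edges.items(), key=lambda e: -e[1]):
        for t in (a, b):
            if t not in placement:
                anchor = placement.get(b if t == a else a)
                core = (min(free, key=lambda c: hops(c, anchor))
                        if anchor else min(free))
                placement[t] = core
                free.discard(core)
    return placement

edges = {("sense", "fuse"): 10, ("fuse", "act"): 8, ("sense", "log"): 1}
pm = greedy_map(edges, mesh_w=4, mesh_h=4)
weighted_hops = sum(v * hops(pm[a], pm[b]) for (a, b), v in edges.items())
```

Even this crude heuristic keeps the heaviest edges at one hop; the real problem is harder because it must also bound contention on shared mesh links while preserving deadlines.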

Besides reducing communication overheads, another research challenge in supporting real-time applications on many-core platforms is controlling temperature while providing timing correctness guarantees. As the number of cores on a many-core platform increases dramatically, systems become prone to overheating. High temperature not only reduces reliability and increases timing errors, but may also ultimately cause physical damage to, and failure of, the chip. Given its importance, thermal management has become a prominent issue in system design. Thermal management techniques have been explored both at the hardware layer, via appropriate packaging and active heat dissipation mechanisms, and at the software layer, via dynamic thermal management techniques. I propose to design thermal-aware task mapping and data routing algorithms for supporting real-time PGM graphs on many-core platforms that maintain safe temperature levels or minimize the peak processor temperature without violating timing constraints.


Supporting Data-Intensive Real-World Workloads in Networked Systems

With the rapidly growing popularity of Web 2.0 technologies and social business, an increasing number of applications are emerging that require large volumes of data and devote most of their processing time to analyzing and manipulating that data. Many organizations have started to use data-intensive cluster computing systems such as Hadoop, an open-source implementation of Google's MapReduce. An important requirement when executing such data-intensive applications is satisfying real-time constraints. For example, in time-sensitive processes such as fraud detection, millions of daily call detail records must be analyzed in real time to predict customer churn faster. When such data-intensive workloads access disks, they compete for disk resources. The problem of cooperative real-time scheduling of CPUs and disks is known to be hard and remains open. A few heuristic algorithms have been proposed, but they cannot provide analytically provable timing correctness guarantees because they rely on heuristics and are evaluated purely on run-time performance. I am therefore investigating this open problem by proposing cooperative scheduling algorithms and associated schedulability tests that can guarantee timing correctness. The problem is challenging because when a task needs to access both CPUs and disks, its execution phases on these two resources may interfere with each other, and may suffer interference from other tasks on both resources. I am currently developing techniques that eliminate or alleviate the negative effects of such interference. Preliminary analysis and experimental results demonstrate the potential effectiveness of the proposed techniques.
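One way to see why CPU/disk interference matters is to write down the most naive response-time bound: if the disk serializes all I/O phases, a task may have to wait behind every other task's disk phase. The two-phase task model and the bound below are an illustrative simplification, not the analysis developed in my work:

```python
# Crude sketch of CPU/disk interference: model each task as a (cpu, disk)
# demand pair and bound its response time pessimistically by its own
# demand plus every other task's disk phase, assuming the disk serializes
# all I/O. Illustrative only; a real analysis is far tighter.

def pessimistic_response(task, tasks):
    """task, tasks[i]: (cpu_time, disk_time). Returns a naive bound."""
    cpu, disk = task
    disk_interference = sum(d for (c, d) in tasks if (c, d) != task)
    return cpu + disk + disk_interference

tasks = [(3, 2), (1, 4), (2, 1)]   # hypothetical (cpu, disk) demands
bounds = [pessimistic_response(t, tasks) for t in tasks]
# e.g., task (3, 2): 3 + 2 + (4 + 1) = 10 time units
assert bounds == [10, 8, 9]
```

The gap between such a bound and actual behavior is exactly what cooperative scheduling and sharper schedulability tests aim to close.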

Furthermore, emerging data-intensive analytics and "Big Data" workloads pose new challenges to system design and configuration. Current systems are often designed and configured in response to more traditional workload demands. To execute such new workloads more efficiently, one promising improvement to traditional system design is the addition of reconfigurable acceleration. Accelerators such as FPGAs are more efficient at executing highly data-parallel functions such as matrix manipulations. While reconfigurable acceleration has been studied and used successfully in several commercial systems, the problem of meeting performance constraints such as power budgets and response time requirements, which commonly exist in many systems, has received limited attention. I would like to address the research challenges that arise when such constraints are considered in the design and configuration of accelerator-based systems: for example, given a power budget, how should the system be configured (i.e., which and how many power-efficient processing accelerators should be used), and how should performance-intensive functions be offloaded to the reconfigurable logic, so that system throughput is maximized?


MISC

Overview

I have also had the opportunity to work on projects designing routing metrics and protocols for wireless sensor networks and wireless multimedia sensor networks. Wireless sensor networks are distributed event-based systems with severe energy constraints, variable-quality links, low data rates, and many-to-one event-to-sink flows. Communication algorithms for sensor networks, such as directed diffusion, are designed to operate efficiently under these constraints. However, directed diffusion is not efficient in more challenging domains, such as video sensor networks, because of the high-throughput and low-delay requirements of multimedia data. Instead, we propose new routing algorithms based on directed diffusion that reinforce routes with high link quality and low latency, thus maximizing throughput and minimizing delay. Simulation results with CBR (constant bit rate) traffic show that our proposed distributed algorithm selects routes that give better throughput than those reinforced by standard directed diffusion, while maintaining low delay.
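The route-reinforcement idea can be sketched as a simple scored choice among candidate routes, trading off link quality against latency. The route names, metric values, and weights below are illustrative assumptions, not the actual metric from the cited protocols:

```python
# Sketch of quality-aware route reinforcement in the spirit of directed
# diffusion: among candidate routes, reinforce the one with the best
# combination of link quality and latency. Weights and route data are
# hypothetical.

def reinforce(routes, w_quality=0.7, w_delay=0.3):
    """routes: {name: (avg_link_quality in [0, 1], latency_ms)}.
    Higher quality and lower latency both improve the score."""
    def score(r):
        quality, latency = routes[r]
        return w_quality * quality - w_delay * (latency / 100.0)
    return max(routes, key=score)

routes = {"via_n3": (0.9, 40.0), "via_n7": (0.6, 20.0)}
# via_n3 scores 0.7*0.9 - 0.3*0.4 = 0.51; via_n7 scores 0.42 - 0.06 = 0.36
assert reinforce(routes) == "via_n3"
```

Standard directed diffusion, by contrast, reinforces whichever path delivered data first, which ignores link quality and can pick lossy routes for high-rate multimedia flows.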

Moreover, real-time multimedia transport has stringent QoS requirements in terms of bandwidth, delay, jitter, and loss ratio. Wireless sensor networks are useful for streaming multimedia data in infrastructure-free and hazardous environments. However, these networks are composed of nodes with constrained bandwidth and energy. We propose a multipath algorithm based on directed diffusion that reinforces multiple routes with high link quality and low latency. We used the NS-2 simulation tool with video traces generated by Multiple Description Coding (MDC) to evaluate the performance. The results show that our algorithm gives better throughput and delay performance than standard directed diffusion.

Related Publications:

[IJWMN11] Improving QoS-based Routing by Limiting Interference in Lossy Wireless Sensor Network [pdf]
Shuang Li, Cong Liu, and Alvin Lim, International Journal of Wireless & Mobile Networks, 2011.

[IJWMN10] Efficient Multi-Path Protocol for Wireless Sensor Networks [pdf]
Shuang Li, Raghu Neelisetti, Cong Liu, and Alvin Lim, International Journal of Wireless & Mobile Networks, 2010.

[WOWMOM08] Delay-Constrained Highest Throughput Protocol for Multi-Path Transmission over Wireless Multimedia Sensor Networks [pdf]
Shuang Li, Raghu Neelisetti, Cong Liu, and Alvin Lim, Proceedings of the 9th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, pp. 1-8, 2008.