›› 2016, Vol. 31 ›› Issue (2): 267-283.doi: 10.1007/s11390-016-1626-6

Special Issue: Computer Architecture and Systems; Theory and Algorithms

• Computer Architecture and Systems • Previous Articles     Next Articles

Worst-Case Finish Time Analysis for DAG-Based Applications in the Presence of Transient Faults

Xiao-Tong Cui1,2, Kai-Jie Wu1,2*, Member, CCF, IEEE, Tong-Quan Wei3*, Member, CCF, IEEE, and Edwin Hsing-Mean Sha1,2, Member, CCF, IEEE   

  1. 1 Key Laboratory of Dependable Service Computing in Cyber Physical Society, Chongqing University Chongqing 400044, China;
    2 College of Computer Science, Chongqing University, Chongqing 400044, China;
    3 Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China
  • Received:2015-01-12 Revised:2015-04-21 Online:2016-03-05 Published:2016-03-05
  • Contact: Kai-Jie Wu, Tong-Quan Wei E-mail:kaijie@gmail.com;tqwei@cs.ecnu.edu.cn
  • About author:Xiao-Tong Cui received his B.S. degree in computer science and technology from Chongqing University, Chongqing, in 2013. Currently he is a Ph.D. candidate majoring in computer science and technology of the College of Computer Science, Chongqing University. His current research interests include real-time task scheduling and hardware security.
  • Supported by:

    This work is partially supported by the National High Technology Research and Development 863 Program of China under Grant Nos. 2015AA015304 and 2013AA013202, the National Natural Science Foundation of China under Grant No. 61472052, and Chongqing Research Program under Grant No. cstc2014yykfB40007.

Tasks in hard real-time systems are required to meet preset deadlines, even in the presence of transient faults, and hence the analysis of worst-case finish time (WCFT) must consider the extra time incurred by re-executing tasks that were faulty. Existing solutions can only estimate WCFT and usually result in significant under- or over-estimation. In this work, we conclude that a sufficient and necessary condition of a task set experiencing its WCFT is that its critical task incurs all expected transient faults. A method is presented to identify the critical task and WCFT in O(|V|+|E|) where |V| and |E| are the number of tasks and dependencies between tasks, respectively. This method finds its application in testing the feasibility of directed acyclic graph (DAG) based task sets scheduled in a wide variety of fault-prone multi-processor systems, where the processors could be either homogeneous or heterogeneous, DVS-capable or DVS-incapable, etc. The common practices, which require the same time complexity as the proposed critical-task method, could either underestimate the worst case by up to 25%, or overestimate by 13%. Based on the proposed critical-task method, a simulated-annealing scheduling algorithm is developed to find the energy efficient fault-tolerant schedule for a given DAG task set. Experimental results show that the proposed critical-task method wins over a common practice by up to 40% in terms of energy saving.

[1] Wei T, Mishra P, Wu K, Zhou J. Quasi-static fault-tolerant scheduling schemes for energy-efficient hard real-time systems. J. Systems and Software, 2012, 85(6): 1386-1399.

[2] Liu C L, Layland J W. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM (JACM), 1973, 20(1): 46-61.

[3] Kopetz H, Grunsteidl G. TTP — A protocol for faulttolerant real-time systems. Computer, 1994, 27(1): 14-23.

[4] Chevochot P, Puaut I. Scheduling fault-tolerant distributed hard real-time tasks independently of the replication strategies. In Proc. the 6th Int. Conf. Real-Time Computing Systems and Applications, Dec. 1999, pp.356-363.

[5] Dima C, Girault A, Lavarenne C, Sorel Y. Off-line realtime fault-tolerant scheduling. In Proc. the 9th Euromicro Workshop on Parallel and Distributed Processing, Feb. 2001, pp.410-417.

[6] Girault A, Kalla H, Sighireanu M, Sorel Y. An algorithm for automatically obtaining distributed and fault-tolerant static schedules. In Proc. International Conference on Dependable Systems and Networks, Jun. 2003, pp.159-168.

[7] Pop P, Izosimov V, Eles P, Peng Z. Design optimization of time- and cost-constrained fault-tolerant embedded systems with checkpointing and replication. IEEE Trans. Very Large Scale Integration Systems, 2009, 17(3): 389-402.

[8] Kandasamy N, Hayes J P, Murray B T. Transparent recovery from intermittent faults in time-triggered distributed systems. IEEE Trans. Computers, 2003, 52(2): 113-125.

[9] Olteanu A, Pop F, Dobre C, Cristea V. A dynamic rescheduling algorithm for resource management in large scale dependable distributed systems. Computers & Mathematics with Applications, 2012, 63(9): 1409-1423.

[10] Pop F, Dobre C, Cristea V. Performance analysis of grid DAG scheduling algorithms using MONARC simulation tool. In Proc. the 7th ISPDC, Jul. 2008, pp.131-138.

[11] Pop F, Cristea V. Intelligent strategies for DAG scheduling optimization in grid environments. arXiv Preprint, arXiv: 1106.5303, 2011. http://arxiv.org/ftp/arxiv/papers/ 1106/1106.5303.pdf, August 2015.

[12] Ghosh S, Melhem R, Mosse D. Enhancing real-time schedules to tolerate transient faults. In Proc. the 16th IEEE Real-Time Systems Symposium, Dec. 1995, pp.120-129.

[13] Burns A, Davis R, Punnekkat S. Feasibility analysis of faulttolerant real-time task sets. In Proc. the 8th Euromicro Workshop on Real-Time Systems , Jun. 1996, pp.29-33.

[14] Audsley N, Burns A, Richardson M et al. Applying new scheduling theory to static priority pre-emptive scheduling. Software Engineering Journal, 1993, 8(5): 284-292.

[15] Liberato F, Melhem R, Mossé D. Tolerance to multiple transient faults for aperiodic tasks in hard real-time systems. IEEE Transactions on Computers, 2000, 49(9): 906-914.

[16] Aydin H. Exact fault-sensitive feasibility analysis of realtime tasks. IEEE Trans. Computers, 2007, 56(10): 1372- 1386.

[17] Chrobak M, Hurand M, Sgall J. Fast algorithms for testing fault-tolerance of sequenced jobs with deadlines. In Proc. the 28th IEEE RTSS, Dec. 2007, pp.139-148.

[18] Thekkilakattil A, Dobrin R, Punnekkat S et al. Resource augmentation for fault-tolerance feasibility of real-time tasks under error bursts. In Proc. the 20th Int. Conf. Real- Time and Network Systems, Nov. 2012, pp.41-50.

[19] Goddard S. On the management of latency in the synthesis of real-time signal processing systems from processing graphs [Ph.D. Thesis]. The University of North Carolina at Chapel Hill, 1998.

[20] Liu C, Anderson J H. Supporting soft real-time DAG-based systems on multiprocessors with no utilization loss. In Proc. the 31st IEEE RTSS, Nov. 30-Dec. 3, 2010, pp.3-13.

[21] Bauer G, Kopetz H. Transparent redundancy in the timetriggered architecture. In Proc. International Conference on Dependable Systems and Networks, Jun. 2000, pp.5-13.

[22] Bretz E A. By-wire cars turn the corner. IEEE Spectrum, 2001,38(4): 68-73.

[23] Kopetz H. Why time-triggered architectures will succeed in large hard real-time systems. In Proc. the 5th IEEE FTDCS, Aug. 1995, pp.2-9.

[24] Obermaisser R. Event-Triggered and Time-Triggered Control Paradigms. Springer US, 2004.

[25] Poledna S. Fault-Tolerant Real-Time Systems: The Problem of Replica Determinism. Springer US, 1996.

[26] Suri N, Walter C J, Hugue M M. Advances in ULTRADependable Distributed Systems. Los Alamitos, CA, USA: IEEE Computer Society Press, 1994.

[27] Pop P, Eles P, Peng Z. Schedulability analysis for systems with data and control dependencies. In Proc. the 12th Euromicro Conf. Real-Time Systems, June 2000, pp.201-208.

[28] Laplante P A. Real-Time Systems Design and Analysis. John Wiley & Sons, 2004.

[29] Liu Y, Liang H, Wu K. Scheduling for energy efficiency and fault tolerance in hard real-time systems. In Proc. the DATE, Mar. 2010, pp.1444-1449.

[30] Manber U. Introduction to Algorithms: A Creative Approach. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1989.

[31] Luo J, Jha N K. Static and dynamic variable voltage scheduling algorithms for real-time heterogeneous distributed embedded systems. In Proc. the 15th International Conference on VLSI Design, Jan. 2002, pp.719-726.

[32] Liu Y, Mok A K. An integrated approach for applying dynamic voltage scaling to hard real-time systems. In Proc. the 9th IEEE RTAS, May 2003, pp.116-123.

[33] Cho Y, Chang N, Chakrabarti C, Vrudhula S. High-level power management of embedded systems with applicationspecific energy cost functions. In Proc. the 43rd Annual Design Automation Conference, July 2006, pp.568-573.

[34] Kianzad V, Bhattacharyya S S, Qu G. CASPER: An integrated energy-driven approach for task graph scheduling on distributed embedded systems. In Proc. the 16th IEEE ASAP, Jul. 2005, pp.191-197.

[35] Hua S, Qu G. Power minimization techniques on distributed real-time systems by global and local slack management. In Proc. the 10th ASP-DAC, Jan. 2005, pp.830-835.

[36] Schmitz M T, Al-Hashimi B M, Eles P. Iterative schedule optimization for voltage scalable distributed embedded systems. ACM Trans. Embedded Computing Systems, 2004, 3(1): 182-217.

[37] Lin M, Ding C. Parallel genetic algorithms for DVS scheduling of distributed embedded systems. In Proc. the 3rd HPCC, Sept. 2007, pp.180-191.

[38] Huang J, Buckl C, Raabe A, Knoll A. Energy-aware task allocation for network-on-chip based heterogeneous multiprocessor systems. In Proc. the 19th PDP, Feb. 2011, pp.447- 454.

[39] Hung C M, Chen J J, Kuo T W. Energy-efficient real-time task scheduling for a DVS system with a non-DVS processing element. In Proc. the 27th IEEE International Real- Time Systems Symposium, Dec. 2006, pp.303-312.

[40] Xu R, Melhem R, Mosse D. Energy-aware scheduling for streaming applications on chip multiprocessors. In Proc. the 28th IEEE Int. Real-Time Systems Symp., Dec. 2007, pp.25-38.

[41] ?erný V. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 1985, 45(1): 41-51.

[42] Kirkpatrick S. Optimization by simulated annealing: Quantitative studies. Journal of Statistical Physics, 1984, 34(5/6): 975-986.

[43] ?ivojnovi? V, Velarde J M, Schläger C, Meyr H. DSPstone: A DSP-oriented benchmarking methodology. In Proc. the ICSPAT, Oct. 1994, pp.715-720.
No related articles found!
Full text



[1] Feng Yulin;. Hierarchical Protocol Analysis by Temporal Logic[J]. , 1988, 3(1): 56 -69 .
[2] Cai Shijie; Zhang Fuyan;. A Fast Algorithm for Polygon Operations[J]. , 1991, 6(1): 91 -96 .
[3] I.V.Vel bitsky; A.L.Kovalev; I.V.Kasatkina; Wang Lei;. R-Technology of Programming: Basic Notions and Implementation[J]. , 1992, 7(4): 345 -355 .
[4] Wang Hui; Liu Dayou; Wang Yafei;. Sequential Back-Propagation[J]. , 1994, 9(3): 252 -260 .
[5] Ying Mingsheng;. Institutions of Variable Truth Values:An Approach in the Ordered Style[J]. , 1995, 10(3): 267 -273 .
[6] WEI Hua; LUO Yupin; YANG Shiyuan;. Fault Tolerance of Reconfigurable Bi-Directional Double-Loop LANs[J]. , 1999, 14(4): 379 -385 .
[7] XU Xiaofei; YE Dan; LI Quanlong; ZHAN Dechen;. Dynamic Organization and Methodology for Agile Virtual Enterprises[J]. , 2000, 15(4): 368 -375 .
[8] Jun Yao, Ji-Wu Shu, and Wei-Min Zheng. Distributed Storage Cluster Design for Remote Mirroring Based on Storage Area Network[J]. , 2007, 22(4): 521 -526 .
[9] Jiang Yu, Andrew Tappenden, James Miller, and Michael Smith. A Scalable Testing Framework for Location-Based Services[J]. , 2009, 24(2): 386 -404 .
[10] Najwa Altwaijry and Mohamed El Bachir Menai. Data Structures in Multi-Objective Evolutionary Algorithms[J]. , 2012, 27(6): 1197 -1210 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved