2016, Vol. 31, Issue (5): 1038-1052. doi: 10.1007/s11390-016-1678-7
Topic: Artificial Intelligence and Pattern Recognition
• Special Section on Selected Paper from NPC 2011 •
De-Qing Zou, Member, CCF, Hao Qin, and Hai Jin, Fellow, CCF, Senior Member, IEEE, Member, ACM
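In modern computer systems, logs have long been the primary source of information about how a system is running and the main basis for diagnosing faults. With the growth of cloud computing and cluster environments, however, system and software architectures have become increasingly complex. Frequent interactions between software and hardware at different layers make systems highly coupled and make the faults that occur in them harder to diagnose. This paper proposes UiLog, a unified log management and analysis system that manages the fault logs produced by every component of a cloud environment in one place and analyzes the current state of the system in real time. When a fault occurs, UiLog determines the fault type and orders the fault logs according to the causal sequence in which the faults occurred, helping administrators diagnose the problem. UiLog first collects and manages all system logs centrally, so fault information remains available even if the system crashes. It adopts a new fault classification method that reduces the dependence of classification on a manually built knowledge base, and it uses a fault keyword matrix to classify fault logs effectively in real time. In addition, UiLog improves traditional time-based fault correlation analysis by using the fault classification results to set the sizes of different time windows; this raises the accuracy of correlation analysis and helps administrators find the root cause of a fault. Experimental results show that UiLog can comprehensively manage the logs in a system, classify them by fault type, and identify the root cause of specific faults.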
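The abstract names two mechanisms: classifying fault logs with a fault keyword matrix, and time-based correlation whose window size depends on the fault type. The Python sketch below is a minimal illustration under stated assumptions, not UiLog's actual algorithm. The fault types, keywords (`FAULT_KEYWORDS`), window sizes (`WINDOW_SECONDS`), and helper names are hypothetical; classification is plain substring counting; and an event is correlated only with earlier events of the same type. Time order within a group stands in for causal order, so the first event of each group is treated as the root-cause candidate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword table: fault type -> keywords. UiLog derives its own
# categories from the collected logs; these entries are illustrative only.
FAULT_KEYWORDS: Dict[str, List[str]] = {
    "disk": ["disk", "i/o error", "sector"],
    "network": ["timeout", "unreachable", "link down"],
    "memory": ["out of memory", "page allocation failure"],
}

# Hypothetical per-type correlation windows in seconds (assumed values).
WINDOW_SECONDS: Dict[str, float] = {"disk": 60.0, "network": 10.0, "memory": 30.0}


@dataclass
class LogEvent:
    timestamp: float  # seconds on a clock shared by all components
    source: str       # node or component that emitted the line
    message: str


def keyword_matrix(events: List[LogEvent],
                   keywords: Dict[str, List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Rows are events, columns are fault types; each cell counts keyword hits."""
    types = sorted(keywords)
    matrix = []
    for ev in events:
        text = ev.message.lower()
        matrix.append([sum(kw in text for kw in keywords[t]) for t in types])
    return types, matrix


def classify(events: List[LogEvent],
             keywords: Dict[str, List[str]]) -> List[Optional[str]]:
    """Label each event with its highest-scoring fault type, or None if nothing matches."""
    types, matrix = keyword_matrix(events, keywords)
    labels: List[Optional[str]] = []
    for row in matrix:
        best = max(row)
        labels.append(types[row.index(best)] if best > 0 else None)
    return labels


def correlate(events: List[LogEvent], labels: List[Optional[str]],
              windows: Dict[str, float]) -> List[Tuple[str, List[LogEvent]]]:
    """Time-based grouping where the window size depends on the fault type.

    An event joins the open group of its type if it arrives within that
    type's window of the group's latest event; otherwise it starts a new
    group. Groups are created in time order, so each group's first event
    is its earliest one (the root-cause candidate).
    """
    groups: List[Tuple[str, List[LogEvent]]] = []
    open_group: Dict[str, int] = {}  # fault type -> index into groups
    for i in sorted(range(len(events)), key=lambda k: events[k].timestamp):
        ftype = labels[i]
        if ftype is None:
            continue  # unclassified lines are not correlated
        g = open_group.get(ftype)
        if g is not None and events[i].timestamp - groups[g][1][-1].timestamp <= windows[ftype]:
            groups[g][1].append(events[i])
        else:
            open_group[ftype] = len(groups)
            groups.append((ftype, [events[i]]))
    return groups


# Small made-up example log.
events = [
    LogEvent(0.0, "node1", "sd 0:0:0:0: disk I/O error"),
    LogEvent(5.0, "node2", "nfs: server timeout"),
    LogEvent(20.0, "node1", "Buffer I/O error on device sda, bad sector"),
    LogEvent(100.0, "node3", "kernel: Out of memory: Kill process"),
    LogEvent(200.0, "node1", "disk I/O error"),
]
labels = classify(events, FAULT_KEYWORDS)
for ftype, group in correlate(events, labels, WINDOW_SECONDS):
    root = group[0]
    print(ftype, len(group), "root:", root.source, root.message)
```

With these assumed windows, the two disk errors 20 s apart fall into one group, while the disk error at 200 s starts a new one; shrinking the disk window to 10 s would split the first pair as well. Substring matching is crude (a keyword such as "disk" also matches unrelated words), so a practical classifier would need proper tokenization and keywords derived from the logs themselves.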
Copyright © Editorial Office of the Journal of Computer Science and Technology