
UiLog: A Log-Based Fault Diagnosis System

UiLog: Improving Log-Based Fault Diagnosis by Log Analysis

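  • Abstract: In modern computer systems, logs have always been the primary source for observing how a system is running and for diagnosing its faults. With the growth of cloud computing and cluster environments, however, system and software architectures have become increasingly complex. Frequent interaction among software and hardware at different layers makes systems tightly coupled and makes their faults harder to diagnose. This paper presents UiLog, an integrated log management and analysis system that manages the fault logs produced by every component of a cloud environment in a unified way and analyzes the current state of the system in real time. When a fault occurs, UiLog determines the fault type and orders the fault logs by the causal sequence in which the faults occurred, assisting administrators in diagnosis. UiLog first collects and manages the system's logs centrally, which ensures that fault information remains available even if the system goes down. It adopts a new fault classification method that reduces the dependence of the classification process on a manually built knowledge base, and it uses a fault keyword matrix to classify fault logs effectively in real time. In addition, UiLog improves on traditional time-based fault correlation analysis by using the results of fault log classification to determine the sizes of the different time windows, which raises the accuracy of the correlation analysis and helps administrators find the root cause of a fault. Experimental results show that UiLog can manage the logs of a system comprehensively, classify logs by fault type, and identify the root cause of specific faults.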

     

    Abstract: In modern computer systems, system event logs have always been the primary source for checking system status. As computer systems grow more complex, software and hardware components interact more frequently and generate an enormous volume of log information, including running reports and fault information. This volume of data is a serious challenge for manual analysis. In this paper, we implement a log management and analysis system that helps system administrators understand the real-time status of the entire system, classify logs into different fault types, and determine the root cause of faults. In addition, we improve the existing fault correlation analysis method by building on the results of system log classification. We evaluate the system in a cloud computing environment. The results show that it classifies fault logs automatically and effectively, and that it helps administrators identify the root cause of faults.
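
The abstract describes two mechanisms: classifying fault logs with a fault keyword matrix, and using the classification result to choose the time-window size for correlation analysis. The following is a minimal sketch of how those two ideas might fit together, not UiLog's actual implementation. The fault types, keywords, weights, window lengths, and the names `KEYWORD_MATRIX`, `WINDOW_SECONDS`, `classify`, and `correlated` are all hypothetical, and ordering by timestamp is only a stand-in for the causal ordering the paper refers to.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword matrix: one row per fault type, one weight per keyword.
KEYWORD_MATRIX = {
    "disk": {"disk": 1.0, "i/o": 2.0, "sector": 2.0},
    "network": {"link": 1.0, "timeout": 1.0, "unreachable": 2.0},
    "memory": {"memory": 1.0, "oom": 2.0, "swap": 1.0},
}

# Hypothetical correlation window (seconds) per fault type.
WINDOW_SECONDS = {"disk": 60, "network": 10, "memory": 30}
DEFAULT_WINDOW = 30  # assumed fallback for unclassified entries


@dataclass
class LogEntry:
    timestamp: float  # seconds since some epoch
    message: str


def classify(message: str) -> str:
    """Score a message against each row of the matrix and pick the best row."""
    tokens = Counter(message.lower().split())
    scores = {
        fault_type: sum(weight * tokens[kw] for kw, weight in keywords.items())
        for fault_type, keywords in KEYWORD_MATRIX.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def correlated(entries: list[LogEntry], anchor: LogEntry) -> list[LogEntry]:
    """Return entries that precede `anchor` within a window whose size
    depends on the anchor's fault type, ordered by time."""
    window = WINDOW_SECONDS.get(classify(anchor.message), DEFAULT_WINDOW)
    hits = [e for e in entries if 0 < anchor.timestamp - e.timestamp <= window]
    return sorted(hits, key=lambda e: e.timestamp)
```

With these assumed values, an anchor entry classified as `disk` pulls in candidate predecessors from the previous 60 seconds, while a `network` anchor looks back only 10 seconds. A real system would need a tokenizer that handles punctuation and a principled way to derive the keywords and window sizes.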

     
