Journal of Computer Science and Technology

   

Cognition: Accurate and Consistent Linear Log Parsing using Template Correction

Ran Tian1,2, Zulong Diao2,4, Haiyang Jiang2, Gaogang Xie1,3, Senior Member, CCF, Member, ACM, Senior Member, IEEE   

1 University of Chinese Academy of Sciences, Beijing 100049, China
2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
3 Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
4 Purple Mountain Laboratories, Nanjing 211111, China

Logs contain runtime information for both systems and users. Because many logs are written in natural language, a typical log-based analysis must first parse them into a structured format. Existing parsing approaches usually take two steps: first, they find similar words (tokens) or sentences; second, they extract log templates by replacing differing tokens with variable placeholders. However, we observe that most parsers concentrate on precisely grouping similar tokens or logs and lack a well-designed template extraction process, which leads to inconsistent accuracy on particular datasets. The root cause is the ambiguous definition of variable placeholders and of similar templates. The consequences include abuse of variable placeholders, incorrectly divided templates, and an excessive number of templates over time. In this paper, we propose Cognition, an online log parsing approach. It first redefines the variable placeholder via a strict lower bound to avoid ambiguity. It then applies our template correction technique to merge and absorb similar templates. This eliminates the interference of commonly used parameters and thus isolates the template quantity. Evaluation on 16 public datasets shows that Cognition achieves better accuracy and consistency than state-of-the-art approaches. It also reduces time cost by up to 52.1% on average compared with the other approaches.
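The generic two-step scheme mentioned in the abstract (group similar logs, then replace differing tokens with placeholders) can be illustrated with a minimal Python sketch. This is not Cognition's algorithm: the function name `extract_template`, the `<*>` placeholder notation, the whitespace tokenization, and the sample log lines are all assumptions made for illustration, and the grouping step is taken as already done.

```python
from typing import List

PLACEHOLDER = "<*>"  # assumed placeholder notation, not taken from the paper


def extract_template(logs: List[str]) -> str:
    """Derive one template from logs that were already grouped as similar.

    Tokens that are identical across all logs are kept; any position where
    the logs disagree is replaced by a variable placeholder.
    """
    token_rows = [log.split() for log in logs]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("logs in a group must have the same token count")

    template = []
    for column in zip(*token_rows):
        # A column with a single distinct token is constant text.
        template.append(column[0] if len(set(column)) == 1 else PLACEHOLDER)
    return " ".join(template)


# Hypothetical log lines, invented for this example.
group = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.7 closed",
]
print(extract_template(group))  # Connection from <*> closed
```

A naive rule like this one marks every differing position as a variable, which hints at why an imprecise placeholder definition can lead to overused placeholders or wrongly split templates.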


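Chinese Abstract

Research background:
Logs are a tool for recording the running state of a system or device. Typically, the content of a log is composed of a natural-language template combined with concrete parameters supplied at runtime. The template identifies a specific system event, while the parameters indicate the concrete system state during that event. In addition, the system adds a small amount of fixed information (such as a timestamp) to each log. Today, many log-based analysis methods and tools derive their results by independently analyzing system events or time parameters. Therefore, how to parse the template and the individual parameters out of a log accurately, stably, and quickly has become the first problem to solve before these methods and tools can run smoothly.
Objective:
Our research aims to parse a certain number of log entries generated by the same system in order to produce that system's log templates, and then, against the generated templates, to determine the template used by each log of the system and the concrete parameters inserted into it.
Method:
We propose "Cognition", a linear log parsing approach based on template correction. The approach first gives a new, precise definition of the parameter placeholder. Next, by comparing similar log entries, we obtain rough log templates. Then, based on the new placeholder definition, we repeatedly correct similar log templates to obtain accurate ones. After that, we use the accurate templates to parse out the parameters contained in the logs. Finally, we comprehensively evaluate Cognition on existing public datasets.
Results:
Compared with other existing state-of-the-art log parsing methods, Cognition achieves higher accuracy when parsing the same datasets. Moreover, changing the dataset affects its accuracy far less than it affects other log parsing methods. In addition, compared with some of the other methods, Cognition saves up to 52.1% of time cost.
Conclusion:
Experiments show that a log parsing method that uses template correction based on a precise parameter placeholder definition can effectively and stably improve the accuracy of log parsing while also reducing time cost to some extent.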
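The parsing goal stated above, recovering the parameters of a log once its template is known, can be sketched in Python as follows. This is an illustrative assumption rather than the paper's procedure: the helper `match_template`, the `<*>` placeholder notation, the rule that each parameter is one whitespace-free token, and the sample log lines are all hypothetical.

```python
import re
from typing import List, Optional

PLACEHOLDER = "<*>"  # assumed placeholder notation, not taken from the paper


def match_template(template: str, log: str) -> Optional[List[str]]:
    """Return the parameters of `log` if it fits `template`, else None.

    The constant parts of the template are escaped literally, and each
    placeholder becomes a capture group for one whitespace-free token.
    """
    constant_parts = [re.escape(part) for part in template.split(PLACEHOLDER)]
    pattern = r"(\S+)".join(constant_parts)
    match = re.fullmatch(pattern, log)
    return list(match.groups()) if match else None


# Hypothetical template and log lines, invented for this example.
template = "Connection from <*> closed after <*> ms"
print(match_template(template, "Connection from 10.0.0.1 closed after 35 ms"))
print(match_template(template, "Disk quota exceeded"))
```

The first call yields the two extracted parameters and the second yields `None`, since the log does not fit the template.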


Key words: log analysis, log parsing, template correction



ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP
