Cognition: Accurate and Consistent Linear Log Parsing Using Template Correction
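Structured abstract:
Background: Logs record the runtime state of a system or device. A log entry usually combines a natural-language template with concrete parameters supplied at runtime. The template typically identifies a particular system event, and the parameters describe the specific system state during that event. The system also attaches a small amount of inherent information, such as a timestamp, to each entry. Many current log-based analysis methods and tools produce their results by analyzing system events and their parameters separately. Separating the template and each parameter of a log accurately, consistently, and quickly is therefore the first problem these methods and tools must solve.
Objective: We aim to generate a system's log templates by parsing a number of log entries produced by that system. We then use the generated templates to determine which template each log of the system uses and which concrete parameters were inserted into it.
Methods: We propose Cognition, a linear log parsing method based on template correction. First, it gives a precise redefinition of the parameter placeholder. Next, it derives rough log templates by comparing similar log entries. Then, based on the new placeholder definition, it repeatedly corrects similar templates to obtain accurate log templates. After that, it uses the accurate templates to parse the parameters contained in each log. Finally, we evaluate Cognition comprehensively on existing public datasets.
Results: On the same datasets, Cognition achieves higher accuracy than other state-of-the-art log parsing methods, and changing the dataset affects its accuracy far less than it affects theirs. Compared with some of the other methods, Cognition also saves up to 52.1% of the time cost.
Conclusions: The experiments show that a template-correction log parsing method built on a precise definition of parameter placeholders can improve parsing accuracy effectively and consistently while also reducing time cost to some extent.

Abstract: Logs contain runtime information for both systems and users. Because many logs are written in natural language, a typical log-based analysis must first parse them into a structured format. Existing parsing approaches usually take two steps. First, they find similar words (tokens) or sentences. Second, they extract log templates by replacing differing tokens with variable placeholders. However, we observe that most parsers concentrate on precisely grouping similar tokens or logs but lack a well-designed template extraction process, which leads to inconsistent accuracy on particular datasets. The root cause is the ambiguous definition of variable placeholders and of similar templates. The consequences include abuse of variable placeholders, incorrectly divided templates, and an excessive number of templates over time. In this paper, we propose Cognition, an online log parsing approach. It first redefines variable placeholders via a strict lower bound to avoid ambiguity. Then, it applies our template correction technique to merge and absorb similar templates. This eliminates the interference of commonly used parameters and thus keeps the number of templates from growing excessively.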
Evaluation on 16 public datasets shows that Cognition achieves better accuracy and consistency than state-of-the-art approaches. It also reduces time cost by up to 52.1% on average compared with the other approaches.
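The generic two-step parsing idea summarized above can be illustrated with a minimal Python sketch. The idea is to compare two similar logs, turn the tokens that differ into variable placeholders, and then read parameters off any log that matches the resulting template. This is not the Cognition algorithm. The whitespace tokenizer, the `<*>` placeholder, the `min_similarity` threshold, and the helper names `rough_template`, `matches`, and `extract_parameters` are all hypothetical. The paper's strict lower bound and template correction rules are not reproduced here.

```python
from typing import List, Optional

PLACEHOLDER = "<*>"  # hypothetical variable placeholder token


def tokenize(line: str) -> List[str]:
    # Naive whitespace tokenizer (an assumption; real parsers often also split on '=', ',' or ':').
    return line.split()


def rough_template(a: List[str], b: List[str], min_similarity: float = 0.5) -> Optional[List[str]]:
    """Derive a rough template from two similar token sequences.

    Positions where the tokens differ become placeholders. The logs count as similar
    only if they have the same length and the share of identical tokens is at least
    `min_similarity` (a hypothetical threshold, not the paper's lower bound).
    """
    if len(a) != len(b) or not a:
        return None
    same = sum(x == y for x, y in zip(a, b))
    if same / len(a) < min_similarity:
        return None
    return [x if x == y else PLACEHOLDER for x, y in zip(a, b)]


def matches(template: List[str], tokens: List[str]) -> bool:
    """True if lengths agree and every constant token of `template` equals the token at that position.

    Placeholders act as wildcards, so this also tests whether a general template
    covers a more specific template of the same length.
    """
    return len(template) == len(tokens) and all(
        t == PLACEHOLDER or t == w for t, w in zip(template, tokens)
    )


def extract_parameters(template: List[str], tokens: List[str]) -> Optional[List[str]]:
    """Return the tokens that fill the placeholders, or None if the log does not match."""
    if not matches(template, tokens):
        return None
    return [w for t, w in zip(template, tokens) if t == PLACEHOLDER]


if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.1 closed after 35 ms",
        "Connection from 10.0.0.7 closed after 120 ms",
    ]
    a, b = (tokenize(line) for line in logs)
    template = rough_template(a, b)
    print(" ".join(template))  # Connection from <*> closed after <*> ms
    print(extract_parameters(template, tokenize("Connection from 192.168.1.2 closed after 8 ms")))
    # ['192.168.1.2', '8']
```

Because `matches` treats a placeholder as a wildcard, the same check can indicate whether one template could absorb another of equal length. How Cognition itself decides when to merge or absorb templates is defined in the paper and is not modeled by this sketch.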