Enhanced Userspace and In-Kernel Trace Filtering for Production Systems

doi:10.1007/s11390-016-1690-y

Abstract: Tracing tools such as LTTng have a very low impact on the traced software compared with traditional debuggers. However, over long runs in resource-constrained, high-throughput environments, such as embedded network switching nodes and production servers, the cumulative tracing impact on the target software adds up considerably. The overhead lies not only in execution time but also in the huge amount of data that must be stored, processed, and analyzed offline. This paper presents a novel way of dealing with such large-scale trace data generation: a Just-In-Time (JIT) filter-based tracing system that sieves through the flood of high-frequency events and records only those that are relevant, when a specific condition is met. With a tiny filtering cost, the user can filter out most events and focus only on the events of interest. We show that in certain scenarios, JIT-compiled filters prove to be three times more effective than similar interpreted filters. We also show that the benefits of JIT compilation grow as the number of filter predicates and context variables increases, with some JIT-compiled filters being three times faster than their interpreted counterparts. We further present a new architecture, built on our filtering system, that enables co-operative tracing between kernel and process tracing virtual machines (VMs) that share data efficiently. We demonstrate its use through a tracing scenario in which the user dynamically specifies a syscall latency through the userspace tracing VM, and the effect is reflected in the tracing decisions made by the kernel tracing VM. We compare the data access performance of our shared memory system with traditional data sharing and show an almost 100-fold improvement for co-operative tracing.
