
An Online Task Management Mechanism for Distributed Stream Processing Engines

Online Nonstop Task Management for Storm-Based Distributed Stream Processing Engines

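  • Abstract:
    Background: Most distributed stream processing engines (DSPEs) do not support online task management and therefore cannot adapt to time-varying data streams. Recent studies have proposed online task scheduling algorithms to address this problem. However, these approaches cannot guarantee quality of service (QoS) when the task deployment plan changes at runtime, because changing the plan triggers task migration, and task migration in a DSPE is costly. We studied Apache Storm, a widely used DSPE, and found that when a task needs to be migrated, Storm must stop the worker process on which the task is deployed. This stops and restarts every task in that process, causing prolonged system downtime and a drop in throughput.
    Objective: This paper aims to design an online task management mechanism for DSPEs with two main modules: (1) an online task migration mechanism that migrates target tasks in real time without affecting other deployed tasks; and (2) an online task scheduling algorithm that identifies performance bottlenecks at runtime, generates a new task deployment plan, and migrates tasks in real time using the online task migration mechanism.
    Methods: We propose N-Storm, a DSPE that decouples tasks from resources. N-Storm adopts a thread-level task migration scheme that allows the tasks assigned to a resource to change at runtime. It deploys a local shared key/value store on each physical node so that resources can detect changes to the task deployment plan; each resource can therefore manage the tasks deployed on it at runtime. Building on N-Storm, we further propose OTD, an online task scheduling algorithm. Traditional task scheduling algorithms deploy all tasks at once and ignore the migration cost that redeployment incurs. OTD instead adjusts the current deployment step by step toward an optimized one, based on communication cost and the runtime state of resources. OTD also adapts to different types of applications, including computation-intensive and communication-intensive ones.
    Results: We ran experiments on a real DSPE cluster. For online task migration, N-Storm avoids system downtime and reduces the performance degradation time by 87% compared with Storm and other state-of-the-art approaches. For the online task scheduling algorithm OTD, we ran two types of applications: for computation-intensive applications, OTD raises average CPU usage by 51%; for communication-intensive applications, it lowers network communication cost by 88%.
    Conclusions: To handle the time-varying nature of data streams, this paper proposes an online task management mechanism for DSPEs. It first presents N-Storm, a DSPE that decouples tasks from resources and supports thread-level online task migration. N-Storm deploys a key/value store on each physical node so that resources can detect changes to the task deployment plan, which lets each resource manage the tasks deployed on it at runtime. Building on N-Storm, the paper then presents the online task scheduling algorithm OTD, which adjusts the current deployment step by step to avoid the performance degradation caused by task migration. The experimental results show that N-Storm significantly shortens the performance degradation time during task migration and eliminates system downtime. In addition, OTD effectively raises the average CPU usage of computation-intensive applications and lowers the inter-node communication cost of communication-intensive applications.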
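    The thread-level migration idea described above can be illustrated with a minimal sketch. It is not N-Storm's implementation: the `Worker` class, the `reconcile` method, and the plain dictionary standing in for the node-local shared key/value store are all hypothetical names chosen for illustration. The sketch only shows the principle that a resource reads the published plan and starts or stops individual task threads, leaving its other tasks untouched.

```python
import threading


class Worker:
    """Hypothetical stand-in for a resource (a Worker process in Storm)."""

    def __init__(self, worker_id, store):
        self.worker_id = worker_id
        self.store = store    # assumption: a dict plays the node-local key/value store
        self.running = {}     # task_id -> (thread, stop_event)

    def _run_task(self, task_id, stop_event):
        # Placeholder task body: a real task would process tuples until stopped.
        stop_event.wait()

    def reconcile(self):
        """Apply only the difference between the stored plan and the local threads."""
        wanted = {t for t, w in self.store.items() if w == self.worker_id}
        for task_id in list(self.running):
            if task_id not in wanted:
                # The task was migrated away: stop this one thread only.
                thread, stop_event = self.running.pop(task_id)
                stop_event.set()
                thread.join()
        for task_id in sorted(wanted - set(self.running)):
            # The task was migrated in: start a new thread for it.
            stop_event = threading.Event()
            thread = threading.Thread(
                target=self._run_task, args=(task_id, stop_event), daemon=True
            )
            thread.start()
            self.running[task_id] = (thread, stop_event)

    def shutdown(self):
        """Stop every task thread owned by this worker."""
        for thread, stop_event in self.running.values():
            stop_event.set()
            thread.join()
        self.running.clear()


# Usage: publish a new plan that moves t2 from worker A to worker B.
store = {"t1": "A", "t2": "A", "t3": "B"}
a, b = Worker("A", store), Worker("B", store)
a.reconcile(); b.reconcile()
store["t2"] = "B"
a.reconcile(); b.reconcile()
print(sorted(a.running), sorted(b.running))
a.shutdown(); b.shutdown()
```

    In this sketch the thread running t1 is never stopped while t2 moves, which is the property the abstract contrasts with stopping the whole worker process. Transferring task state and in-flight tuples is deliberately left out.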

     

    Abstract: Most distributed stream processing engines (DSPEs) do not support online task management and cannot adapt to time-varying data flows. Recently, some studies have proposed online task deployment algorithms to solve this problem. However, these approaches do not guarantee Quality of Service (QoS) when the task deployment changes at runtime, because the task migrations caused by a deployment change are very costly. We study one of the most popular DSPEs, Apache Storm, and find that when a task needs to be migrated, Storm has to stop the resource (implemented as a Worker process in Storm) where the task is deployed. This stops and restarts all tasks in the resource, so task migrations perform poorly. To solve this problem, we propose N-Storm (Nonstop Storm), a DSPE that decouples tasks from resources. N-Storm allows the tasks allocated to a resource to be changed at runtime, using a thread-level scheme for task migrations. In particular, we add a local shared key/value store on each node to make resources aware of changes in the allocation plan, so each resource can manage its tasks at runtime. Based on N-Storm, we further propose Online Task Deployment (OTD). Unlike traditional task deployment algorithms, which deploy all tasks at once without considering the cost of the task migrations caused by a re-deployment, OTD gradually adjusts the current task deployment to an optimized one based on the communication cost and the runtime states of resources. We demonstrate that OTD can adapt to different kinds of applications, including computation-intensive and communication-intensive applications. The experimental results on a real DSPE cluster show that N-Storm avoids the system stop and saves up to 87% of the performance degradation time, compared with Apache Storm and other state-of-the-art approaches. In addition, OTD increases the average CPU usage by 51% for computation-intensive applications and reduces network communication costs by 88% for communication-intensive applications.
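    To make the contrast between deploying all tasks at once and adjusting a deployment gradually more concrete, here is a minimal sketch of a step-by-step adjustment. It is not the OTD algorithm from the paper: the greedy rule, the function names (`inter_node_cost`, `best_single_move`, `adjust_gradually`), the per-node task-count capacity, and the `max_moves` budget are all assumptions made for illustration. Each round moves the single task that most reduces inter-node traffic and stops when no move helps or the budget is used up.

```python
def inter_node_cost(placement, traffic):
    """Sum the traffic of task pairs that sit on different nodes."""
    return sum(rate for (a, b), rate in traffic.items()
               if placement[a] != placement[b])


def best_single_move(placement, traffic, capacity):
    """Return (gain, task, node) for the most beneficial single migration, or None."""
    load = {}
    for node in placement.values():
        load[node] = load.get(node, 0) + 1
    current = inter_node_cost(placement, traffic)
    best = None
    for task, src in placement.items():
        for dst in capacity:
            # Skip no-op moves and nodes that are already full (assumed capacity model).
            if dst == src or load.get(dst, 0) >= capacity[dst]:
                continue
            trial = dict(placement)
            trial[task] = dst
            gain = current - inter_node_cost(trial, traffic)
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, task, dst)
    return best


def adjust_gradually(placement, traffic, capacity, max_moves):
    """Migrate at most max_moves tasks, one per round, without mutating the input."""
    placement = dict(placement)
    moves = []
    for _ in range(max_moves):
        step = best_single_move(placement, traffic, capacity)
        if step is None:
            break
        _, task, dst = step
        placement[task] = dst
        moves.append((task, dst))
    return placement, moves


# Usage with made-up numbers: the spout and bolt1 exchange most of the traffic.
traffic = {("spout", "bolt1"): 100, ("bolt1", "bolt2"): 10}
capacity = {"n1": 2, "n2": 2}
start = {"spout": "n1", "bolt1": "n2", "bolt2": "n2"}
new_placement, moves = adjust_gradually(start, traffic, capacity, max_moves=3)
print(moves, inter_node_cost(new_placement, traffic))
```

    Bounding the number of migrations per round is one simple way to keep migration cost in view. A fuller treatment would also weigh runtime resource states such as CPU load, which the abstract names as an input to OTD.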

     
