Journal of Computer Science and Technology
Indexed by   SCIE, EI ...
Bimonthly    Since 1986
Journal of Computer Science and Technology, 2018, Vol. 33, Issue 1: 79-97. DOI: 10.1007/s11390-018-1809-4
Computer Architecture and Systems
Hot Data Identification with Multiple Bloom Filters: Block-Level Decision vs I/O Request-Level Decision
Dongchul Park (1, 2), Weiping He (3), David H. C. Du (4), Fellow, IEEE
(1) Division of Computer and Electronic Systems Engineering, Hankuk University of Foreign Studies, Gyeonggi-do 17035, Korea
(2) Intel Corporation, Hillsboro, OR 97124, U.S.A.
(3) Dell Storage, Eden Prairie, MN 55344, U.S.A.
(4) Department of Computer Science and Engineering, University of Minnesota-Twin Cities, Minneapolis, MN 55455, U.S.A.

Abstract: Hot data identification is crucial for many applications, yet few investigations have examined the subject. Existing studies focus almost exclusively on frequency. However, effectively identifying hot data requires considering both recency and frequency. Moreover, previous studies make hot data decisions at the data block level. Such a fine-grained decision fits flash-based storage particularly well because its random access performance is comparable to its sequential access performance. Hard disk drives (HDDs), however, show a significant performance disparity between sequential and random access. Therefore, unlike flash-based storage, exploiting the asymmetric access performance of HDDs requires a coarse-grained decision. This paper proposes a novel hot data identification scheme that adopts multiple bloom filters to efficiently capture recency as well as frequency. Consequently, it not only consumes 50% less memory and incurs up to 58% less computational overhead, but also lowers false identification rates by up to 65% compared with a state-of-the-art scheme. Moreover, we apply the scheme to a next-generation HDD technology, Shingled Magnetic Recording (SMR), to verify its effectiveness. To this end, we design a new SMR drive based on hot data identification with a coarse-grained decision. The experiments demonstrate the importance and benefits of accurate hot data identification, which improves the performance of the proposed SMR drive by up to 42%.
Keywords: hot data; bloom filter; shingled magnetic recording (SMR)
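The abstract does not describe the mechanism in detail, so the following is only a minimal illustrative sketch of how several bloom filters could capture both frequency and recency at the block level. It is not the authors' implementation. The class names, the filter count, the bit-array size, the decay interval, the linear recency weights, and the hotness threshold are all assumptions chosen for illustration. Each access is recorded in the newest filter that does not yet contain the block, so repeated accesses spread across filters (frequency). Periodically the oldest filter is cleared and becomes the newest, so old accesses fade (recency).

```python
import hashlib
from collections import deque


class BloomFilter:
    """Plain bloom filter with k salted hash probes per key."""

    def __init__(self, num_bits=2048, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive k bit positions by salting a SHA-256 digest with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

    def clear(self):
        self.bits = bytearray(self.num_bits)


class MultiBloomHotDetector:
    """Hypothetical block-level hot data detector (all parameters assumed)."""

    def __init__(self, num_filters=4, decay_interval=512, hot_threshold=2.0):
        # Index 0 is the newest (most recently cleared) filter.
        self.filters = deque(BloomFilter() for _ in range(num_filters))
        self.decay_interval = decay_interval
        self.hot_threshold = hot_threshold
        self.requests = 0

    def record(self, block):
        # Frequency: store the access in the newest filter lacking this block.
        for bf in self.filters:
            if block not in bf:
                bf.add(block)
                break
        self.requests += 1
        if self.requests % self.decay_interval == 0:
            self.decay()

    def decay(self):
        # Recency: erase the oldest filter and reuse it as the newest one.
        oldest = self.filters.pop()
        oldest.clear()
        self.filters.appendleft(oldest)

    def score(self, block):
        # Newer filters carry larger weights (assumed linear weighting).
        n = len(self.filters)
        return sum(1 - i / n for i, bf in enumerate(self.filters) if block in bf)

    def is_hot(self, block):
        return self.score(block) >= self.hot_threshold
```

In this sketch, calling `record(block)` for every written block and then querying `is_hot(block)` yields a block-level decision. A block accessed several times recently scores high, while the same block drifts below the threshold as `decay()` rotates the filters without new accesses. Because bloom filters admit false positives, a cold block can occasionally be reported as hot, which corresponds to the false identification rate the abstract refers to.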
Received: 2016-10-06
Funding:

This work was supported by Hankuk University of Foreign Studies Research Fund of Korea, and also partially supported by the National Science Foundation (NSF) Awards of USA under Grant Nos. 1053533, 1439622, 1217569, 1305237, and 1421913.

About the author: Dongchul Park is currently an assistant professor in the Division of Computer & Electronic Systems Engineering at Hankuk University of Foreign Studies (HUFS), Gyeonggi-do, South Korea. Before joining HUFS, he was a senior staff research engineer in the Storage Technology Group (STG) at Intel, Hillsboro, Oregon, USA, in 2017, and a senior research engineer in the Memory Solutions Laboratory (MSL) at Samsung Semiconductor Inc., San Jose, California, USA, from 2012 to 2016. He received his Ph.D. degree in computer science and engineering from the University of Minnesota-Twin Cities, Minneapolis, in 2012, where he was a member of the Center for Research in Intelligent Storage (CRIS) group, advised by Professor David H. C. Du. His research interests focus on storage system design and applications, including non-volatile memories, in-storage computing, big data processing, Hadoop MapReduce, data deduplication, key-value stores, cloud computing, and shingled magnetic recording (SMR) technology.
Cite this article:
Dongchul Park, Weiping He, David H. C. Du. Hot Data Identification with Multiple Bloom Filters: Block-Level Decision vs I/O Request-Level Decision [J]. Journal of Computer Science and Technology, 2018, 33(1): 79-97.
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-018-1809-4
Copyright 2010 by Journal of Computer Science and Technology