Hot Data Identification with Multiple Bloom Filters: Block-Level Decision vs I/O Request-Level Decision
Dongchul Park1,2, Weiping He3, David H. C. Du4, Fellow, IEEE
1 Division of Computer and Electronic Systems Engineering, Hankuk University of Foreign Studies Gyeonggi-do 17035, Korea;
2 Intel Corporation, Hillsboro, OR 97124, U.S.A;
3 Dell Storage, Eden Prairie, MN 55344, U.S.A;
4 Department of Computer Science and Engineering, University of Minnesota-Twin Cities, Minneapolis, MN 55455, U.S.A
Abstract Hot data identification is crucial for many applications though few investigations have examined the subject. All existing studies focus almost exclusively on frequency. However, effectively identifying hot data requires equally considering recency and frequency. Moreover, previous studies make hot data decisions at the data block level. Such a fine-grained decision fits particularly well for flash-based storage because its random access achieves performance comparable with its sequential access. However, hard disk drives (HDDs) have a significant performance disparity between sequential and random access. Therefore, unlike flash-based storage, exploiting asymmetric HDD access performance requires making a coarse-grained decision. This paper proposes a novel hot data identification scheme adopting multiple bloom filters to efficiently characterize recency as well as frequency. Consequently, it not only consumes 50% less memory and up to 58% less computational overhead, but also lowers false identification rates up to 65% compared with a state-of-the-art scheme. Moreover, we apply the scheme to a next generation HDD technology, i.e., Shingled Magnetic Recording (SMR), to verify its effectiveness. For this, we design a new hot data identification based SMR drive with a coarse-grained decision. The experiments demonstrate the importance and benefits of accurate hot data identification, thereby improving the proposed SMR drive performance by up to 42%.
This work was supported by Hankuk University of Foreign Studies Research Fund of Korea, and also partially supported by the National Science Foundation (NSF) Awards of USA under Grant Nos. 1053533, 1439622, 1217569, 1305237, and 1421913.
About author: Dongchul Park is currently an assistant professor in Division of Computer & Electronic Systems Engineering at Hankuk University of Foreign Studies (HUFS), Gyeonggi-do, South Korea. Before joining HUFS, he was a senior staff research engineer in Storage Technology Group (STG) at Intel, Hillsboro, Oregon, USA in 2017 and a senior research engineer in Memory Solutions Laboratory (MSL) at Samsung Semiconductor Inc. in San Jose, California, USA from 2012 to 2016. He received his Ph.D. degree in computer science and engineering at the University of Minnesota-Twin Cities, Minneapolis, in 2012, and was a member of Center for Research in Intelligent Storage (CRIS) group under the advice of professor David H. C. Du. His research interests focus on storage system design and applications including non-volatile memories, in-storage computing, big data processing, Hadoop MapReduce, data deduplication, key-value store, cloud computing, and shingled magnetic recording (SMR) technology.
Cite this article:
Dongchul Park, Weiping He, David H. C. Du.Hot Data Identification with Multiple Bloom Filters: Block-Level Decision vs I/O Request-Level Decision[J] Journal of Computer Science and Technology, 2018,V33(1): 79-97
 Wang J G, Lo E, Yiu M L, Tong J C, Wang G, Liu X G. The impact of solid state drive on search engine cache management. In Proc. the 36th Int. ACM SIGIR Conf. Research and Development in Information Retrieval, July28-August 1, 2013, pp.693-702. Wang J G, Park D, Kee Y S, Papakonstantinou Y, Swanson S. SSD in-storage computing for list intersection. In Proc. the 12th Int. Workshop on Data Management on New Hardware, June 26-July1, 2016, Article No. 4. Park D, Debnath B, Du D. CFTL:A convertible flash translation layer adaptive to data access patterns. In Proc. the ACM SIGMETRICS Int. Conf. Measurement and Modeling of Computer Systems, June 2010, pp.365-366. Park D, Wang J G, Kee Y S. In-storage computing for Hadoop MapReduce framework:Challenges and possibilities. IEEE Trans. Computers, 2016 PP(99). doi:10.1109/TC.2016.2595566 Park D, Debnath B, Du D H C. A dynamic switching flash translation layer based on page-level mapping. IEICE Trans. Information and Systems, 2016, E99-D(6):1502-1511 Gray J. Tape is dead disk is tape flash is disk RAM locality is king. December 2006. http://signallake.com/innovation/FlashisGood.pdf, Dec. 2017. Martin J. Is tiered storage obsolete? Yes and no! November 2013. https://www.computerworld.com/article/2474599/data-center/is-tiered-storage-obsolete-yes-andno-.html, January 2018. Tagawa I, Williams M. High density data-storage using shingled-write. In Proc. the IEEE Int. Magnetics Conf., March 2009. Kasiraj P, New R M H, De Souza J C, Williams M L. System and method for writing data to dedicated bands of a hard disk drive:US 7490212. http://www.freepatentsonline.com/7490212.html, Dec. 2017. Gibson G, Polte M. Directions for shingled-write and two dimensional magnetic recording system architectures:Synergies with solid-state disks. Carnegie Mellon University Parallel Data Lab Technical Report, CMU-PDL-09-104, 2009. http://www.doc88.com/p-1866949816678.html, Dec. 2017. HGST. HGST delivers world's first 10TB enterprise HDD for active archive applications. June 2015. http://www.hgst.com/company/media-room/press-releases/HGSTDelivers-Worlds-First-10TB-Enterprise-HDD-for-Active-Archive-Applications, Dec. 2017. SMR. Seagate, breaking capacity barriers with seagate shingled magnetic recording. Aug. 2013. http://www.seagate.com/tech-insights/breaking-areal-density-barriers-with-seagate-smr-master-ti/, Dec. 2017. Sanvido M, Bandic Z, Cassuto Y, De Souza J, Guyot C, Harayama T. Distributed field self-test for shingled magnetic recording drives:US 8599507, http://www.freepatentsonline.com/8599507.html, Dec. 2017. Chang L P, Kuo T W. Efficient management for large-scale FlashMemory storage systems with resource conservation. ACM Trans. Storage, 2005, 1(4):381-418. Debnath B, Subramanya S, Du D, Lilja D J. Large Block CLOCK (LB-CLOCK):A write caching algorithm for solid state disks. In Proc. IEEE Int. Symp. Modeling Analysis & Simulation of Computer and Telecommunication Systems, September 2009. Kim H, Ahn S. BPLRU:A buffer management scheme for improving random writes in flash storage. In Proc. the 6th USENIX Conf. File and Storage Technologies, February 2008, Article No. 16. Levandoski J J, Larson P Å, Stoica R. Identifying hot and cold data in main-memory databases. In Proc. the 29th IEEE Int. Conf. Data Engineering, April 2013, pp.26-37. Chang Y H, Hsieh J W, Kuo T W. Endurance enhancement of FlashMemory storage systems:An efficient static wear leveling design. In Proc. the 44th ACM/IEEE Design Automation Conf., June 2007, pp.212-217. Boboila S, Desnoyers P. Write endurance in flash drives:Measurements and analysis. In Proc. the 8th USENIX Conf. File and Storage Technologies, February 2010. Soundararajan G, Prabhakaran V, Balakrishnan M, Wobber T. Extending SSD lifetimes with disk-based write caches. In Proc. the 8th USENIX Conf. File and Storage Technologies, February 2010. Nath S, Kansal A. FlashDB:Dynamic self-tuning database for NAND flash. In Proc. the 6th Int. Symp. Information Processing in Sensor Networks, April 2007, pp.410-419. Sun G Y, Joo Y, Chen Y B, Niu D M, Xie Y, Chen Y R, Li H. A hybrid solid-state storage architecture for the performance, energy consumption and lifetime improvement. In Proc. the 16th Int. Symp. High Performance Computer Architecture, January 2010, Chang L P. Hybrid solid-state disks:Combining heterogeneous NAND flash in large SSDs. In Proc. the Asia and South Pacific Design Automation Conf., March 2008, pp.428-433. Lin C I, Park D, He W P, Du D H C. H-SWD:Incorporating hot data identification into shingled write disks. In Proc. the 20th Int. Symp. Modeling Analysis and Simulation of Computer and Telecommunication Systems, August 2012, pp.321-330. Jones S N, Amer A, Miller E L, Long D D E, Pitchumani R, Strong C R. Classifying data to reduce long term data movement in shingled write disks. In Proc. the 31st Symp. Mass Storage Systems and Technologies, May 2015. Chiang M L, Lee P C H, Chang R C. Managing flash memory in personal communication devices. In Proc. IEEE International Symp. Consumer Electronics, December 1997, pp.177-182. Chang L P, Kuo T W. An adaptive striping architecture for flash memory storage systems of embedded systems. In Proc. IEEE Real-Time and Embedded Technology and Applications Symp., September 2002, pp.187-196. Hsieh J W, Kuo T W, Chang L P. Efficient identification of hot data for flash memory storage systems. ACM Trans. Storage, 2006, 2(1):22-40. Park D, Du D H C. Hot data identification for flash-based storage systems using multiple bloom filters. In Proc. the 27th IEEE Symp. Mass Storage Systems and Technologies, May 2011. Bloom B H. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 1970, 13(7):422-426. Dharmapurikar S, Krishnamurthy P, Taylor D E. Longest prefix matching using bloom filters. IEEE/ACM Trans. Networking, 2006, 14(2):397-409. Kryder M H, Gage E C, McDaniel T W, Challener W A, Rottmayer R E, Ju G P, Hsia Y T, Erden M F. Heat assisted magnetic recording. Proceedings of the IEEE, 2008, 96(11):1810-1835. Challener W A, Peng C, Itagi A V, Karns D, Peng Y, Yang X, Zhu X, Gokemeijer N J, Hsia Y T, Ju G, Rottmayer R E, Seigler M A, Gage E C. The road to HAMR. In Proc. Asia-Pacific Magnetic Recording Conf., January 2009. Rottmayer R E, Batra S, Buechel D, Challener W A, Hohlfeld J, Kubota Y, Li L, Lu B, Mihalcea C, Mountfield K, Pelhos K, Peng C, Rausch T, Seigler M A, Weller D, Yang X M. Heat-assisted magnetic recording. IEEE Trans. Magnetics, 2006, 42(10):2417-2421. Dobisz E A, Bandic Z Z, Wu T W, Albrecht T. Patterned media:Nanofabrication challenges of future disk drives. Proceedings of the IEEE, 2008, 96(11):1836-1846. Kikitsu A, Kamata Y, Sakurai M, Naito K. Recent progress of patterned media. IEEE Trans. Magnetics, 2007, 43(9):3685-3688. Zhang S H, Chai K S, Cai K, Chen B J, Qin Z L, Foo S M. Write failure analysis for bit-patterned-media recording and its impact on read channel modeling. IEEE Trans. Magnetics, 2010, 46(6):1363-1365. Amer A, Holliday J, Long D D E, Miller E L, Paris J F, Schwarz T. Data management and layout for shingled magnetic recording. IEEE Trans. Magnetics, 2011, 47(10):3691-3697. Greaves S, Kanai Y, Muraoka H. Shingled recording for 2-3 Tbit/in2. IEEE Trans. Magnetics, 2009, 45(10):3823-3829. Amer A, Long D D E, Miller E L, Paris J F, Schwarz S J T. Design issues for a shingled write disk system. In Proc. the 26th Symp. Mass Storage Systems and Technologies, May 2010. Cassuto Y, Sanvido M A A, Guyot C, Hall D R, Bandic Z Z. Indirection systems for shingled-recording disk drives. In Proc. the 26th Symp. Mass Storage Systems and Technologies, May 2010. Feldman T. Host-aware SMR. Nov. 2014. http://openzfs.org/w/images/2/2a/Host-AwareSMR-TimFeldman.pdf, Dec. 2017. Feldman T, Gibson G. Shingled magnetic recording:Areal density increase requires new data management. The Magazine of USENIX & SAGE, 2013, 38(3):22-30. Aghayev A, Desnoyers P. Skylight:A window on shingled disk operation. In Proc. the 13th USENIX Conf. File and Storage Technologies, February 2015, pp.135-149. Wu F G, Yang M C, Fan Z Q, Zhang B Q, Ge X Z, Du D. Evaluating host aware SMR drives. In Proc. the 8th USENIX Workshop on Hot Topics in Storage and File Systems, June 2016, pp.31-35. Narayanan D, Donnelly A, Rowstron A. Write off-loading:Practical power management for enterprise storage. In Proc. the 6th USENIX Conf. File and Storage Technologies, February 2008, pp.253-267. Russinovich M. DiskMon for Windows v2.01, 2006. https://technet.microsoft.com/enus/sysinternals/diskmon.aspx, Jan. 2018.
Copyright 2010 by Journal of Computer Science and Technology