计算机科学技术学报 ›› 2017,Vol. 32 ›› Issue (6): 1231-1249.doi: 10.1007/s11390-017-1797-9

• Special Section on Selected Paper from NPC 2011 • 上一篇    下一篇

云计算环境下有效地查询分组策略

Qin Liu1,2, Member, CCF, Yuhong Guo3, Jie Wu4, Fellow, IEEE, Guojun Wang5,*, Distinguished Member, CCF, Member, ACM, IEEE   

  1. 1 College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China;
    2 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications Beijing 100876, China;
    3 School of Computer Science, Carleton University, Ottawa, ON K155 B6, Canada;
    4 Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, U.S.A.;
    5 School of Computer Science and Educational Software, Guangzhou University, Guangzhou 510006, China
  • 收稿日期:2016-05-18 修回日期:2017-01-12 出版日期:2017-11-05 发布日期:2017-11-05
  • 通讯作者: Guojun Wang E-mail:csgjwang@gmail.com
  • 作者简介:Qin Liu received her B.S.degree in 2004 from Hunan Normal University,Changsha,and her M.S.degree in 2007 and Ph.D.degree in 2012 both from Central South University,Changsha,all in computer science.She was a visiting student at Temple University,Philadelphia.Her research interests include security and privacy issues in cloud computing.
  • 基金资助:

    This research was supported in part by the National Science Foundation of USA under Grant Nos. CNS-1449860, CNS-1461932, CNS-460971, CNS-1439672, CNS-1301774, and ECCS-1231461, the National Natural Science Foundation of China under Grant Nos. 61632009, 61472451, 61402161, 61472131, 61272151, and 61272546, the Hunan Provincial Natural Science Foundation of China under Grant No. 2015JJ3046, and the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Grant No. SKLNST-2016-2-20.

Effective Query Grouping Strategy in Clouds

Qin Liu1,2, Member, CCF, Yuhong Guo3, Jie Wu4, Fellow, IEEE, Guojun Wang5,*, Distinguished Member, CCF, Member, ACM, IEEE   

  1. 1 College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China;
    2 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications Beijing 100876, China;
    3 School of Computer Science, Carleton University, Ottawa, ON K155 B6, Canada;
    4 Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, U.S.A.;
    5 School of Computer Science and Educational Software, Guangzhou University, Guangzhou 510006, China
  • Received:2016-05-18 Revised:2017-01-12 Online:2017-11-05 Published:2017-11-05
  • Contact: Guojun Wang E-mail:csgjwang@gmail.com
  • About author:Qin Liu received her B.S.degree in 2004 from Hunan Normal University,Changsha,and her M.S.degree in 2007 and Ph.D.degree in 2012 both from Central South University,Changsha,all in computer science.She was a visiting student at Temple University,Philadelphia.Her research interests include security and privacy issues in cloud computing.
  • Supported by:

    This research was supported in part by the National Science Foundation of USA under Grant Nos. CNS-1449860, CNS-1461932, CNS-460971, CNS-1439672, CNS-1301774, and ECCS-1231461, the National Natural Science Foundation of China under Grant Nos. 61632009, 61472451, 61402161, 61472131, 61272151, and 61272546, the Hunan Provincial Natural Science Foundation of China under Grant No. 2015JJ3046, and the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Grant No. SKLNST-2016-2-20.

随着对云计算需求的发展,越来越多的机构为了节约成本和服务灵活性把数据和查询服务外包到云中。假设在一个机构内有大量用户在查询云中数据,多个代理服务器被部署到机构内部来实现成本效率和负载均衡。假设有n个查询,每个查询由多个关键词组成,给定k个代理服务器,要解决的问题是如何将n个查询分成k组从而最小化每组之间的差距,同时最小化所有组中不同关键词的数目。由于该问题是NP-难题,可以通过数学方法和启发式方法进行求解。数学分组策略使用局部最优方法,而启发式分组策略基本思想是k-means。特别是,两种策略都提供了扩展方案:第一个扩展关注鲁棒性,即在一些代理服务器失效的情况下,每个用户仍然可以得到查询结果;第二个扩展关注效益,即每个用户可以尽可能多的取回自己可能感兴趣的文件。在合成数据集和真实数据集上进行了大量实验,验证了所提分组策略的有效性。

关键词: 成本效率, 负载均衡, 鲁棒性, 效益

Abstract: As the demand for the development of cloud computing grows, more and more organizations have outsourced their data and query services to the cloud for cost-saving and flexibility. Suppose an organization that has a great number of users querying the cloud-deployed multiple proxy servers to achieve cost efficiency and load balancing. Given n queries, each of which is expressed as several keywords, and k proxy servers, the problem to be solved is how to classify n queries into k groups, in order to minimize the difference between each group and the number of distinct keywords in all groups. Since this problem is NP-hard, it is solved in mathematic and heuristic ways. Mathematic grouping uses a local optimization method, and heuristic grouping is based on k-means. Specifically, two extensions are provided:the first one focuses on robustness, i.e., each user obtains search results even if some proxy servers fail; the second one focuses on benefit, i.e., each user can retrieve as many files as possible that may be of interest without increasing the sum. Extensive evaluations have been conducted on both a synthetic dataset and real query traces to verify the effectiveness of our strategies.

Key words: cloud computing, cost efficiency, load balancing, robustness, benefit

[1] Mell P M, Grance T. The NIST definition of cloud computing. Communications of the ACM, 2010, 53(6):Article No. 50.

[2] Fu Z J, Shu J G, Sun X M, Zhang D X. Semantic keyword search based on trie over encrypted cloud data. In Proc. the 2nd Int. Workshop on Security in Cloud Computing, June 2014, pp.59-62.

[3] Fu Z J, Ren K, Shu J G, Sun X M, Huang F X. Enabling personalized search over encrypted outsourced data with efficiency improvement. IEEE Trans. Parallel and Distributed Systems, 2016, 27(9):2546-2559.

[4] Liu Q, Tan C C, Wu J, Wang G J. Cooperative private searching in clouds. Journal of Parallel and Distributed Computing, 2012, 72(8):1019-1031.

[5] Liu Q, Tan C C, Wu J, Wang G J. Towards differential query services in costefficient clouds. IEEE Trans. Parallel and Distributed Systems, 2014, 25(6):1648-1658.

[6] Sweeney L. k-anonymity:A model for protecting privacy. International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, 2002, 10(5):557-570.

[7] Niu B, Li Q H, Zhu X Y, Cao G H, Li H. Achieving k-anonymity in privacy-aware location-based services. In Proc. IEEE INFOCOM, April 27-May 2, 2014, pp.754-762.

[8] Yi X, Paulet R, Bertino E, Varadharajan V. Practical approximate k nearest neighbor queries with location and query privacy. IEEE Trans. Knowledge and Data Engineering, 2016, 28(6):1546-1559.

[9] Kanungo T, Mount D M, Netanyahu N S, Piatko C D, Silverman R, Wu A Y. An efficient k-means clustering algorithm:Analysis and implementation. IEEE Trans. Pattern Analysis and Machine Intelligence, 2002, 24(7):881-892.

[10] Guo Y H. Active instance sampling via matrix partition. In Proc. NIPS, December 2010, pp.802-810.

[11] Hamerly G. Making k-means even faster. In Proc. SIAM Int. Conf. Data Mining, April 2010, pp.130-140.

[12] Pass G, Chowdhury A, Torgeson C. A picture of search. In Proc. the 1st Int. Conf. Scalable Information Systems, May 30-June 1, 2006.

[13] Gates A F, Natkovich O, Chopra S, Kamath P, Narayanamurthy S M, Olston C, Reed B, Srinivasan S, Srivastava U. Building a high-level dataflow system on top of MapReduce:The pig experience. In Proc. VLDB Endowment, August 2009, pp.1414-1425.

[14] Nykiel T, Potamias M, Mishra C, Kollios G, Koudas N. MRShare:Sharing across multiple queries in MapReduce. In Proc. VLDB Endowment, September 2010, pp.494-505.

[15] Herodotou H, Lim H, Luo G, Borisov N, Dong L, Cetin F B, Babu S. Starfish:A self-tuning system for big data analytics. In Proc. Biennial Conf. Innovative Data Systems Research, January 2011, pp.261-272.

[16] Lei C, Zhuang Z F, Rundensteiner E A, Eltabakh M. Shared execution of recurring workloads in MapReduce. In Proc. VLDB Endowment, September 2015, pp.714-725.

[17] Aggarwal C C, Zhai C X. A survey of text clustering algorithms. In Mining Text Data, Aggarwal C C, Zhai C X (eds.), Springer, 2012, pp.77-128.

[18] Fahad A, Alshatri N, Tari Z, Alamri A, Khalil I, Zomaya A Y, Foufou S, Bouras A. A survey of clustering algorithms for big data:Taxonomy and empirical analysis. IEEE Trans. Emerging Topics in Computing, 2014, 2(3):267-279.

[19] Vu T T, Willis A, Song D W. Modelling time-aware search tasks for search personalisation. In Proc. the 24th Int. Conf. World Wide Web, May 2015, pp.131-132.

[20] Zhao Y, Karypis G. Empirical and theoretical comparisons of selected criterion functions for document clustering. Machine Learning, 2004, 55(3):311-331.

[21] Zhang T, Ramakrishnan R, Livny M. BIRCH:An efficient data clustering method for very large databases. ACM SIGMOD Record, 1996, 25(2):103-114.

[22] Guha S, Rastogi R, Shim K. CURE:An efficient clustering algorithm for large databases. Information Systems, 2001, 26(1):35-58.

[23] Karypis G, Han E H, Kumar V. Chameleon:Hierarchical clustering using dynamic modeling. Computer, 1999, 32(8):68-75.

[24] Guha S, Rastogi R, Shim K. ROCK:A robust clustering algorithm for categorical attributes. In Proc. the 15th Int. Conf. Data Engineering, March 1999, pp.512-521.

[25] Schütz H, Silverstein C. Projections for efficient document clustering. ACM SIGIR Forum, 1997, 31(SI):74-81.

[26] Cutting D R, Karger D R, Pedersen J O, Tukey J W. Scatter/Gather:A cluster-based approach to browsing large document collections. In Proc. the 15th Annual Int. ACM SIGIR Conf. Research and Development in Information Retrieval, June 1992, pp.318-329.

[27] Sarle W S. Finding groups in data:An introduction to cluster analysis. Journal of the American Statistical Association, 1991, 86(415):830-833.

[28] Ng R J, Han J W. Efficient and effective clustering methods for spatial data mining. In Proc. the 20th Int. Conf. Very Large Data Bases, September 1994, pp.144-155.

[29] Ng R T, Han J W. CLARANS:A method for clustering objects for spatial data mining. IEEE Trans. Knowledge and Data Engineering, 2002, 14(5):1003-1016.

[30] Wei C P, Lee Y H, Hsu C M. Empirical comparison of fast clustering algorithms for large data sets. In Proc. the 33rd Annual Hawaii Int. Conf. System Sciences, January 2000.
[1] Tong Chen, Ji-Qiang Liu, He Li, Shuo-Ru Wang, Wen-Jia Niu, En-Dong Tong, Liang Chang, Qi Alfred Chen, Gang Li. 基于动态偏度和稀疏度计算的A3C鲁棒性评估:一种并行计算视角[J]. 计算机科学技术学报, 2021, 36(5): 1002-1021.
[2] Yue-Huan Wang, Ze-Nan Li, Jing-Wei Xu, Ping Yu, Taolue Chen, Xiao-Xing Ma. 基于鲁棒性预测的深度神经网络质量保障[J]. 计算机科学技术学报, 2020, 35(5): 999-1015.
[3] Jun-Hua Fang, Peng-Peng Zhao, An Liu, Zhi-Xu Li, Lei Zhao. 分布式数据流中轨迹大数据的自适应连接方法[J]. 计算机科学技术学报, 2019, 34(4): 747-761.
[4] Zhen-Hua Li, Gang Liu, Zhi-Yuan Ji, Roger Zimmermann. 基于腾讯大数据的高成本效益的云下载系统设计[J]. , 2015, 30(6): 1163-1174.
[5] Wen-Yu Li, Xiang Zhang, Shu-Cong Jia, Xin-Yu Gu, Lin Zhang, Xiao-Yu Duan, and Ji. LTE SON网络下一种动态自适应负载均衡与切换的联合最优化算法[J]. , 2013, 28(3): 437-444.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 刘明业; 洪恩宇;. Some Covering Problems and Their Solutions in Automatic Logic Synthesis Systems[J]. , 1986, 1(2): 83 -92 .
[2] 陈世华;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[3] 高庆狮; 张祥; 杨树范; 陈树清;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[4] 陈肇雄; 高庆狮;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[5] 黄河燕;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[6] 闵应骅; 韩智德;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[7] 唐同诰; 招兆铿;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[8] 闵应骅;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[9] 朱鸿;. Some Mathematical Properties of the Functional Programming Language FP[J]. , 1987, 2(3): 202 -216 .
[10] 李明慧;. CAD System of Microprogrammed Digital Systems[J]. , 1987, 2(3): 226 -235 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: