Effective Query Grouping Strategy in Clouds

Qin Liu; Yuhong Guo; Jie Wu; Guojun Wang

doi:10.1007/s11390-017-1797-9

Volume 32 Issue 6

November 2017


Journal of Computer Science and Technology > 2017 > 32(6): 1231-1249. > DOI: 10.1007/s11390-017-1797-9 CSTR: 32374.14.s11390-017-1797-9

Qin Liu, Yuhong Guo, Jie Wu, Guojun Wang. Effective Query Grouping Strategy in Clouds[J]. Journal of Computer Science and Technology, 2017, 32(6): 1231-1249. DOI: 10.1007/s11390-017-1797-9

Effective Query Grouping Strategy in Clouds

Qin Liu^1,2, Member, CCF,
Yuhong Guo^3,
Jie Wu^4, Fellow, IEEE,
Guojun Wang^5, Distinguished Member, CCF, Member, ACM, IEEE

1 College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China;
2 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China;
3 School of Computer Science, Carleton University, Ottawa, ON K1S 5B6, Canada;
4 Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, U.S.A.;
5 School of Computer Science and Educational Software, Guangzhou University, Guangzhou 510006, China

Funds: This research was supported in part by the National Science Foundation of USA under Grant Nos. CNS-1449860, CNS-1461932, CNS-1460971, CNS-1439672, CNS-1301774, and ECCS-1231461, the National Natural Science Foundation of China under Grant Nos. 61632009, 61472451, 61402161, 61472131, 61272151, and 61272546, the Hunan Provincial Natural Science Foundation of China under Grant No. 2015JJ3046, and the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Grant No. SKLNST-2016-2-20.

Author Bio:
Qin Liu received her B.S. degree in 2004 from Hunan Normal University, Changsha, and her M.S. degree in 2007 and Ph.D. degree in 2012, both from Central South University, Changsha, all in computer science. She was a visiting student at Temple University, Philadelphia. Her research interests include security and privacy issues in cloud computing.
Corresponding author:
Guojun Wang E-mail: csgjwang@gmail.com
Received Date: May 17, 2016
Revised Date: January 11, 2017
Published Date: November 04, 2017

Abstract

As demand for cloud computing grows, more and more organizations are outsourcing their data and query services to the cloud for cost savings and flexibility. Consider an organization with a large number of users who query multiple proxy servers deployed in the cloud in order to achieve cost efficiency and load balancing. Given n queries, each expressed as several keywords, and k proxy servers, the problem is how to classify the n queries into k groups so as to minimize both the difference between groups and the number of distinct keywords in all groups. Since this problem is NP-hard, it is solved in mathematical and heuristic ways. Mathematical grouping uses a local optimization method, and heuristic grouping is based on k-means. In addition, two extensions are provided: the first focuses on robustness, i.e., each user obtains search results even if some proxy servers fail; the second focuses on benefit, i.e., each user can retrieve as many files of potential interest as possible without increasing the sum. Extensive evaluations have been conducted on both a synthetic dataset and real query traces to verify the effectiveness of our strategies.
Keywords:
- cloud computing
- cost efficiency
- load balancing
- robustness
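
The grouping problem stated in the abstract can be made concrete with a small sketch. The Python code below is an illustration only and is not the mathematical or k-means-based grouping evaluated in the paper. It assumes that "the difference between groups" means imbalance in the number of queries per group, and it swaps in a simple greedy heuristic that caps each group at ceil(n/k) queries. All function names (`distinct_keyword_sum`, `size_spread`, `greedy_group`) are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Query = FrozenSet[str]  # one query = a set of keywords


def distinct_keyword_sum(groups: Sequence[Sequence[Query]]) -> int:
    """Sum, over all groups, of the number of distinct keywords in the group."""
    return sum(len(set().union(*group)) for group in groups)


def size_spread(groups: Sequence[Sequence[Query]]) -> int:
    """Assumed balance measure: largest group size minus smallest group size."""
    sizes = [len(group) for group in groups]
    return max(sizes) - min(sizes)


def greedy_group(queries: Sequence[Query], k: int) -> List[List[Query]]:
    """Illustrative greedy heuristic (an assumption, not the paper's algorithm).

    Each query goes to the non-full group where it introduces the fewest
    new keywords; ties are broken in favor of the smaller group.
    """
    cap = -(-len(queries) // k)  # ceil(n / k) queries per group at most
    groups: List[List[Query]] = [[] for _ in range(k)]
    vocab = [set() for _ in range(k)]  # keywords already present in each group
    for q in sorted(queries, key=len, reverse=True):
        open_ids = [i for i in range(k) if len(groups[i]) < cap]
        best = min(open_ids, key=lambda i: (len(q - vocab[i]), len(groups[i])))
        groups[best].append(q)
        vocab[best] |= q
    return groups


if __name__ == "__main__":
    queries = [
        frozenset({"a", "b"}),
        frozenset({"x", "y"}),
        frozenset({"a", "c"}),
        frozenset({"x", "z"}),
    ]
    groups = greedy_group(queries, k=2)
    print(distinct_keyword_sum(groups), size_spread(groups))
```

On this toy input, queries that share a keyword end up on the same proxy, so each group carries three distinct keywords instead of four. A greedy pass like this offers no optimality guarantee, which is consistent with the abstract's remark that the problem is NP-hard.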
