
Effective Query Grouping Strategy in Clouds

  • Abstract: As demand for cloud computing grows, more and more organizations outsource their data and query services to the cloud for cost savings and flexibility. Consider an organization in which a large number of users query data stored in the cloud, and in which multiple proxy servers are deployed inside the organization to achieve cost efficiency and load balancing. Given n queries, each expressed as several keywords, and k proxy servers, the problem is how to partition the n queries into k groups so as to minimize both the difference between groups and the number of distinct keywords across all groups. Because this problem is NP-hard, it is solved with both a mathematical and a heuristic approach. The mathematical grouping strategy uses a local optimization method, while the heuristic grouping strategy is based on k-means. In particular, both strategies are provided with two extensions: the first focuses on robustness, i.e., each user still obtains search results even if some proxy servers fail; the second focuses on benefit, i.e., each user can retrieve as many files of potential interest as possible without increasing the sum. Extensive evaluations on both a synthetic dataset and real query traces verify the effectiveness of the proposed grouping strategies.
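
    To make the problem statement concrete, the following is a minimal Python sketch of the two quantities being traded off, together with a simple greedy baseline. It is not the mathematical or k-means-based strategy from the paper, whose details the abstract does not give. The function names (`distinct_keyword_cost`, `size_gap`, `greedy_group`) are hypothetical, and the sketch assumes that "the difference between groups" means the difference in the number of queries per group and that the keyword objective is the sum of distinct keywords over the groups.

```python
from typing import List, Sequence, Set


def distinct_keyword_cost(groups: Sequence[Sequence[Set[str]]]) -> int:
    """Sum, over all groups, of the number of distinct keywords in the group."""
    return sum(len(set().union(*group)) for group in groups)


def size_gap(groups: Sequence[Sequence[Set[str]]]) -> int:
    """Gap between the largest and smallest group, counted in queries
    (an assumed reading of 'difference between groups')."""
    sizes = [len(group) for group in groups]
    return max(sizes) - min(sizes)


def greedy_group(queries: Sequence[Set[str]], k: int) -> List[List[Set[str]]]:
    """Greedy baseline (illustrative only): place each query in the
    non-full group where it adds the fewest new keywords."""
    if k <= 0:
        raise ValueError("k must be positive")
    capacity = -(-len(queries) // k)  # ceil(n / k) queries per group
    groups: List[List[Set[str]]] = [[] for _ in range(k)]
    vocab: List[Set[str]] = [set() for _ in range(k)]
    # Handle queries with more keywords first; ties keep input order.
    for query in sorted(queries, key=len, reverse=True):
        open_groups = [i for i in range(k) if len(groups[i]) < capacity]
        # Prefer fewest new keywords, then the emptier group.
        best = min(open_groups,
                   key=lambda i: (len(query - vocab[i]), len(groups[i])))
        groups[best].append(query)
        vocab[best] |= query
    return groups


if __name__ == "__main__":
    example = [
        {"cloud", "storage"},
        {"cloud", "security"},
        {"k-means", "clustering"},
        {"clustering", "heuristic"},
    ]
    result = greedy_group(example, k=2)
    print(distinct_keyword_cost(result), size_gap(result))
```

    Queries that share keywords end up behind the same proxy, so each proxy forwards fewer distinct keywords to the cloud, while the capacity bound keeps the groups from becoming too uneven. A k-means-style heuristic would instead iterate between assigning queries to groups and updating a per-group keyword summary.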

     
