›› 2010, Vol. 25 ›› Issue (3): 482-498.

• Special Section on Trends Changing Data Management • Previous Articles     Next Articles

HAPS: Supporting Effective and Efficient Full-Text P2P Search with Peer Dynamics

Zu-Jie Ren (任祖杰), Ke Chen* (陈珂), Li-Dan Shou (寿黎但), Gang Chen (陈刚), Yi-Jun Bei (贝毅君), and Xiao-Yan Li (李晓燕)   

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  • Received:2009-06-30 Revised:2010-02-04 Online:2010-05-05 Published:2010-05-05
  • About author:
    Zu-Jie Ren received the B.S. degree in mathematics from Nanjing University of Information Science and Technology in 2005. From Sept. 2005 to Jan. 2007, he studied at College of Computer Science and Technology, Zhejiang University (ZJU). He is a Ph.D. candidate of the College of Computer Science and Technology, ZJU. His research interests include distributed information retrieval, multimedia database and data mining.
    Ke Chen is a research associate of computer science, ZheJiang University (ZJU), Hangzhou, China. She received the Ph.D. degree in computer science from ZJU in 2007. Her research interests include spatial temporal data management, peer-to-peer systems, and system security. %Contact her via Email : chenk@zju.edu.cn.
    Li-Dan Shou received his B.Sc. and M.Sc. degrees in computer science from Zhejiang University in 1996 and 1999 respectively, and Ph.D. degree in computer science from the National University of Singapore in 2003. From 2003 to 2006, he worked in the software industry. Since 2006, he is a faculty member of the College of Computer Science and Technology, Zhejiang University, China. He is currently working with the Database Lab of Zhejiang University. His research interests include spatial database, data access methods, visual and multimedia databases, and data mining.
    Gang Chen received his Ph.D. degree in computer science from Zhejiang University in 1998. Since 1998, he is a faculty member of the College of Computer Science and Technology, Zhejiang University, China. From 1999 until 2003, he is an associate professor of computer science, Zhejiang University. He has been promoted as professor of computer science in 2003. His research interests involve database management, information security, cooperative design, CIMS.
    Yi-Jun Bei received the B.S. and Ph.D. degrees in computer science and technology from Zhejiang University, in 2003 and 2008. He worked as a post doctoral researcher in the College of Computer Science and Technology, Zhejiang University. His research interests include databases data mining, XML data mining and social network analysis.
    Xiao-Yan Li received the B.S. and Ph.D. degrees in computer science from Zhejiang University in 2004 and 2009 respectively. From Jun. 2006 to Dec. 2006, she was a visiting student at Database Lab in SOC, National University of Singapore. Since Jul. 2009, she is a postdoc researcher of the College of Computer Science and Technology, Zhejiang University, China. Her research interests include multimedia database, semantic learning and modeling, Web image retrieval.
  • Supported by:

    This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 60803003, 60970124, 60903038, and by the Science and Technology Projects of Zhejiang Province under Grant No. 2008C14G2010007.

Recently, peer-to-peer (P2P) search technique has become popular in the Web as an alternative to centralized search due to its high scalability and low deployment-cost. However, P2P search systems are known to suffer from the problem of peer dynamics, such as frequent node join/leave and document changes, which cause serious performance degradation. This paper presents the architecture of a P2P search system that supports full-text search in an overlay network with peer dynamics. This architecture, namely HAPS, consists of two layers of peers. The upper layer is a DHT (distributed hash table) network interconnected by some super peers (which we refer to as hubs). Each hub maintains distributed data structures called search directories, which could be used to guide the query and to control the search cost. The bottom layer consists of clusters of ordinary peers (called providers), which can receive queries and return relevant results. Extensive experimental results indicate that HAPS can perform searches effectively and efficiently. In addition, the performance comparison illustrates that HAPS outperforms a flat structured system and a hierarchical unstructured system in the environment with peer dynamics.


[1] Nottelmann H, Fischer G, Titarenko A, Nurzenski A. An integrated approach for searching and browsing in heterogeneous peer-to-peer networks. In ACM SIGIR Workshop Hetergeneous and Distributed Information Retrieval, Salvador, Brazil, Aug. 19, 2005.

[2] Bender M, Michel S, Triantafillou P, Weikum G, Zimmer C. Improving collection selection with overlap awareness in P2P search engines. In Proc. ACM SIGIR Conf. Research and Development in Information Retrieval, Salvador, Brazil, Aug. 15-19, 2005, pp.67-74.

[3] Lu J, Callan J. Merging retrieval results in hierarchical peerto-peer networks. In Proc. ACM SIGIR Conf. Research and Development in Information Retrieval, Sheffield, UK, July 25-29, 2004, pp.472-473.

[4] Nejdl W, Wolpers M, Siberski W, Schmitz C. Super-peer based routing and clustering strategies for RDF-based peerto-peer networks. In Proc. WWW, Budapest, Hungary, May 20-24, 2003, pp.536-543.

[5] Xing Jin, W-P Ken Yiu, S-H Gray Chan. Supporting multiple-keyword search in a hybrid structured peer-to-peer network. In Proc. IEEE International Conference on Communications, Seoul, Korea, May 16-20, 2005, pp.42-47.

[6] Garces-Erice L, Biersack E W, Felber P, Ross K W, UrvoyKeller G. Hierarchical peer-to-peer systems. In Proc. EuroPar, Klagenfurt, Austria, Aug. 26-29, 2003, pp.1230-1239.

[7] Ganesan P, Gummadi K, Garcia-Molina H. Canon in G major: Designing DHTS with hierarchical structure. In Proc. ICDCS, Tokyo, Japan, March 23-26, 2004, pp.263-272.

[8] Luu T, Klemm F, Podnar I, Aberer K, Rajman M. Alvis peers: A scalable full-text peer-to-peer retrieval engine. In Proc. P2PIR, Arlington, USA, Nov. 11, 2006, pp.41-48.

[9] Tang C, Dwarkadas S. Hybrid global-local indexing for efficient peer-to-peer information retrieval. In Proc. NSDI, San Francisco, USA, Mar. 29-31, 2004, pp.211-224.

[10] Podnar I, Rajman M, Luu T, Klemm F, Aberer K. Scalable peer-to-peer web retrieval with highly discriminative keys. In Proc. IEEE ICDE, Istanbul, Turkey, Apr. 15-20, 2007, pp.1096-1105.

[11] Reynolds P, Vahdat A. Efficient peer-to-peer keyword searching. In Proc. Int. Conf. Middleware, 2003, Rio de Janeiro, Brazil, June 16-20, pp.21-40.

[12] Bender M, Michel S, Triantafillou P, Weikum G, Zimmer C. Minerva: Collaborative P2P search. In Proc. VLDB, Trondheim, Norway, Aug. 30-Sept. 2, 2005, pp.1263-1266.

[13] Crespo A, Garcia-Molina H. Semantic overlay networks for P2P systems. In Proc. AP2PC, New York, USA, July 19, 2004, pp.1-13.

[14] Lu J, Callan J. Federated search of text-based digital libraries in hierarchical peer-to-peer networks. In Proc. ECIR, Santiago de Compostela, Spain, Mar. 21-23, 2005, pp.52-66.

[15] Godfrey P B, Shenker S, Stoica I. Minimizing churn in distributed systems. In Proc. ACM SIGCOMM, Pisa, Italy, Sept. 11-15, 2006, pp.147-158.

[16] Rhea S, Geels D, Roscoe T, Kubiatowicz J. Handling churn in a DHT. In Proc. USENIX Annual Technical Conference, Boston, USA, June 27-July 2, 2004, pp.127-140.

[17] Linga P, Gupta I, Birman K. A churn-resistant peer-to-peer web caching system. In Proc. Workshop on Survivable and Self-Regenerative System, Fairfax, USA, Oct. 31, 2003, pp.110.

[18] Stutzbach D, Rejaie R. Understanding churn in peer-to-peer networks. In Proc. Internet Measurement Conference, Rio de Janeiro, Brazil, Oct. 25-27, 2006, pp.189-202.

[19] Klemm F, Aberer K. Aggregation of a term vocabulary for peer-to-peer information retrieval: A DHT stress test. In Proc. DBISP2P, Trondheim, Norway, Aug. 28-29, 2005, pp.187-194.

[20] Stocia I, Morris R, Karger D, Kaashoek F, Balakrishnan H. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proc. ACM SIGCOMM, San Diego, USA, Aug. 27-31, 2001, pp.149-160.

[21] Michel S, Triantafillou P, Weikum G. Klee: A framework for distributed top-k query algorithms. In Proc. VLDB, Trondheim, Norway, Aug. 30-Sept. 2, 2005, pp.637-648.

[22] Vlachou A, Doulkeridis C, Norvag K, Vazirgiannis M. On ef-ficient top-k query processing in highly distributed environments. In Proc. SIGMOD, Vancouver, Canada, June 9-12, 2008, pp.753-764.

[23] Zhou S, Zhang Z, Qian W, Zhou A. SIPPER: Selecting informative peers in structured P2P environment for contentbased retrieval. In Proc. ICDE, Atlanta, USA, Apr. 3-7, 2006, pp.161-162.

[24] Xu J, Croft W B. Cluster-based language models for distributed retrieval. In Proc. ACM SIGIR Conf. Research and Development in Information Retrieval, Berkeley, USA, Aug. 15-19, 1999, pp.254-261.

[25] Ratnasamy S, Francis P, Handley M, Karp R M, Shenker S. A scalable content-addressable network. In Proc. ACM SIGCOMM, San Diego, USA, Aug. 27-31, 2001, pp.161-172.

[26] El-Ansary S, Alima L O, Brand P, Haridi S. Efficient broadcast in structured P2P networks. In Proc. IPTPS, Berkeley, USA, Feb. 21-22, 2003, pp.304-314.

[27] Zhai C, Lafferty J D. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proc. ACM SIGIR Conf. Research and Development in Information Retrieval, New Orleans, USA, Sept. 9-13, 2001, pp.334342.

[28] Callan J. Distributed Information Retrieval. Advances in Information Retrieval, Kluwer Academic Publishers, 2000, pp.127-150.

[29] Gravano L, Garcia-Molina H, Tomasic A. Gloss: Text-source discovery over the Internet. ACM Transactions of Database System, 1999, 24(2): 229-264.

[30] Callan J, Lu Z, Croft W B. Searching distributed collections with inference networks. In Proc. ACM SIGIR Conf. Research and Development in Information Retrieval,Seattle, USA, July 9-13, 1995, pp.21-28.

[31] Callan J et al. Peer to peer testbed definitions. http://boston.lti.cs.cmu.edu/callan/data/#p2p/trecwt10gquery-bydoc.v1.txt.gz.

[32] Lu J. Full-text federated search in peer-to-peer networks

[Ph.D. Dissertation]. Carnegie Mellon University, 2007.

[33] Baeza-Yates R, Ribeiro-Neto B. Modern Information Retrieval. Addison-Wesley, 1999.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved