Journal of Computer Science and Technology, 2012, Vol. 27, Issue (6): 1233-1242. doi: 10.1007/s11390-012-1299-8

• Artificial Intelligence and Pattern Recognition •

A Multi-Threaded Semantic Focused Crawler

Punam Bedi¹, Member, ACM, Senior Member, IEEE, Anjali Thukral²,*, Hema Banati³, Abhishek Behl¹,**, and Varun Mendiratta¹,**

  1. Department of Computer Science, University of Delhi, Delhi-110007, India;
  2. Department of Computer Science, Keshav Mahavidyalaya, University of Delhi, Delhi-110007, India;
  3. Department of Computer Science, Dyal Singh College, University of Delhi, Delhi-110007, India
  • Received: 2011-07-20; Revised: 2012-08-11; Online: 2012-11-05; Published: 2012-11-05

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved