Int. J. Knowledge and Web Intelligence, Vol. 5, No. 1, 2014 49
Copyright © 2014 Inderscience Enterprises Ltd.
Review of web crawlers
S.R. Sreeja* and Sangita Chaudhari
Department of Computer Science,
A.C. Patil College of Engineering,
Sector 4, Kharghar, Navi Mumbai,
Maharashtra, 410210, India
E-mail: sreejasr09@gmail.com
E-mail: sschaudhari@acpce.ac.in
*Corresponding author
Abstract: The web is a repository of a very large amount of data. Information
available on the web is organised in the form of pages. Because the amount of
information is practically unlimited, searching for and finding appropriate
information on the web is a task that requires expertise. Web crawlers are
programmes that assist search engines by automating the task of visiting web
pages and downloading their contents. They also help in ranking the
downloaded web pages. Thus, search engines can produce a list of web
pages ordered by relevance and display this list as the result of a
search. Crawling is also used to validate web pages, analyse them, notify users
of page updates, visualise web pages and, sometimes, collect e-mail
addresses for spam purposes. Crawlers can be of different types, each using
different strategies and techniques to crawl web pages. This paper presents a
review of various types of web crawlers.
Keywords: deep web crawler; focused crawler; web forum; forum crawler;
web intelligence; web crawler review.
Reference to this paper should be made as follows: Sreeja, S.R. and
Chaudhari, S. (2014) ‘Review of web crawlers’, Int. J. Knowledge and Web
Intelligence, Vol. 5, No. 1, pp.49–61.
Biographical notes: S.R. Sreeja received her BTech in Computer Science and
Engineering from Mahatma Gandhi University, Kottayam. She is a postgraduate
student in the Computer Engineering Department at A.C. Patil College of
Engineering, Mumbai University. Her research interests include the concepts and issues
related to web data mining and web crawlers.
Sangita Chaudhari received her ME in Computer Engineering from Mumbai
University, India. Currently, she is an Assistant Professor at A.C. Patil College
of Engineering, Kharghar, Navi Mumbai, India. Her research interests include
digital image processing, advanced databases, information systems, and
information security techniques. She has published more than 15 papers in
national/international conferences and journals.
1 Introduction
A web crawler is a programme or a suite of programmes that is used to retrieve the
contents of web pages. This content retrieval is done mainly for the purpose of ranking the web