IJSRST173743 | Received: 10 Sep 2017 | Accepted: 19 Sep 2017 | September-October 2017 [(3) 7: 172-181]
© 2017 IJSRST | Volume 3 | Issue 7 | Print ISSN: 2395-6011 | Online ISSN: 2395-602X
Themed Section: Science and Technology

Web Page Noise Removal - A Survey

Dr. S. Vijayarani 1, K. Geethanjali 2
1 Assistant Professor, Department of Computer Science, Bharathiar University, Coimbatore, Tamilnadu, India
2 M.Phil Research Scholar, Department of Computer Science, Bharathiar University, Coimbatore, Tamilnadu, India

ABSTRACT
Web mining is used to extract useful information from websites, including their web documents and hyperlinks. The World Wide Web contains a wide range of web pages that are useful to many users. Web pages are composed of different kinds of data, such as text, audio, video and images. In addition, web pages now contain a large amount of unnecessary data, e.g., advertisement banners, navigation bars and disclaimer/copyright notices. This unnecessary data is called noisy data. It distracts the user and increases the time needed for searching and browsing. To perform in-depth analysis of web data, or web content mining, the first and essential step is to remove the noise present in the web pages; only then can useful information be extracted from them. Removing noise from a web page is a challenging task in web content mining. The main objective of this paper is to discuss the basics of web content mining, the types of noise, the techniques used for noise removal and the different models used in the literature.

Keywords: Web Content, Web Page, Global Noise, Local Noise, Filtering.

I. INTRODUCTION
Web mining is used to extract knowledge from web data. It is classified into three main categories: web content mining, web structure mining and web usage mining. Web content mining is used to mine data from the content of web pages.
Web pages consist of text, graphics, tables, data blocks and data records [1]. Web content mining uses the ideas and principles of data mining and the knowledge discovery process. Web usage mining, also known as web log mining, is used to analyze the behavior of website users. It can be used to predict user behavior while the user interacts with the web. Web structure mining is based on link structures. It can be used to categorize web pages and is useful for generating information such as the similarity and relationships between different websites.

Extracting useful information from web pages has become an essential task. The web page is a medium for accessing information from different sources. Extracting information from various resources raises several problems, such as finding the useful information, extracting knowledge from large data sets and learning about individual users. Various methods and techniques have been developed to resolve these problems. The information technology field holds a massive amount of data that needs to be transformed into useful information. This extracted information can be used in several applications. Different kinds of algorithms and techniques are available for extracting useful information from different types of data. Web content mining covers various kinds of data, such as image, audio, video and text.

In web mining, the information in web documents can be divided into three kinds, namely core information, redundant information and hidden information [13]. Hidden information comprises HTML tags, script language and programming comments. Repeated data in web documents is called redundant information. The main content of the web page, such as a news article, is known as the core information. In a web mining system, the input data moves through three stages to reach its final result, namely preprocessing, data mining and post-processing [2].
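The distinction between hidden information and visible text can be illustrated with a minimal Python sketch. It assumes that hidden information means tags, comments and the contents of script and style elements. The names HiddenInfoSplitter and split_hidden_information are hypothetical and introduced here only for illustration; this is not a method taken from the surveyed literature, and it does not attempt to tell core information from redundant information.

```python
from html.parser import HTMLParser


class HiddenInfoSplitter(HTMLParser):
    """Separate visible text from hidden information (tags, scripts, comments)."""

    # Assumption: text inside these elements is code or styling, not content.
    HIDDEN_ELEMENTS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.visible_text = []   # text a reader would see
        self.hidden = []         # tags, comments, script/style bodies
        self._hidden_depth = 0   # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        self.hidden.append(f"<{tag}>")
        if tag in self.HIDDEN_ELEMENTS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        self.hidden.append(f"</{tag}>")
        if tag in self.HIDDEN_ELEMENTS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_comment(self, data):
        self.hidden.append(f"<!--{data}-->")

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._hidden_depth:
            self.hidden.append(text)
        else:
            self.visible_text.append(text)


def split_hidden_information(html):
    """Return (visible_text, hidden_fragments) for an HTML string."""
    parser = HiddenInfoSplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.visible_text), parser.hidden
```

Calling split_hidden_information on a page's HTML source returns the visible text as one string and the hidden fragments as a list, which a preprocessing stage could then discard or inspect separately.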
Preprocessing may include removing irrelevant attributes and cleaning noisy information from the data. Data mining is a generic term that includes the techniques and tools used to extract useful information