Classifying web pages and documents based on expected cross entropy and weighted vote schema

Maide Abedini Bagha 1, Farnaz Laylavi 2, Rahman Faraji Bashir 3

1. Young Researchers and Elite Club, Tabriz Branch, Islamic Azad University, Tabriz, Iran. maide.abedini@gmail.com
2. Department of Computer, Novin Institute of Higher Education, Ardabil, Iran. Farnaz.laylavi@gmail.com
3. Islamic Azad University, Sanandaj, Iran. Rahman.faraji@gmail.com

Abstract

Traditional information retrieval methods use the keywords occurring in documents to determine the class of documents and web pages, but they often retrieve unrelated web pages and documents. We propose a method for scanning and classifying web pages and documents based on a support vector machine (SVM) and expected cross entropy, combined through a weighted voting scheme. Experimental results indicate that our method is more effective than traditional methods: its classification accuracy is higher than that of the other methods, and it achieves higher accuracy even with a small labeled training set.

Keywords: Classification, Support vector machine, Weighted vote, Expected cross entropy.

1. Introduction

The number of web pages and documents on the internet is growing rapidly. Users can find web pages and documents through the internet, and many search engines are available to them. Organizing and analyzing these vast quantities of data is a challenging and sometimes impossible task. The process of obtaining and using information is called environmental scanning [1]. Information is available to an organization in many formats and from many sources, including text-based documents on the World Wide Web (WWW). The purpose of this study is to develop a process that scans large amounts of text-based data collected from the WWW and then classifies it effectively.
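Two components named in the abstract can be made concrete: expected cross entropy as a term-scoring criterion and a weighted vote over classifier outputs. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses the commonly cited definition ECE(t) = P(t) Σ_i P(c_i | t) log(P(c_i | t) / P(c_i)), estimates the probabilities from document counts, and treats the vote weights as hypothetical inputs (for example, per-classifier validation accuracy). The names `expected_cross_entropy` and `weighted_vote` are illustrative.

```python
import math
from collections import Counter


def expected_cross_entropy(docs, labels, term):
    """Expected cross entropy of `term` over the classes in `labels`.

    ECE(t) = P(t) * sum_i P(c_i | t) * log(P(c_i | t) / P(c_i)),
    with probabilities estimated from document counts (an assumption).
    `docs` is a list of token lists and `labels` the parallel class labels.
    """
    n = len(docs)
    class_counts = Counter(labels)
    # Count, per class, the documents that contain the term at least once.
    with_term = Counter(lab for doc, lab in zip(docs, labels) if term in set(doc))
    n_t = sum(with_term.values())
    if n_t == 0:
        return 0.0  # term never occurs: no evidence, score 0
    score = 0.0
    for c, n_c in class_counts.items():
        p_c_given_t = with_term[c] / n_t
        if p_c_given_t > 0:  # the 0 * log 0 terms are taken as 0
            score += p_c_given_t * math.log(p_c_given_t / (n_c / n))
    return (n_t / n) * score


def weighted_vote(predictions, weights):
    """Return the label whose voters carry the largest total weight.

    `weights` are hypothetical per-classifier weights, e.g. validation accuracy.
    """
    totals = Counter()
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)


if __name__ == "__main__":
    docs = [["svm", "kernel"], ["svm", "margin"], ["soccer", "goal"], ["goal", "match"]]
    labels = ["ml", "ml", "sport", "sport"]
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: expected_cross_entropy(docs, labels, t), reverse=True)
    print(ranked)  # terms ordered from most to least class-informative
    print(weighted_vote(["ml", "sport", "ml"], [0.2, 0.7, 0.3]))
```

A term concentrated in one class scores higher than a rarer term from the same class, and a term spread evenly across all classes scores zero. Ranking terms by this score is one plausible way to select the features given to an SVM before its predictions enter the vote.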
Several areas of research were combined to develop this process, including the vector space model (VSM) introduced by Salton [2], linear discriminant analysis, environmental scanning [3], and text classification methods (e.g., [4], [5]). A decision tree [6] is a general-purpose data classification method. A brief review of each of these research areas is provided below.
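Salton's vector space model, the first area listed above, represents each document as a vector of term weights and measures relatedness by the angle between vectors. A minimal sketch follows, assuming tf-idf weighting and cosine similarity; this is a common instantiation, and this section does not say which weighting the authors use. The names `tfidf_vectors` and `cosine` are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse {term: weight} vectors.

    Weight = raw term frequency * log(N / document frequency); this is one
    common tf-idf variant, assumed here for illustration.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: f * math.log(n / df[t]) for t, f in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    docs = [["svm", "kernel", "margin"], ["svm", "kernel"], ["goal", "match"]]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]))  # shared terms: positive similarity
    print(cosine(vecs[0], vecs[2]))  # no shared terms: 0.0
```

Storing vectors as dictionaries keeps each one proportional to its document's vocabulary rather than the whole corpus; vectors in this form can serve as input to a classifier such as an SVM.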