Abstract—The classification of linked web pages has recently become important to various tasks in World Wide Web information retrieval systems. The uncontrolled nature of web page content poses additional challenges for web page classification compared with conventional text classification. In this study, we use latent semantic analysis (LSA) and k-Nearest Neighbor (k-NN) approaches to classify linked web page documents. LSA attempts to increase the efficiency of the classification algorithm by concentrating on the semantic meaning of web pages. In addition, LSA is a statistical model of web page document usage that is related to eigenvector decomposition and factor analysis. It transforms the original linked web document matrix into a new document matrix, which is then used for the classification task. k-NN performs nearest neighbor classification based on the closest neighbors, using similarity measures between linked web pages. A case study of the proposed algorithm shows that the LSA transform affects the results for linked web page documents by permitting a better representation of their semantics. The proposed approach achieved higher classification accuracy and effectiveness by focusing on semantic meaning.

Index Terms—Classification, latent semantic analysis, linked web pages, k-nearest neighbors.

I. INTRODUCTION

With the rapid development of the web, Linked Data has become ubiquitous in everyday web applications, for example tweets on Twitter, social networks on Facebook, and protein interaction networks [1]. The linked web is the most important data source for web information retrieval systems. The number of linked web pages continues to grow day by day. Linked web page classification has therefore become essential for handling this vast amount of data.
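As an illustration of the approach summarized in the abstract, the following is a minimal sketch of LSA followed by k-NN classification, not the authors' implementation. It assumes a plain term-count matrix, a truncated singular value decomposition as the LSA transform, and cosine similarity as the similarity measure (the paper only speaks of "similarity measures"). The toy corpus, the labels, the function names, and the choices rank = 2 and k = 3 are all hypothetical and chosen only for illustration.

```python
import numpy as np
from collections import Counter

def vectorize(doc, index):
    """Term-count vector for one document; tokens outside the vocabulary are ignored."""
    vec = np.zeros(len(index))
    for tok in doc.lower().split():
        if tok in index:
            vec[index[tok]] += 1.0
    return vec

def build_term_document_matrix(docs):
    """Return (index, matrix); matrix[i, j] counts term i in document j."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = np.column_stack([vectorize(doc, index) for doc in docs])
    return index, matrix

def fit_lsa(matrix, rank):
    """Truncated SVD, A ~ U_k S_k V_k^T; rows of V_k are the documents in latent space."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :].T

def fold_in(term_vector, u_k, s_k):
    """Project an unseen document into the latent space: d^T U_k S_k^{-1}."""
    return term_vector @ u_k / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def knn_classify(query, doc_coords, labels, k=3):
    """Majority vote among the k training documents most similar to the query."""
    sims = [cosine(query, d) for d in doc_coords]
    nearest = np.argsort(sims)[::-1][:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy corpus (not data from the paper).
docs = [
    "football match goal team",
    "team goal score football",
    "match score team player",
    "python code software bug",
    "software code compiler python",
    "bug compiler code",
]
labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

index, matrix = build_term_document_matrix(docs)
u_k, s_k, doc_coords = fit_lsa(matrix, rank=2)   # rank chosen arbitrarily
query = fold_in(vectorize("goal player team", index), u_k, s_k)
print(knn_classify(query, doc_coords, labels, k=3))
```

Folding the query in with the same U_k and S_k that were fitted on the training matrix places it in the same latent space as the training documents, so the cosine comparison in the k-NN step is made on the reduced representation rather than on raw term counts.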
Linked Data is a set of best practices for publishing and connecting structured data on the web, a definition given by Tim Berners-Lee et al. [2], who proposed four rules for machine-readable content on the web. The Linked Data initiative has given rise to a growing number of Resource Description Framework (RDF) documents as well as other machine-readable sources [3][4]. Linked web data holds information about a resource and its relations to other linked resources [3]. Linked Data provides a platform for adding structured data to the current web, and it leads to an extension of the web with a worldwide data space that connects data from different domains [4]. Implementing Linked Data documents faces various issues and challenges: the data may be available in different storage formats, and those data items must be organized into Linked Data documents that suit different scientific domains [5][6]. Linked Data can also support different semantics for publishing data under appropriate protection mechanisms [6]. Today's web pages contain a large number of features, such as URLs, HyperText Markup Language (HTML) or Extensible Markup Language (XML) tags, hyperlinks, and other text content, that should be considered in the automatic classification process [7]. However, these features are not present in plain text documents; because of this property of web pages, web classification differs from traditional text document classification [8].

G. Naga Chandrika was with the Dept. of Information Technology, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad-90, India (e-mail: nchandrika_g@vnrvjiet.in).
Prof. E. Srinivasa Reddy was with the Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, A.P., India (e-mail: esreddy67@gmail.com).
Web classification is currently used for topic-specific selection of linked web data and for analyzing the up-to-date structure of the web, web directories, and crawlers [8]. Previously, web directories such as Yahoo! were constructed manually, with users assigning class labels to web documents [7]. One of the major issues in Linked Data classification is the curse of dimensionality in the feature space of web documents. In this paper, we use latent semantic analysis (LSA) and k-NN approaches to classify linked web pages. LSA attempts to increase the efficiency of the classification algorithm by focusing on the semantic meaning of linked web page documents. LSA transforms the original linked web document matrix into a new document matrix that is used for the classification task. k-NN performs nearest neighbor classification based on the nearest neighbors, using similarity measures between linked web pages. Finally, a case study shows that the LSA transform affects the results for linked web page documents by enabling a better representation of their semantics. The proposed approach achieved higher classification accuracy and effectiveness by focusing on semantic meaning.

Latent Semantic Analysis and Nearest Neighbor Classification Algorithm for Linked Web Pages
G. Naga Chandrika, Prof. E. Srinivasa Reddy
International Journal of Computer Science and Information Security (IJCSIS), Vol. 15, No. 6, June 2017, p. 304, https://sites.google.com/site/ijcsis/, ISSN 1947-5500

II. RELATED WORKS

In this section, we briefly survey state-of-the-art linked web page classification methods. Linked data has been