PageRank Algorithm using Eigenvector Centrality - New Approach

Saumya Chandrashekhar Suvarna, Computer Science Department, Vellore Institute of Technology, suvarna.saumyacjyoti2014@vit.ac.in
Mashrin Srivastava, Computer Science Department, Vellore Institute of Technology, mashrin.msrivastava2014@vit.ac.in
Prof. B Jaganathan, Mathematics Department, Vellore Institute of Technology, jaganathan.b@vit.ac.in
Dr. Pankaj Shukla, Mathematics Department, Vellore Institute of Technology, pankaj.shukla@vit.ac.in

Abstract—The purpose of this research is to find a centrality measure that can be used in place of PageRank, and to identify the conditions under which that substitution is valid. After analysing and comparing graphs with a large number of nodes using Spearman's rank correlation coefficient, the conclusion is that eigenvector centrality can safely be used in place of PageRank in directed networks, improving performance in terms of time complexity.

Keywords—PageRank, Centrality, Eigenvector, Webgraph

I. INTRODUCTION

In today's era of computer technology, with the vast usage of the World Wide Web, users want fast and accurate answers. It is necessary to explore every avenue for quicker results and better time complexity, so a constant re-evaluation of existing algorithms is needed. Centrality measures highlight the important parts of a network; dense networks offer more opportunities for nodes to accrue importance than sparse networks do. Ranking is essential for finding the most important or influential nodes in a network.

II. EXISTING ALGORITHMS [1]

Degree centrality selects the most important node based on the principle that an important node is involved in many interactions. However, degree centrality does not take into account the importance of the nodes a node is connected to.

Closeness centrality decides the importance of nodes according to the idea that important nodes can communicate quickly with the other nodes in the network.
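As a concrete illustration of these two definitions, a minimal Python sketch computing both measures on a toy undirected graph might look as follows; the graph and node names are invented for illustration and are not taken from the paper:

```python
from collections import deque

# Toy undirected graph as an adjacency list (hypothetical example network).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

def degree_centrality(g):
    """Degree centrality: number of neighbours, normalised by (n - 1)."""
    n = len(g)
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}

def closeness_centrality(g, source):
    """Closeness centrality: (n - 1) / sum of shortest-path distances."""
    dist = {source: 0}
    q = deque([source])
    while q:                          # breadth-first search for distances
        u = q.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    total = sum(d for v, d in dist.items() if v != source)
    return (len(g) - 1) / total if total else 0.0

print(degree_centrality(graph))           # B has the highest degree
print(closeness_centrality(graph, "B"))   # B reaches every node in one hop
```

Note that node B scores highest on both measures here, but, as the text observes, neither measure asks how important B's neighbours themselves are.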
Betweenness centrality finds the most important node based on the theory that an important node will lie on the shortest paths between other pairs of nodes.

Betweenness and closeness centrality are largely unsuitable for ranking webpages, as they attach no particular importance to the relevant pages. Degree centrality is more relevant, but it does not take into account the importance of the nodes pointing to a page, and hence gives ample opportunity for malicious and inauthentic websites to be ranked first. Apart from the above-mentioned measures, other algorithms for ranking webpages include the following.

A. The Existing Algorithms in Use

EigenTrust Algorithm [2]: A peer-to-peer network partitions tasks among peers such that each peer is equally privileged. This architecture is popular for sharing information, but its open nature makes the spread of inauthentic files easy. The EigenTrust algorithm assigns a rank to each peer based on the peer's past uploads, in an attempt to lower the rank of malicious peers posing as distributors of authentic files.

SimRank: There are many areas where the similarity of objects or interests comes into play, the most obvious being finding similar documents on the World Wide Web; another is grouping people with similar interests together, as in the recommended-friends lists of social media sites. SimRank measures the similarity of two objects on the basis of the objects that reference them.

TrustRank [3]: Web spam pages are created to mislead search engines into assigning them a higher rank, using various techniques. TrustRank involves manually identifying a set of authentic websites known as seed pages and sending out a crawl to identify pages similar to the seeds. In Anti-Trust Rank [4], malicious or inauthentic websites are identified instead, and websites close to them are treated as non-trustworthy.
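The SimRank idea ("two objects are similar if they are referenced by similar objects") can be sketched as a naive fixed-point iteration in a few lines of Python. The link graph, node names, decay factor, and iteration count below are illustrative assumptions, not values from the paper:

```python
# Naive SimRank iteration on a small directed graph (hypothetical links).
# in_nbrs[v] lists the nodes that point to v.
in_nbrs = {
    "u": [],
    "v": [],
    "a": ["u"],
    "b": ["u", "v"],
}
C = 0.8                      # decay factor (a common illustrative choice)
nodes = list(in_nbrs)

# Start from the base case: every object is fully similar to itself.
sim = {(x, y): 1.0 if x == y else 0.0 for x in nodes for y in nodes}
for _ in range(10):          # fixed number of iterations for simplicity
    new = {}
    for x in nodes:
        for y in nodes:
            if x == y:
                new[(x, y)] = 1.0
            elif in_nbrs[x] and in_nbrs[y]:
                # Average pairwise similarity of the referencing objects.
                total = sum(sim[(i, j)]
                            for i in in_nbrs[x] for j in in_nbrs[y])
                new[(x, y)] = C * total / (len(in_nbrs[x]) * len(in_nbrs[y]))
            else:
                new[(x, y)] = 0.0   # no in-neighbours, no evidence
    sim = new

print(sim[("a", "b")])  # a and b are somewhat similar: u points to both
```

Here `a` and `b` gain a nonzero similarity purely because a common object (`u`) references both of them, which is exactly the intuition described above.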
TrustRank scores decrease as pages lie farther away from the seed sites.

B. Algorithms Used by Google

Google is the most popular search engine in use, largely because it provides relevant search results quickly. The very first algorithm used by Google was the PageRank algorithm by Larry Page and Sergey Brin. However, incidents like Google bombing, where a search for a topic leads to the webpages of a seemingly unrelated topic, prompted tweaks in the algorithms Google uses. Google bombs can be planted for business, political, or comical reasons. They work on the principle of spamdexing, i.e., manipulating the index used by search engines through heavy interlinking of websites. To reduce such incidents, Google uses many algorithms alongside the original PageRank algorithm to determine the final ranking of a webpage. These algorithms include the following.

Google Panda: Authority is one of the key measures of ranking in search engines. Trust can be gauged by the authority of a link to an article: the more authoritative the link, the more the article it points to can be trusted. Google Panda is essentially a content-quality filter that demotes low-quality websites, including websites with unoriginal or redundant content, a huge number of advertisements, or content not written or authorized by an expert. If a site's Panda score is high, the site gets 'Pandified', which means its pages are penalized.

Google Hummingbird: Google Hummingbird attempts to judge the intent of the person making the query, i.e., to consider the meaning of the sentence as a whole rather than only certain keywords. Google is said to use its vast database of information, the 'Knowledge Graph', to determine the best results.
Google Penguin: Google Penguin penalizes webpages that use black-hat Search Engine Optimization techniques or violate Google's webmaster guidelines. The goal is to do away with keyword stuffing, manipulative link building, meaningless and irrelevant content, doorway pages, and paying for links to a website.

III. ALGORITHMS FOR COMPARISON

A. Existing: PageRank Algorithm

PageRank [5] centrality defines the importance of a node based on the number and quality of the nodes connected to it. The World Wide Web can be represented as a directed graph in which every webpage is a node: the edges pointing into a node represent the links pointing to that webpage, and the edges pointing away from the node represent its links to other webpages. The most relevant page is decided not only by its in-degree but also by the importance of the pages that link to it.
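This idea can be sketched with the standard PageRank power iteration on a tiny hypothetical web graph; the pages A-D, the damping factor 0.85, and the iteration count below are illustrative assumptions, not values from the paper:

```python
# Power-iteration PageRank sketch on a tiny web graph (hypothetical links).
# links[p] lists the pages that page p links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
d = 0.85                     # the usual damping factor
n = len(links)
rank = {p: 1.0 / n for p in links}   # start from a uniform distribution

for _ in range(100):
    # Everyone receives the teleportation share (1 - d) / n ...
    new = {p: (1 - d) / n for p in links}
    # ... plus a damped share of the rank of every page linking to it.
    for p, outs in links.items():
        share = rank[p] / len(outs)
        for q in outs:
            new[q] += d * share
    rank = new

print(sorted(rank, key=rank.get, reverse=True))  # → ['C', 'A', 'B', 'D']
```

Note that C outranks A even though A is pointed to by only one page: A's single in-link comes from the highest-ranked page, which is precisely the "quality, not just quantity, of in-links" behaviour described above.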