International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 04 | Apr 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 3610
Calculating Rank of Web Documents Using Its Content and Link
Analysis
Amit Kumar¹, Anshita Bhardwaj², Anshika Jain³, Mr. Jagbeer Singh⁴
¹,²,³,⁴ Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut-250005, Uttar Pradesh, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - When a user submits a query to a search engine on
the World Wide Web (WWW), ranking is the means by which
the search engine measures the importance of web pages.
Today, a great deal of vital information is available online in
the form of text documents. Search engines mine this
information according to the user's query and return the
most relevant results. They retrieve and display documents
in order of their ranking. Many search engines use page
ranking to assign weights to a website's pages. In this paper,
content-based matching is combined with page ranking
based on hyperlink evaluation to display more accurate and
relevant results for the user query.
Key Words: Hyperlink evaluation, Ranking, Search engine,
Search query, Content-based.
1. INTRODUCTION
Nowadays, the Page-Rank method is widely used in
bibliometrics [7], information networks, social analysis, and
link prediction. It is also used for the systems analysis of
road networks and in the sciences, including neuroscience.
The key point is that, regardless of the length of the query,
the results are always returned as an ordered list of links.
Page-Rank appears very simple, but when a simple
calculation is applied thousands or millions of times, the
results can seem complicated. The main purpose of this
paper is to provide an effective way to obtain query results
using very simple code, for clarity and ease of understanding.
As a starting point for future work, the method could be
optimized around the content that the target audience wants
to see, since such content attracts links better than anything
else.
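
The remark that a simple calculation looks complicated once it is
repeated many times can be illustrated with a minimal sketch of the
standard Page-Rank power iteration. The function name `pagerank`, the
damping factor of 0.85, the iteration count, and the handling of pages
without outgoing links are assumptions chosen for illustration; they
are not taken from this paper's implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate Page-Rank by repeated redistribution of rank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank (ranks sum to 1).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank equally among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its
                # rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

For a hypothetical three-page graph such as
`{"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, page C is linked from both
A and B and therefore ends up with the highest rank.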
A search query is a string of words that a user enters in the
search box, and the search engine responds within a fraction
of a second. A search engine is an online application that
takes a query from the user and, based on the keywords or
catchphrases received from the user, fetches results by
crawling [8] websites with the help of crawlers or spiders,
and then sorts them into a list of hyperlinks corresponding
to the matched documents.
In this paper, along with content-based matching, page
ranking based on hyperlink evaluation is performed to
display more accurate and relevant results for the user
query. First, we extract the links, along with the content,
from the topmost text documents and store them in a
dictionary. A score is then calculated for every document,
and each document is tagged with its score. Finally, the
documents are sorted by score so that the highest-scoring
pages are reordered to the top, improving the results the
user receives from the search engine.
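
The steps described above (store content and links in a dictionary,
score every document, then reorder by score) can be sketched as
follows. This is a minimal illustration under stated assumptions, not
the authors' code: the function names, the dictionary layout with
`"text"` and `"links"` keys, the term-frequency content score, the
in-link share used as the link score, and the equal default weighting
are all hypothetical choices.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_score(query, text):
    """Fraction of the document's words that are query terms."""
    words = tokenize(text)
    if not words:
        return 0.0
    counts = Counter(words)
    query_terms = set(tokenize(query))
    return sum(counts[term] for term in query_terms) / len(words)


def link_score(url, documents):
    """Share of the other fetched documents that link to `url`."""
    others = [u for u in documents if u != url]
    if not others:
        return 0.0
    inbound = sum(1 for u in others if url in documents[u]["links"])
    return inbound / len(others)


def rerank(query, documents, content_weight=0.5):
    """Return (url, score) pairs sorted from highest to lowest score.

    documents: dict mapping a URL to {"text": str, "links": [urls]}.
    content_weight: assumed weighting between the content score and
    the link score (0.5 treats them equally).
    """
    scores = {}
    for url, doc in documents.items():
        scores[url] = (
            content_weight * content_score(query, doc["text"])
            + (1.0 - content_weight) * link_score(url, documents)
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rerank("south indian food", documents)` on a small dictionary
of fetched pages returns the pages ordered by their combined score, so
a page that both matches the query terms and is linked from the other
fetched pages moves to the top.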
The ordered list of responses is known as the Search Engine
Result Page (SERP). The responses provided by a search
engine may consist of a mix of videos, images, articles, web
pages, and many other types of files. The ranking of web
pages returned in response to a user query combines a
measure of the relevance of the page to the query with a
query-independent measure of the quality of the page. The
objective of this project is to reduce the uncertainty in, and
improve the usefulness of, the web pages that appear at the
top of the results by using both link and content analysis.
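
One simple way to express such a combination, given here only as an
illustrative assumption and not as the formula used in this paper, is a
weighted sum of the two measures:

```latex
\[
  \mathrm{score}(d, q) \;=\; \alpha \,\mathrm{rel}(d, q) \;+\; (1 - \alpha)\,\mathrm{qual}(d),
  \qquad 0 \le \alpha \le 1
\]
```

Here $d$ is a web page, $q$ is the user query, $\mathrm{rel}(d, q)$ is
the query-dependent relevance (for example, a content-matching score),
$\mathrm{qual}(d)$ is the query-independent quality (for example, a
link-based rank), and $\alpha$ is a hypothetical weight that balances
the two.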
2. BACKGROUND HISTORY
Because of certain practices, the web pages shown at the top
of the search results are at times unwanted or useless to the
user. Web document retrieval is mainly of three types, which
are explained as follows:
2.1 Organic Search
Organic search refers to the search methodology in which
result pages are retrieved through the search engine's own
algorithm. Web pages that score exceptionally well in the
search engine's algorithmic evaluation generally do so on
factors such as the quality and suitability of the content and
the specialization/expertise, authoritativeness, and
trustworthiness of the website and its content writer on the
given topic. Organic search results are the unpaid results
that appear on the results page after the user submits a
query. For example, when a user types "South Indian food"
into a search engine such as Google, all the unpaid results
that appear are part of the organic search. Commonly,
people tend to view and open the topmost results on the
first page of the search results.
Each page of search engine results usually contains 10
organic listings [1,2]; however, some results pages may have