[Bhosale*, 4(8): August, 2015] ISSN: 2277-9655
(I2OR), Publication Impact Factor: 3.785
http://www.ijesrt.com © International Journal of Engineering Sciences & Research Technology
[239]
IJESRT
INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH
TECHNOLOGY
AUTOMATIC ANNOTATION OF QUERY RESULTS FROM DEEP
WEB DATABASE
Chaitanya Bhosale*, Prof. Sunil Rathod*
Department of Computer Engineering, Dr. D.Y. Patil School Of Engineering
Lohgaon, Pune, India.
ABSTRACT
In recent years, web database extraction and annotation have received increasing attention from the database
community. When a search query is submitted to a search interface, a result page is generated. The result pages
returned by a web database (WDB) contain Search Result Records (SRRs), which display the results for each query.
Every SRR contains multiple data units, each corresponding to one semantic concept. These search results can be used
in many web applications such as comparison shopping, data integration, and metaquerying. For these applications to
succeed, the data units on the result pages must be annotated in a meaningful fashion. To reduce human effort, an
automatic annotation approach is used. We first align the data units in the result records into groups such that the
data units in the same group have the same meaning. We then annotate each group from different aspects and obtain
the final annotation by aggregating the results. In addition, we use a new CTVS technique to extract query result
records (QRRs) from a query result page, with optional labeling and dynamic tagging as improvements. Finally, an
annotation wrapper is generated automatically and used to annotate new result records from the same web database.
KEYWORDS: Data alignment, data annotation, web database, wrapper generation, information integration, search
result records.
INTRODUCTION
Databases are a well-established technology for
managing large amounts of data. The World Wide Web
is a good way of presenting information. Alignment
and annotation of data improve the quality of
searching and updating data. Data alignment is the
way data is arranged and accessed in computer
memory. Data annotation is the methodology for
adding extra information to a word or phrase, a
paragraph, or an entire document. In other words,
data unit annotation is the process of assigning
meaningful labels to data units. For example, a
folder in a computer system labeled “Trip-2015”
might hold the photographs taken on that trip.
The automatic annotation solution described by the
authors of [1] consists of three phases: an
alignment phase, an annotation phase, and an
annotation wrapper generation phase. The alignment
phase organizes all data units into groups, where
each group represents a different concept. The
annotation phase produces a meaningful label for
the data units in each group. The annotation rules
are generated in the annotation wrapper generation
phase. The solution also uses six basic annotators,
each of which can independently assign labels to
data units. The two main concepts used in
annotation research are data units and text nodes.
A data unit is a piece of text that represents one
concept of a real-world entity.
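To make the three phases concrete, the following Python fragment is a minimal sketch under strong simplifying assumptions: alignment is done purely by column position, a single toy annotator stands in for the six basic annotators, and the wrapper is just a mapping from position to label. The sample records and the function names (align, annotate, build_wrapper, apply_wrapper) are hypothetical and are not taken from [1].

```python
from collections import defaultdict

# Toy SRRs: each record is a list of data units (plain strings).
# Hypothetical sample data, used only for illustration.
srrs = [
    ["Data Mining", "J. Han", "$59.99"],
    ["Web Databases", "W. Meng", "$42.50"],
]

def align(records):
    """Phase 1 (simplified): group data units by their position in the record."""
    groups = defaultdict(list)
    for record in records:
        for position, unit in enumerate(record):
            groups[position].append(unit)
    return dict(groups)

def annotate(group):
    """Phase 2 (simplified): a single toy annotator that recognizes prices."""
    if group and all(unit.startswith("$") for unit in group):
        return "Price"
    return None  # this annotator cannot label the group

def build_wrapper(groups):
    """Phase 3 (simplified): store the label chosen for each position."""
    return {position: annotate(units) for position, units in groups.items()}

def apply_wrapper(wrapper, record):
    """Label a new record from the same site without repeating phases 1 and 2."""
    return [(wrapper.get(position), unit) for position, unit in enumerate(record)]

groups = align(srrs)
wrapper = build_wrapper(groups)
labeled = apply_wrapper(wrapper, ["Deep Web", "B. He", "$30.00"])
```

In this sketch only the third group receives a label, because the toy annotator knows nothing about titles or authors; a fuller system would rely on additional annotators for those groups.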
These data units are encoded dynamically into the
result page for human browsing and must be assigned
meaningful labels. Manual annotation of the data
units requires human effort and therefore lacks
scalability. To overcome this, labels must be
assigned automatically to the data units within the
SRRs. The automatic annotation approach first
arranges all data units into groups such that the
data units inside the same group have the same
meaning; each group is then annotated from
different aspects, and the results are aggregated
to predict a final annotation. Finally, a wrapper
is generated. Wrappers are commonly used as
translators; here the wrapper annotates new result
records from the same web database. This automatic
annotation approach is scalable and highly
effective. A clustering-based shifting technique is
proposed to align the data units into different
groups.
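The aggregation step, in which each group is annotated from different aspects and the results are combined into one final label, can be illustrated with a small Python sketch. It assumes two toy annotators and a simple weighted vote; the annotator functions, their weights, and the voting rule are all hypothetical choices made for illustration and are not the aggregation model of the approach described here.

```python
import re
from collections import defaultdict

def price_annotator(group):
    """Propose 'Price' when every data unit looks like a dollar amount."""
    if group and all(re.fullmatch(r"\$\d+(\.\d{2})?", unit) for unit in group):
        return "Price"
    return None

def prefix_annotator(group):
    """Propose the shared 'Label:' prefix when all units carry the same one."""
    if not group or not all(":" in unit for unit in group):
        return None
    prefixes = {unit.split(":", 1)[0].strip() for unit in group}
    return prefixes.pop() if len(prefixes) == 1 else None

# (annotator, weight) pairs; the weights are assumed values, not measured ones.
ANNOTATORS = [(price_annotator, 0.9), (prefix_annotator, 0.7)]

def aggregate(group):
    """Combine all annotator proposals and return the highest-scoring label."""
    scores = defaultdict(float)
    for annotator, weight in ANNOTATORS:
        label = annotator(group)
        if label is not None:
            scores[label] += weight
    if not scores:
        return None  # no annotator could label this group
    return max(scores, key=scores.get)

price_label = aggregate(["$59.99", "$42.50"])
author_label = aggregate(["Author: J. Han", "Author: W. Meng"])
unknown_label = aggregate(["Data Mining", "Web Databases"])
```

Each annotator works independently, so further annotators can be added to the list without changing the aggregation logic.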