[Bhosale*, 4(8): August, 2015] ISSN: 2277-9655 (I2OR), Publication Impact Factor: 3.785
http://www.ijesrt.com © International Journal of Engineering Sciences & Research Technology [239]

IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

AUTOMATIC ANNOTATION OF QUERY RESULTS FROM DEEP WEB DATABASE
Chaitanya Bhosale*, Prof. Sunil Rathod
* Department of Computer Engineering, Dr. D.Y. Patil School of Engineering, Lohgaon, Pune, India.

ABSTRACT
In recent years, web database extraction and annotation have received increasing attention from the database community. When a search query is submitted to a search interface, a result page is generated. The result page obtained from a web database (WDB) contains Search Result Records (SRRs), which display the results for each query. Every SRR contains multiple data units, each corresponding to one semantic concept. These search results can be used in many web applications such as comparison shopping, data integration, and metaquerying. For these applications to be successful, however, the result pages must be annotated in a meaningful fashion. To reduce human effort, an automatic annotation approach is used. In this approach, we first align the data units in the result records into groups such that the data units in the same group have the same meaning. We then annotate each group from different aspects and obtain the final annotation by aggregating the results. In addition, we use the New CTVS technique to extract QRRs from a query result page, with optional labeling and dynamic tagging as improvements. Finally, an annotation wrapper is generated automatically and is used to annotate new result records from the same web database.

KEYWORDS: Data alignment, data annotation, web database, wrapper generation, information integration, Search Result Records.

INTRODUCTION
Databases are a well-known technology for managing large amounts of data. The World Wide Web is a good way of presenting information.
Alignment and annotation of data increase the quality of searching and updating data. Data alignment is the way data is arranged and accessed in computer memory. Data annotation is the methodology for adding extra information to a document, a word or phrase, a paragraph, or the entire document. In other words, data unit annotation is the process of assigning meaningful labels to data. For example, a folder in a computer system labeled "Trip-2015" might hold files of photographs taken during a trip.

The automatic annotation solution described by the authors of [1] consists of three phases: the alignment phase, the annotation phase, and the annotation wrapper generation phase. The alignment phase organizes all data units into groups, where each group represents a different concept. The annotation phase uses these groups to produce a meaningful label for every data unit. The annotation rules are generated in the annotation wrapper generation phase. The solution also uses six basic annotators, each of which can independently assign labels to data units.

The two main concepts used in annotation research are data units and text nodes. A data unit is a piece of text that represents one concept of a real-world entity. These data units are encoded dynamically into the result page for human browsing and need to be assigned meaningful labels. Manual annotation of the data units requires human effort and therefore lacks scalability. To overcome this, automatic annotation of the data units within the SRRs is required. The automatic annotation approach first arranges all data units into groups such that the data units inside the same group have the same meaning; each group is then annotated from different aspects, and the results are aggregated to predict a final annotation. Finally, a wrapper is generated.
Wrappers are commonly used as translators that annotate new result records from the same web database. This automatic annotation approach is scalable and highly effective. A clustering-based shifting technique is proposed to align the data units into different groups.
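
The three phases (alignment, annotation, wrapper generation) can be illustrated with a minimal Python sketch. It is not the method of [1]: the alignment step is a simple stand-in that groups data units by a coarse data type and their order within a record rather than by clustering-based shifting, the two toy annotators are hypothetical and not necessarily among the six basic annotators, and majority voting is assumed as the aggregation rule. All function names and sample records are invented for illustration.

```python
import re
from collections import Counter, defaultdict

def data_type(unit):
    """Coarse data type of a data unit (a toy feature, assumed here)."""
    return "price" if re.fullmatch(r"\$\d+(\.\d{2})?", unit) else "text"

def keys_for(record):
    """Key each unit by (type, ordinal among units of that type in the record)."""
    seen = Counter()
    keys = []
    for unit in record:
        t = data_type(unit)
        keys.append((t, seen[t]))
        seen[t] += 1
    return keys

def align(records):
    """Alignment phase (simplified): collect units with the same key into one group."""
    groups = defaultdict(list)
    for record in records:
        for unit, key in zip(record, keys_for(record)):
            groups[key].append(unit)
    return dict(groups)

def type_annotator(key, units):
    """Hypothetical annotator: label price-typed groups as 'Price'."""
    return "Price" if key[0] == "price" else None

def make_query_annotator(field, term):
    """Hypothetical annotator: if most units contain the query term, use the query field."""
    def annotator(key, units):
        hits = sum(term.lower() in u.lower() for u in units)
        return field if hits * 2 > len(units) else None
    return annotator

def build_wrapper(groups, annotators):
    """Annotation + wrapper generation: majority vote per group, stored as key -> label."""
    wrapper = {}
    for key, units in groups.items():
        labels = [a(key, units) for a in annotators]
        votes = Counter(label for label in labels if label is not None)
        if votes:
            wrapper[key] = votes.most_common(1)[0][0]
    return wrapper

def annotate(record, wrapper):
    """Apply the wrapper to a new record; units without a rule get None."""
    return [(unit, wrapper.get(key)) for unit, key in zip(record, keys_for(record))]

# Hypothetical SRRs, each already split into data units.
srrs = [
    ["Mining Book A", "Author A", "$59.99"],
    ["Web Mining B", "Author B", "$72.50"],
    ["Search Book C", "$18.00"],  # one unit is missing in this record
]
wrapper = build_wrapper(align(srrs), [type_annotator, make_query_annotator("Title", "mining")])
print(annotate(["Mining Book D", "Author D", "$64.00"], wrapper))
```

In this sketch the wrapper is just a mapping from group key to label, so a new record from the same source can be annotated without repeating the alignment and annotation phases. Groups for which no annotator votes (the author names here) stay unlabeled.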