Implementing Logic Wrappers Using XSLT Stylesheets

Amelia Bădică
University of Craiova, Business Information Systems Department
A.I.Cuza 13, Craiova, RO-200585, Romania
ameliabd@yahoo.com

Costin Bădică, Elvira Popescu
University of Craiova, Software Engineering Department
Bvd.Decebal 107, Craiova, 200440, Romania
{badica costin, popescu elvira}@software.ucv.ro

Abstract

The Web has become a major source of information that is easily accessible at low cost by individual and business consumers. Logic wrappers are a new technology proposed to help automate the task of data extraction from the Web. In this note we present an approach for efficiently implementing logic wrappers with the help of the XSLT transformation language. The approach was successfully applied in various application areas: collecting product features from product information sheets and mining travel resources found on the Web sites of online transaction brokers.

1. Introduction

The Web was designed for human consumption rather than machine processing. Web pages are designed by humans and are targeted at human consumers who seek specialized information in various areas of interest. That information can be reused for different problem-solving purposes; in particular it can be searched, filtered, processed, analyzed, and reasoned about.

Web data sources are in fact networked electronic documents written in HTML or XML that can be characterized as neither natural language nor structured (usually, the term semi-structured data is used to characterize them). Many Web data sources can be nicely abstracted as providing relational data as sets of relational tuples.
Examples include: search engine result pages, product catalogues, news sites, product information sheets, travel resources, multimedia repositories, Web directories, and others.

This work was carried out as part of the CNCSIS grant 185/2006: "Technologies and Intelligent Software Tools for Automated Construction of E-Catalogues of Products Using Knowledge Acquisition from the Web".

Logic wrappers (L-wrappers hereafter) are a new technology for constructing wrappers for relational data extraction from the Web. This technology borrows ideas from the areas of logic programming and inductive learning [6, 8]. L-wrappers have a declarative semantics, and therefore their specification is decoupled from their implementation. L-wrappers can be semi-automatically generated using inductive logic programming. In this paper we describe an approach for the efficient implementation of L-wrappers using the XSLT transformation language [7], a standard language for processing XML documents.

The paper is structured as follows. Section 2 introduces L-wrappers and the XSLT0 transformation language. Section 3 describes the algorithm for translating L-wrappers into XSLT0 programs. Section 4 illustrates the translation on a simple example and briefly discusses a more realistic experiment. The last section concludes.

2. Background: L-Wrappers and XSLT0

2.1. L-Wrappers

HTML is the lingua franca for Web publishing. An HTML document can be transformed into a well-formed (i.e. tree-structured) XML document expressed in XHTML. Therefore, we can safely assume that Web data sources are modeled as labeled ordered trees.

We adopt a standard relational model, i.e. we associate with each Web data source a set of distinct attributes. A wrapper is a program that takes a labeled ordered tree and returns a subset of tuples of extracted nodes. L-wrappers are sets of patterns defined as logic rules that can be learned by apply-