Semantic Web Rules for Business Information

Anna Maclachlan
Idilia, Inc.
Montreal, Quebec, Canada
email: anna.maclachlan at idilia.com

Harold Boley
Institute for Information Technology Centre for e-Business
National Research Council of Canada
Fredericton, New Brunswick, Canada
email: Harold.Boley at nrc-cnrc.gc.ca

ABSTRACT

The New Brunswick Business Knowledge Base (NBBizKB) is described and made available online in RuleML. NBBizKB realizes a two-step design. First, business facts are extracted, once from static CSV tables and repeatedly from dynamic semi-structured HTML pages. Second, Semantic Web rules are developed to derive information implicit in the fact base. Fact extraction comprises an XML DTD design, CSV-to-XML conversion, HTML mining, and XSLT translations. Rule derivation employs the Java-based RuleML implementation of OO jDREW to perform data validation, classification mapping, and information integration. Quantitative rule derivation results and findings about the original business data are reported. This rule-based reasoning over extracted facts about New Brunswick business comprises both a case study in business information mining and a use case for Semantic Web rules.

KEY WORDS

Web knowledge bases. Data mining. Rule-based reasoning. Taxonomy alignment. Semantic Web rules. RuleML.

1 Introduction

Many areas of e-Business, such as product catalog search, can take advantage of the Semantic Web, which has gained strong momentum since its start as a W3C Activity [http://www.w3.org/2001/sw]. The goal of the Semantic New Brunswick initiative of our Semantic Web Lab is to make Semantic Web and AI techniques available to business analysts, venture capitalists, and entrepreneurs in a particular region, namely the province of New Brunswick (NB) in Canada. The aim of our project within this initiative is to build tools for regional business analysis and for improving semantically-based business search via taxonomies and rules.
It resulted in the NB Business Knowledge Base (NBBizKB) described in this paper and related tools (appendix A).

The utility of semantic tools is apparent in the fact that there are existing resources available on the Web containing business information that we can leverage. One can manually compare NB enterprises of a specific size, in a specific industry sector, or in a specific geographic area of the province by consulting the “Biznet Directory of Manufacturers and Selected Services to Industry” (henceforth Biznet), available online. One can also find the contact details for almost any business in the province through the Yahoo! Canada Business Finder (henceforth Yahoo!). NBBizKB exploits the semantics implicit in these sources and goes beyond the intent of the original sources, thereby making already useful resources accessible in novel ways.

NBBizKB is implemented in two steps. The first step is the extraction of a fact base in Object-Oriented RuleML [3] format from the two distinct Web sources. The extraction process, as described in section 2, requires transformation from the native format of the sources using XSLT, DTD design, and HTML mining. The second step is the realization of rules, again in RuleML format, that operate over the two kinds of facts. As described in section 3, this was achieved by analyzing the fact base and devising rules useful in the context of our collections and relevant to our goal of providing semantic business analysis tools for the region.

NBBizKB has already served two intertwined purposes, helping in the development of several generations of our software:

(1) Case study in business information mining, presenting our end-to-end methodology for generating the NBBizKB facts. Since databases like the NB Biznet Directory exist for many regions and the Yahoo! Business Finder has wide coverage, our case study can be transferred to other places.
With deductive techniques from (2), several regional knowledge bases can then be integrated in order to proceed to a global scale.

(2) Use case for Semantic Web rules in an e-Business environment, demonstrating the use of RuleML’s RDF-inspired, XML-based facts and rules, contributing to the preparation of a W3C Workshop [http://www.w3.org/2004/12/rules-ws/cfp]. Since NBBizKB enhances the many large, nested facts from (1) by complex rules, it has been used to benchmark indexing techniques, translators, etc. (in implementations such as F-logic [12], TRIPLE [10], and jDREW [2], [11]). With inductive techniques, additional rules can be generated from the facts.

These purposes are reflected by the structure of the main sections of this paper: Section 2 describes (2.1) the transition from Biznet’s