Ontology-based Integration and Retrieval over Multiple Quantities: What if “Ovate leaves and often blue to purple flowers

Shenghui Wang
Department of Computer Science, Vrije Universiteit Amsterdam, The Netherlands

Jeff Z. Pan
Department of Computing Science, University of Aberdeen, United Kingdom

Abstract

Information integration and retrieval have been important problems for many information systems: it is hard to combine multidimensional and parallel information and make it available for application queries. In our previous work [12], we have shown how to use ontologies to facilitate integrating and querying parallel but single-dimensional information. In this paper, we further investigate how to take advantage of ontologies to facilitate integrating parallel information and querying over multiple quantities.

1 Introduction

Information integration and retrieval have been important problems for many information systems, including those based on the Web [9]: it is hard to combine information from different sources and make it available for application queries. In this paper, we focus on descriptive domains, where most information is available in natural language (NL) form and comes in parallel, i.e., the same objects or phenomena are described in multiple free-styled documents [3]. To some extent, the Web itself is a huge source of parallel descriptions. It has been argued in [13] that NLs are not adept at describing continuous quantities (such as shape and colour) precisely. Therefore, automated information processing in descriptive domains suffers from the lack of techniques to capture the semantics of natural language descriptions precisely and represent them properly.

Recently, W3C standardised the OWL Web Ontology Language [1] in its Semantic Web Activity. With ontologies being shared understandings of application domains, ontology-based integration and retrieval [10] is a promising direction.
In our previous work [12], we have shown how to use ontologies to facilitate integrating and querying parallel but single-quantity information (shape descriptions). More specifically, parallel shape descriptions can be extracted and represented in a uniform ontology; the explicitly written information can then be accessed easily, and the implicit knowledge can be deduced naturally by applying reasoning over the whole ontology. In this paper, we further investigate how to take advantage of ontologies to facilitate integrating parallel information and querying over multiple quantities.

As in [12], we choose botany as our application domain, as it is one of the premier descriptive sciences and offers a wealth of material on which to evaluate our approach. In particular, we consider parallel colour and leaf shape descriptions in our ontology, which is an extension of the ones that we used in [11, 12, 13]. For example, the colour of the flowers of the species Paeonia anomala is described in two floras as “purple-pink” (Ornamental Plants From Russia) and “rose to red, occasionally nearly white” (Flora of China), while its leaf shape is also described differently, as “lanceolate” (Ornamental Plants From Russia) and “linear to linear-lanceolate” (Flora of China).

Being able to handle each quantity as a separate dimension is simply the first step. With multiple quantities in our ontology, we can ask many interesting questions. For example, a user may ask the plant knowledge base: which species definitely have “linear” leaves and more or less “bluish-purple” flowers, blooming in early spring across the British Isles? English bluebell satisfies this query, but are there any other species with similar morphological features?

The contributions of this paper include solutions to the following issues related to multidimensional integration and querying:

1.
We focus on the semantics of natural language descriptions with frequency information, such as “sometimes,” “rarely,” etc. and its representation in an ontology system.

2007 IEEE/WIC/ACM International Conference on Web Intelligence 0-7695-3026-5/07 $25.00 © 2007 IEEE DOI 10.1109/WI.2007.63 388