Seed-based Generation of Personalized Bio-Ontologies for Information Extraction

Cui Tao⋆ and David W. Embley⋆

Department of Computer Science, Brigham Young University, Provo, Utah 84602, U.S.A.

Abstract. Biologists usually focus on only a small, individualized sub-domain of the huge domain of biology. With respect to their sub-domain, they often need data collected from various web resources. In this research, we provide a tool with which biologists can generate a sub-domain-size, user-specific ontology that can extract data from web resources. The central idea is to let a user provide a seed, which consists of a single data instance embedded within the concepts of interest. Given a seed, the system can generate an extraction ontology, match information with the user’s view based on the seed, and collect information from online repositories. Our initial experiments indicate that our prototype system can successfully match source data with an ontology seed and gather information from different sources with respect to user-specific, personalized views.

1 Introduction

To perform activities such as conducting background research for a field of study, gaining insights into relationships and interactions among different research discoveries, or building research strategies inspired by others’ hypotheses, biologists often need to search several online databases and gather information of interest. They usually have to traverse different web sources and collect the data of interest manually. This task is tedious and time-consuming.

It would be beneficial if we could generate a data-extraction ontology specifically for each individual user that would automatically collect the information of interest. But generating an ontology, especially an ontological description for an information repository, is non-trivial; it requires not only domain expertise but also knowledge of a specific ontology language.
Data heterogeneity and differing user objectives make the task even more daunting.

To illustrate the difficulties biologists encounter in gathering information from a variety of sources, and the challenges involved in building an extraction ontology to collect data automatically, consider some examples. For the chromosome location of a gene, some users might care only about the chromosome on which the gene is located. Other users might care about a more detailed location, such as the start and end base pairs. Sources, not knowing user

⋆ Supported in part by the National Science Foundation under Grant #0414644.