Seed-based Generation of Personalized Bio-Ontologies for Information Extraction

Cui Tao⋆ and David W. Embley⋆

Department of Computer Science, Brigham Young University, Provo, Utah 84602, U.S.A.

Abstract. Biologists usually focus on only a small, individualized sub-domain of the huge domain of biology. With respect to their sub-domain, they often need data collected from various web resources. In this research, we provide a tool with which biologists can generate a sub-domain-size, user-specific ontology that can extract data from web resources. The central idea is to let a user provide a seed, which consists of a single data instance embedded within the concepts of interest. Given a seed, the system can generate an extraction ontology, match information with the user's view based on the seed, and collect information from online repositories. Our initial experiments indicate that our prototype system can successfully match source data with an ontology seed and gather information from different sources with respect to user-specific, personalized views.

1 Introduction

For activities such as performing background research for a field of study, gaining insights into relationships and interactions among different research discoveries, or building research strategies inspired by others' hypotheses, biologists often need to search several online databases and gather information of interest. They usually have to traverse different web sources and collect the data of interest manually. This task is tedious and time-consuming.

It would be beneficial if we could generate a data-extraction ontology specifically for each individual user that would automatically collect the information of interest. But generating an ontology, especially an ontological description for an information repository, is non-trivial; it requires not only domain expertise but also knowledge of a specific ontology language.
Data heterogeneity and different user objectives make the task even more daunting.

⋆ Supported in part by the National Science Foundation under Grant #0414644.

To illustrate the difficulties biologists encounter in gathering information from a variety of sources, and also the challenges involved in building an extraction ontology to collect data automatically, consider some examples. For the chromosome location of a gene, some users might care only about the chromosome on which the gene is located. Other users might care about a more detailed location, such as the start and end base pairs. Sources, not knowing user