A Hybrid Keyword Search across Peer-to-Peer Federated Databases

Jungkee Kim 1,2 and Geoffrey Fox 2

1 Department of Computer Science, Florida State University, Tallahassee FL 32306, U.S.A., jungkkim@cs.fsu.edu
2 Community Grids Laboratory, Indiana University, Bloomington IN 47404, U.S.A., gcf@indiana.edu

Abstract. The need for keyword search in databases is suggested both by Web integration with legacy database management systems and by dynamic Web publication. However, keyword search sacrifices the inherent meaning of the database schema. Web search engines provide clues for resource location on the Web, but have similar semantic problems. The Semantic Web offers an ideal solution to the semantic problem on the Web, but because of the need for sophisticated domain definitions and the lack of unified definitions, many Web pages are not part of the Semantic Web. We define a hybrid search to be a search combining semantic metadata and keywords. A hybrid search on P2P-based federated databases provides meaningful and scalable search on an overlay network across the Internet. This paper describes the design of the combined search for unstructured data with associated metadata, information retrieval from the repository, the peer-to-peer based communication layer, and the data integration hub.

1 Introduction

Since the Internet was introduced as a communication and resource sharing environment, there have been many efforts to utilize its enormous and rapidly proliferating resources. Keyword search in databases [1, 15] is one response to the new Internet environment from the database community. Both Web integration with legacy database management systems and dynamic Web publication through embedded databases benefit strongly from a keyword search capability on the databases. This is because conventional database queries require knowledge of the schema to extract the target information. Additionally, semistructured schemas such as XML have recently been assimilated into the Web and databases.
These factors make it more complicated to formulate a proper query. Though keyword-based search simplifies searching the database, it loses the inherent meaning of the schema. Therefore, keyword search usually does not return results based on semantic criteria. A Web search engine is a typical example of Internet use. In 2004, Google Search [5], one of the most famous search engines, reaches more than 4