Computational Biology and Language

Madhavi Ganapathiraju 1, Narayanas Balakrishnan 2, Raj Reddy 3, and Judith Klein-Seetharaman 4

1 Carnegie Mellon University, USA, madhavi+@cs.cmu.edu
2 Indian Inst. of Science, India & Carnegie Mellon Univ, USA, balki@serc.iisc.ernet.in
3 Carnegie Mellon University, USA, rr+@cmu.edu
4 Carnegie Mellon University & University of Pittsburgh, USA, judithks@cs.cmu.edu

1 Introduction

Current scientific research is characterized by increasing specialization. Parallel advances in a multitude of sub-disciplines cause knowledge to accumulate at high speed. Recent estimates suggest that human knowledge doubles every two to three years. With advances in information and communication technologies, this wide body of scientific knowledge is available to anyone, anywhere, anytime. Such an environment, characterized by plentiful and available knowledge, may also be referred to as ambient intelligence. The bottleneck in using this knowledge is therefore not access. It is assimilating the information and transforming it to suit the needs of a specific application. The increasingly specialized areas of scientific research often share a common goal: converting data into insight that allows solutions to scientific problems to be identified. Because of this common goal, strong parallels exist between different application areas, and these can be exploited to cross-fertilize disciplines. For example, the same fundamental statistical methods are used extensively in speech and language processing, in materials science, in visual processing, and in biomedicine. Each sub-discipline has developed its own specialized methodologies that make these statistical methods successful in its particular application. Specialized areas can be unified because many different problems share strong analogies, so theories developed for one problem can be applied in other areas of research.
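The paragraph claims that the same statistical methods transfer across domains. As a minimal illustration, assuming simple n-gram frequency counting as the shared method, one function can be applied both to the words of a sentence and to the one-letter residue codes of an amino-acid string. This is an assumed example chosen for illustration, not the authors' method. `ngram_counts` is a hypothetical helper, and the residue string is an arbitrary toy sequence, not a real protein.

```python
from collections import Counter
from typing import Hashable, Sequence


def ngram_counts(tokens: Sequence[Hashable], n: int = 2) -> Counter:
    """Count contiguous n-grams in any token sequence (hypothetical helper).

    The function never inspects what a token means, so the same code
    handles words in a sentence and residues in a protein sequence.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the sequence; each window becomes a tuple key.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Language: tokens are words.
sentence = "the cat sat on the mat".split()
word_bigrams = ngram_counts(sentence, n=2)

# Biology: tokens are one-letter amino-acid codes.
# Arbitrary toy string for illustration only, not a real protein.
residues = list("MKTAYIAKTA")
residue_bigrams = ngram_counts(residues, n=2)

print(word_bigrams.most_common(1))
print(residue_bigrams.most_common(2))
```

Nothing in `ngram_counts` depends on the alphabet. Only the tokenization step differs between the two domains, which is the kind of shared method the paragraph describes.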
Y. Cai (Ed.): Ambient Intelligence for Scientific Discovery, LNAI 3345, pp. 25–47, 2005. © Springer-Verlag Berlin Heidelberg 2005

The goal of this paper is to demonstrate the utility of merging two disparate application areas to advance scientific research. The merging process requires cross-disciplinary collaboration, so that advances in one sub-discipline can be exploited as fully as possible in another. We will demonstrate this general concept with a specific example: merging language technologies and computational biology. Analogies make communication easier between researchers in these disparate fields. Specifically, the analogy between words and their meaning in speech and language processing on one hand, and the mapping between