0018-9162/02/$17.00 © 2002 IEEE, July 2002

COVER FEATURE

A Random Walk Down the Genomes: DNA Evolution in Valis

While chemistry and physics are the substrate of biology, researchers now believe that a better understanding of biology will come through information-theoretic studies of genomes, providing new insights into DNA’s role in governing metabolic and regulatory pathways. Consequently, the mathematical approaches derived from systems sciences (dynamical systems, control theory, game theory, information and decision theory, and mathematical logic) are playing increasingly important roles in biological research.

Understanding the evolutionary processes that act on these “codes of life” (including point mutation, recombination, gene conversion, replication slippage, DNA repair, translocation, imprinting, and horizontal transfer) requires the ability to analyze vast amounts of continually generated genomic data. The challenges, intrigues, and excitement that these genomic sequences have come to symbolize have catapulted the embryonic field of bioinformatics to the forefront.

Bioinformatics currently consists of a set of tools to “contig” genomic sequences; to organize, annotate, and search sequence databases; and to generate computationally or statistically intriguing problems. However, researchers in this emerging discipline require more complex mechanisms to investigate the full ensemble of available biological facts.

VALIS

To meet this challenge, New York University’s Bioinformatics Group is creating a computational environment, the vast active living intelligent system, designed to solve the immediate genomic and proteomic problems that the biological community currently faces but flexible enough to adapt to the maturing bioinformatics field. Inspired by Philip K.
Dick’s 1981 science fiction novel, Valis (http://bioinformatics.cat.nyu.edu/valis/) envisions a modern biology driven by large-scale processing of heterogeneous data from diverse sources as well as sophisticated algorithms to extract meaningful information and suggest new experiments, either to validate old data or resolve ambiguities.

Individual researchers have already written some of these algorithms, but the resulting tools usually depend on many specifics of internal data representation, different assumptions about the nature of the data, and idiosyncratic visualization and manipulation schemes. Current data sources range from GenBank genomic sequences to results of individual microarray experiments. Interfaces to these data sources vary widely as well, with a concomitant increase in complexity. Further, the present trend of ad hoc algorithm development leads to little code sharing.

Valis is a language-independent environment for prototyping bioinformatics applications that provides a set of libraries to read input data stored in relational databases or standard file formats, efficient implementations of algorithms useful to genomics, and numerous visualization tools.

The authors propose a new software system, Valis, that incorporates biological data and domain-specific knowledge and show how biologists can use it to model, analyze, and experiment with genomic evolutionary processes.

Salvatore Paxia, Archisman Rudra, Yi Zhou, Bud Mishra
New York University