0018-9162/02/$17.00 © 2002 IEEE July 2002 11
COVER FEATURE
A Random Walk Down the Genomes: DNA Evolution in Valis
While chemistry and physics are the
substrate of biology, researchers now
believe that a better understanding of
biology will come through informa-
tion-theoretic studies of genomes,
providing new insights into DNA’s role in govern-
ing metabolic and regulatory pathways. Conse-
quently, the mathematical approaches derived from
systems sciences—dynamical systems, control the-
ory, game theory, information and decision theory,
and mathematical logic—are playing increasingly
important roles in biological research.
Understanding the evolutionary processes that
act on these “codes of life”—including point muta-
tion, recombination, gene conversion, replication
slippage, DNA repair, translocation, imprinting,
and horizontal transfer—requires the ability to ana-
lyze vast amounts of continually generated genomic
data. The challenges, intrigues, and excitement that
these genomic sequences have come to symbolize
have catapulted the embryonic field of bioinfor-
matics to the forefront.
Bioinformatics currently consists of a set of tools that
“contig” genomic sequences; organize, annotate, and search
sequence databases; and generate computationally or
statistically intriguing problems.
However, researchers in this emerging discipline
require more complex mechanisms to investigate
the full ensemble of available biological facts.
VALIS
To meet this challenge, New York University’s
Bioinformatics Group is creating a computational
environment—the vast active living intelligent sys-
tem—designed to solve the immediate genomic and
proteomic problems that the biological community
currently faces but flexible enough to adapt to the
maturing bioinformatics field. Inspired by Philip
K. Dick’s 1981 science fiction novel, Valis
(http://bioinformatics.cat.nyu.edu/valis/) envisions a modern
biology driven by large-scale processing of heterogeneous
data from diverse sources as well as
sophisticated algorithms to extract meaningful
information and suggest new experiments, either
to validate old data or resolve ambiguities.
Individual researchers have already written some
of these algorithms, but the resulting tools usually
depend on many specifics of internal data repre-
sentation, different assumptions about the nature
of the data, and idiosyncratic visualization and
manipulation schemes. Current data sources range
from GenBank genomic sequences to results of indi-
vidual microarray experiments. Interfaces to these
data sources vary widely as well, with a concomi-
tant increase in complexity. Further, the present
trend of ad hoc algorithm development leads to lit-
tle code sharing.
Valis is a language-independent environment for
prototyping bioinformatics applications that pro-
vides
• a set of libraries to read input data stored in
relational databases or standard file formats,
• efficient implementations of algorithms useful
to genomics, and
• numerous visualization tools.
The authors propose a new software system, Valis, that incorporates
biological data and domain-specific knowledge and show how biologists
can use it to model, analyze, and experiment with genomic evolutionary
processes.
Salvatore Paxia, Archisman Rudra, Yi Zhou, and Bud Mishra
New York University