Proceedings of the 28th Annual Hawaii International Conference on System Sciences - 1995
Knowledge discovery in a genetic database: The MINOS system.
H. Ripoche J. Sallantin
LIRMM, UMR 9928 CNRS - Montpellier II,
161 rue Ada, F-34392 Montpellier.
E-mail: { hrjs}@lirmm.fr
Abstract
This paper concerns the management of genetic se-
quences in an object-oriented database and the extrac-
tion of knowledge from these sequences. In our case
knowledge discovery consists in finding functions ca-
pable of predicting properties about genetic sequences.
This problem is also known as functional inference.
The paper is divided into two parts: the first one shows
the interest of using an object-oriented query language
to build and use prediction functions. In the second
part, we propose to use prediction functions as de-
scriptors of sequences in order to index them. The
indexing is performed with concept lattices [17].
Keywords: Machine Learning/Discovery in Large
Databases, Interactive Data Exploration and Discovery,
Re-use of Discovered Knowledge, Object-Oriented
Databases, Concept Lattices, Genetic Sequences.
1 Introduction: Discovery through
data exploration and data criticism
We think that scientific discovery requires methods
that progressively gather and analyse heterogeneous
information through an interaction with human ex-
perts. As a consequence, we need a knowledge man-
agement system capable of guiding the exploration of
the expert by emphasizing the inner relationships between
data. Thus the knowledge management system
should facilitate data comparison and criticism.
In this paper, we define an environment that
helps a gradual and complete exploration of a se-
quence database. This kind of operation is also called
Database Mining. In our system, genetic sequences
are represented by objects of an Object-Oriented
Database Management System (OODBMS). This is
the reason why we call it MINOS (MINing Object Sys-
tem).
The interaction with the user is accomplished by
the use of the query language of the underlying
database system. This query language makes it possible to
build functions that predict properties of genetic sequences.
It also makes it possible to use these prediction functions
to detect properties in sequences. If the properties
of the sequences are known from a direct biological
experiment, applying the prediction functions to the
sequences is a way to criticize or validate these functions.
This is a first kind of knowledge revision.
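
As an illustration of this first kind of knowledge revision, the following minimal Python sketch applies a prediction function to sequences whose property is known from experiment and reports where the two agree or disagree. It is not the MINOS query language: the `Sequence` class, the `contains_tata` predicate, the `criticize` helper and the sample data are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Sequence:
    """A database object holding a sequence (hypothetical schema)."""
    name: str
    residues: str
    # Property known from a biological experiment; None if not yet tested.
    known_property: Optional[bool] = None


def contains_tata(seq: Sequence) -> bool:
    """Hypothetical prediction function: predicts the property
    whenever the motif "TATA" occurs in the sequence."""
    return "TATA" in seq.residues


def criticize(
    predict: Callable[[Sequence], bool], sequences: List[Sequence]
) -> Tuple[List[str], List[str]]:
    """Compare predictions with experimentally known properties.

    Returns the names of sequences where prediction and experiment
    agree, and the names where they disagree. Sequences without an
    experimental result are skipped.
    """
    agree, disagree = [], []
    for seq in sequences:
        if seq.known_property is None:
            continue
        if predict(seq) == seq.known_property:
            agree.append(seq.name)
        else:
            disagree.append(seq.name)
    return agree, disagree


# Made-up sample data, for illustration only.
database = [
    Sequence("s1", "GGTATAAC", True),
    Sequence("s2", "GGCCGGCC", False),
    Sequence("s3", "ACGTACGT", True),
    Sequence("s4", "CCGGTTAA", None),
]

agree, disagree = criticize(contains_tata, database)
```

Here the disagreement on `s3` is the kind of evidence an expert could use to criticize the prediction function, while the agreements on `s1` and `s2` support it.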
When prediction functions are available, they can
be used to describe sequences through a concept lattice.
The interest of using a concept lattice for sequence
analysis is twofold. Firstly, it helps navigation in
a database of sequences by visualizing the sequences
and their associated properties. The lattice can be
graphically displayed, which provides hypertext-like
functionalities [10]. Secondly, it is a tool for knowledge
revision, because the nodes of the lattice group
examples sharing a common set of properties, and these
relationships can be criticized.
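
To make the notion of a lattice node concrete, the sketch below enumerates the formal concepts of a tiny table relating sequences to properties. Each concept pairs a set of sequences with the full set of properties they share. This is a naive enumeration written for illustration, exponential in the number of properties, and not the algorithm of [17]; the names `context`, `extent`, `intent` and `concepts`, and the sample data, are assumptions of this example.

```python
from itertools import combinations

# Hypothetical context: each sequence is described by the set of
# properties that some prediction function detected in it.
context = {
    "s1": {"p1", "p2"},
    "s2": {"p1"},
    "s3": {"p2", "p3"},
}


def extent(attrs, context):
    """Sequences that have every property in attrs."""
    return frozenset(o for o, props in context.items() if attrs <= props)


def intent(objs, context, all_attrs):
    """Properties shared by every sequence in objs
    (all properties when objs is empty)."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def concepts(context):
    """Enumerate all (extent, intent) pairs by closing every
    subset of properties. Naive: 2**n subsets for n properties."""
    all_attrs = frozenset().union(*context.values())
    found = set()
    for k in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), k):
            ext = extent(frozenset(attrs), context)
            found.add((ext, intent(ext, context, all_attrs)))
    return found


lattice_nodes = concepts(context)
```

For this context the node grouping `s1` and `s2` carries exactly the property `p1`. Ordering the nodes by inclusion of their sequence sets gives the lattice that a user would navigate.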
Figure 1 shows how the knowledge acquisition pro-
cess works: An initial selection of data is analysed
by a learning algorithm. The result of this algorithm
(typically classes grouping initial data) is given to a
human analyst. This person compares the result produced
via machine learning with previous results, and
with his own knowledge of the problem. Then he sug-
gests criticisms about the result that will help in the
choice of a new set of data to be examined.
[Figure 1 diagram: Initial selection, Learning => Result]
Figure 1: An interactive learning cycle.
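
The cycle of Figure 1 can be sketched as a loop in which the human analyst is modelled by a callback. This is only a schematic reading of the figure, under the assumption that a criticism can be turned directly into a new selection. The function `learning_cycle` and the toy stand-ins for selection, learning and analyst are hypothetical.

```python
def learning_cycle(database, select, learn, analyst, max_rounds=5):
    """Run select -> learn -> criticize until the analyst has no
    further criticism or max_rounds is reached.
    Returns the list of results, one per round."""
    selection = select(database, None)  # initial selection
    results = []
    for _ in range(max_rounds):
        result = learn(selection)
        # The analyst compares the new result with previous ones.
        criticism = analyst(result, results)
        results.append(result)
        if criticism is None:
            break
        selection = select(database, criticism)
    return results


# Toy stand-ins, for illustration only.
def select_by_min_length(database, criticism):
    """A criticism is read here as a minimum sequence length."""
    min_len = 0 if criticism is None else criticism
    return [s for s in database if len(s) >= min_len]


def group_by_first_letter(selection):
    """Stand-in learner: classes grouping the selected data."""
    classes = {}
    for s in selection:
        classes.setdefault(s[0], []).append(s)
    return classes


def analyst_wants_longer(result, previous):
    """Stand-in analyst: asks once for sequences of length >= 4,
    then accepts the result."""
    return 4 if not previous else None


database = ["ACG", "ACGT", "TTGA", "TG"]
results = learning_cycle(
    database, select_by_min_length, group_by_first_letter, analyst_wants_longer
)
```

In this toy run the first round groups all four sequences, the analyst's criticism narrows the selection, and the second round produces the classes that are accepted.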
1060-3425/95 $10.00 © 1995 IEEE