Computational gene discovery and human disease

Christopher J Rawlings* and David B Searls†

Bioinformatics is now an essential tool in many aspects of human molecular genetics research. Methods for the prediction of gene structure are essential components in genomic sequencing projects and provide the key to deriving protein sequence and locating intron/exon junctions. Sequence comparison and database searching are the pre-eminent approaches for predicting the likely biochemical function of new genes, although sequence profiles derived from families of aligned sequences have advantages in the detection of remote sequence relationships. The use of sequence database analysis for large-scale comparative analysis of genome sequence data from model organisms is emerging as the most important recent development in the application of bioinformatics methods for characterizing candidate disease genes.

Addresses
*SmithKline Beecham Pharmaceuticals, Department of Bioinformatics, New Frontiers Science Park, Third Avenue, Harlow, Essex CM19 5AW, UK; e-mail: Chris-Rawlings-1@sbphrd.com
†SmithKline Beecham Pharmaceuticals, Department of Bioinformatics, 709 Swedeland Road, PO Box 1539, King of Prussia, PA 19406-0939, USA; e-mail: David_B_Searls@sbphrd.com

Current Opinion in Genetics & Development 1997, 7:416-423
http://biomednet.com/elecref/0959437X00700416
© Current Biology Ltd ISSN 0959-437X

Abbreviations
AT      ataxia telangiectasia
ATM     mutated in ataxia telangiectasia
BLAST   basic local alignment search tool
EGAD    expressed gene anatomy database
EST     expressed sequence tag
FISH    fluorescent in situ hybridization
GRAIL   gene recognition and analysis internet link
HMM     hidden Markov model
HNPCC   hereditary nonpolyposis colorectal cancer
HPRT    hypoxanthine-guanine phosphoribosyltransferase
IRE     iron responsive element
NBCC    nevoid basal cell carcinoma syndrome
PKD     polycystic kidney disease
Ran     Ras-related nuclear protein
RCC1    regulation of chromosome condensation protein
WWW     World Wide Web
XP      xeroderma pigmentosum

Introduction
It is now well established that computational molecular biology (bioinformatics) can make a significant contribution to the search for genes implicated in human diseases. The increasing impact of bioinformatics in human disease genetics is illustrated by an analysis of the published scientific literature (Fig. 1). The proportion of papers in Medline (~400 000 per annum) indexed as containing results of sequence analysis has increased steadily to the present level of 2.8%. Of these, an increasing proportion (now ~1.1%, or ~0.03% of all papers) are indexed as specifically contributing to research into human disease. These figures, however, are almost certainly a significant underestimate of the real importance of bioinformatics, because most laboratories closely integrate computational methods with experimental approaches to gene identification. To compile this review, we have selected publications which best illustrate the way in which different bioinformatics methods can be used to link previously uncharacterized nucleotide sequences to human disease phenotypes or to elucidate the molecular basis of human disease.

Figure 1
The proportion of papers indexed each year in Medline as containing sequence analysis results, plotted by year. The relative proportion of sequence analysis papers indexed as relating to disease or disease progression is shown to illustrate the increasing contribution of bioinformatics methods to the understanding of the molecular basis of human disease. Medline indexes ~400 000 papers per year.

This article begins with an overview of gene prediction methods and then reviews developments in sequence database searching methods and their function in comparative genomics and whole-genome analysis.
The remainder of this review addresses methods for identifying increasingly remote sequence similarities and ends by providing some practical pointers to disease-specific databases and to further reading in bioinformatics research.