The Breast Cancer Gene Database: a collaborative information resource

Rudeina A Baasiri1, Stanley R Glasser1, David L Steen2 and David A Wheeler*,1

1Department of Cell Biology, Baylor College of Medicine, Houston, Texas, TX 77030, USA; 2Biomedical Computing, Inc., Houston, Texas, TX 77005, USA

The Breast Cancer Gene Database (BCGD) is a compendium of molecular genetic data relating to genes involved in breast cancer, and is freely available via the World Wide Web. The data in BCGD is extracted from the published biomedical research literature and stored as a collection of 'Facts', which in turn are collected into topical categories organized by gene. This organization facilitates quick searches and rapid retrieval of specific data such as gene characteristics, functions and role in oncogenesis, and is an important factor allowing for continuous updates. BCGD can be searched either by gene name or keyword. Data is deposited and retrieved from the database through a set of interactive Web forms, making it both platform-independent and universally accessible, and facilitating worldwide collaborative authoring of the database. Data in BCGD is linked to other on-line resources such as Entrez, GeneCards and On-Line Mendelian Inheritance in Man. BCGD is located at http://mbcr.bcm.tmc.edu/ermb/bcgd/bcgd.html.

Keywords: electronic publishing; breast cancer; database; information retrieval

Introduction

It is widely appreciated that the biomedical research literature accumulates at a rate far surpassing that at which anyone can read it, let alone assimilate it. It is shown here that, at the current rate, one would need to scan in excess of 130 journals and read in excess of 27 papers a day to keep up with the field of Breast Cancer Genes, and the rate of accumulation of literature in this field is still increasing. Researchers have developed a variety of strategies to cope with an excessively large scientific literature.
Traditionally, this has included books and review articles which summarize a large volume of primary literature. More recently, digital resources have provided additional solutions to this problem. Compared to books and reviews, digital resources can be more up-to-date, developed cumulatively, easier to use, and more powerful than their traditional predecessors. A wide variety of such resources are available to the breast cancer gene research community, ranging from repositories of primary data (e.g. sequence databases such as GenBank (Benson et al., 1999), EMBL (Stoesser et al., 1999), DDBJ (Tateno and Gojobori, 1997), SwissProt (Bairoch and Apweiler, 1999), and PIR (Barker et al., 1999)) to resources for consumer health (e.g. OncoLink (Buhle et al., 1994)).

Described here is the Breast Cancer Gene Database (BCGD), an additional resource with unique benefits for breast cancer researchers. BCGD contains a comprehensive list of genes involved in breast cancer, and for each of these genes, information on a specific set of topics. Links to the literature reference for that information are provided. BCGD's organization by topic affords quick and convenient access to specific pieces of information. Finally, the World Wide Web (Web)-based interface for adding information and its collaborative capabilities will facilitate BCGD maintenance so that its value to the breast cancer gene community can remain high.

Results

Contents of BCGD

Because the facts in BCGD are extracted from the published literature, it is important to begin by identifying breast cancer gene publications. The PubMed database (http://www.ncbi.nlm.nih.gov/Entrez/medline.html) was searched for citations to these publications. (The database used by PubMed is a superset of the MEDLINE database; http://www.ncbi.nlm.nih.gov/PubMed/overview.html.) A somewhat complicated search strategy, described in Materials and methods, was found to be necessary for the comprehensive retrieval of these citations.
When this strategy was applied, citations to 90 373 publications were identified. When the citations for each of the last 10 years were independently identified (Figure 1), it was found that the rate of accumulation of breast cancer gene publications is continuing to increase, and that in 1998 the rate of accumulation was 27.4 publications a day.

As described in Materials and methods, making these searches reasonably complete required searching by gene, using all the names by which that gene is known. This approach has the added benefit of providing a natural way to divide references among curators and to make the task of extracting data from these publications more manageable. The number of references retrieved from PubMed for each of the genes in BCGD is given in Table 1.

*Correspondence: DA Wheeler
Received 26 July 1999; revised 19 October 1999; accepted 26 October 1999
Oncogene (1999) 18, 7958–7965 © 1999 Stockton Press All rights reserved 0950–9232/99 $15.00 http://www.stockton-press.co.uk/onc

BCGD is a 'gene database' in that only the information pertaining to breast cancer and attributable to the characteristics or action of a particular gene is included in the database. Although the primary focus is on proto-oncogenes and tumor suppressor genes (genes implicated in the induction, maintenance or progression of breast cancer), genes that are of potential diagnostic or prognostic value or are of