DATABASES
OFFICIAL JOURNAL www.hgvs.org

Mutation Databases for Inherited Renal Disease: Are They Complete, Accurate, Clinically Relevant, and Freely Available?

Judy Savige,¹∗ Hayat Dagher,¹ and Sue Povey²

¹Department of Medicine, The University of Melbourne (Northern Health, Melbourne Health), Melbourne, Australia; ²Research Department of Genetics, Evolution and the Environment, University College London, London, UK

Communicated by Raymond Dalgleish.
Received 20 June 2013; accepted revised manuscript 9 April 2014.
Published online 14 May 2014 in Wiley Online Library (www.wiley.com/humanmutation). DOI: 10.1002/humu.22588

ABSTRACT: This study examined whether gene-specific DNA variant databases for inherited diseases of the kidney fulfilled the Human Variome Project recommendations of being complete, accurate, clinically relevant and freely available. A recent review identified 60 inherited renal diseases caused by mutations in 132 genes. The disease name, MIM number, gene name, together with “mutation” or “database,” were used to identify web-based databases. Fifty-nine diseases (98%) due to mutations in 128 genes had a variant database. Altogether there were 349 databases (a median of 3 per gene, range 0–6), but no gene had two databases with the same number of variants, and 165 (50%) databases included fewer than 10 variants. About half the databases (180, 54%) had been updated in the previous year. Few (77, 23%) were curated by “experts” but these included nine of the 11 with the most variants. Even fewer databases (41, 12%) included clinical features apart from the name of the associated disease. Most (223, 67%) could be accessed without charge, including those for 50 genes (40%) with the maximum number of variants. Future efforts should focus on encouraging experts to collaborate on a single database for each gene affected in inherited renal disease, including both unpublished variants, and clinical phenotypes.
Hum Mutat 35:791–793, 2014.
© 2014 Wiley Periodicals, Inc.

KEY WORDS: mutation databases; DNA variants; inherited kidney disease; Alport syndrome; polycystic kidney disease; tuberous sclerosis

Additional Supporting Information may be found in the online version of this article.
∗Correspondence to: Judy Savige, Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville VIC 3050, Australia. E-mail: jasavige@unimelb.edu.au

Introduction

The genes affected in many forms of inherited renal disease are known [Hildebrandt, 2010], and diagnostic testing is available for these using Sanger and exome sequencing. The commonest inherited renal diseases for which testing is requested are polycystic kidney disease, Alport syndrome, tuberous sclerosis, focal and segmental glomerulosclerosis, medullary cystic kidney disease, and some tubular disorders. Typically, the same clinical phenotypes show genetic heterogeneity, and each of the affected genes has many pathogenic changes that are different in individual families. However, each gene also has many normal variants and any diagnostic service must distinguish these from disease-causing changes. This is straightforward if pathogenicity has been reported convincingly in a previous publication or in a database using well-defined criteria. For variants of unknown significance, most laboratories use a combination of two approaches to confirm disease association: a search of normal single nucleotide polymorphism databases, which include the results from the HapMap and “1000 Genomes” projects, and an in silico analysis that depends on whether the change affects a highly conserved, and therefore structurally and functionally important, residue, as well as the likely effects of this change. Sometimes the family will be examined to confirm that a variant segregates with disease.

However, searching gene variant or locus-specific databases has limitations.
Many variants identified in routine diagnostic laboratories are not reported [Cotton, 2000], because publication requires time and effort, and journals are reluctant to accept multiple small series of variants that, on their own, do not increase our understanding of a disease. Sometimes variant descriptions use a nonstandard protein reference sequence, which results in an inconsistent, and hence confusing, numbering system. Not uncommonly, assessments of pathogenicity are incorrect [Murphy et al., 2004; Gout et al., 2007], and errors are rarely retracted or corrected except where the same variant is reported a second time. Some of these errors probably result from having database curators who are not familiar with the disease or the corresponding affected gene. Databases that are incomplete or inaccurate waste time, effort, and resources in duplicating assessments that have already been performed elsewhere.

The Human Variome Project (HVP) (www.humanvariomeproject.org) envisages DNA variant databases that are current and complete, accurate, clinically relevant, and freely available. Its members have developed a database template that can be uploaded onto an institution’s website or a site hosted by the University of Leiden [Fokkema et al., 2005, 2011], as well as guidelines on standardizing variant descriptions, strategies for maximizing the number of variants in the public domain, and solutions to a range of ethical issues [Povey et al., 2010; Celli et al., 2012; Vihinen et al., 2012]. The HVP advocates the use of expert curators with complementary skills in describing a disease (clinicians) or a gene (scientists and researchers) because they are more likely to record descriptions of variants and phenotypes accurately [Samuels and Rouleau, 2011], and are better able to encourage colleagues in other laboratories to share their own unpublished variants.