Statistical prediction of pathogenic variant sites in human mitochondrial genomes

Marcella Attimonelli (1), Matteo Accetturo (1), and Daniela Lascaro (1)

(1) Dipartimento di Biochimica e Biologia Molecolare, Università degli Studi di Bari, Via Orabona, 4 - 70126 Bari, Italy. m.attimonelli@biologia.uniba.it

Keywords. Mitochondrial disorders, site variability, pathological potential.

Introduction

Mitochondrial DNA disorders, i.e. disorders associated with dysfunctions of the oxidative phosphorylation (OXPHOS) system, are caused by inborn errors of metabolism and have an estimated frequency of 1 in 10,000 live births. Because the OXPHOS system plays a central role in ATP production, the causes and effects of mitochondrial disorders are highly heterogeneous and complex [1]. These disorders originate from mutations in both nuclear and mitochondrial DNA. Although prenatal diagnosis is routine for nuclear DNA mutations, prenatal diagnosis of mtDNA mutations is rare, even though it is urgently needed because no real therapies exist [2]. Bioinformatics support may help to narrow this gap in a short time. Up to now, the pathogenicity of mtDNA mutations has in most cases been validated by their segregation with the disease and, when the mutation involves a structural gene, by the consequent loss of function, but no systematic statistical analysis of mtDNA SNPs has been performed. Moreover, the criteria commonly followed to associate a mutation with a given pathology are:

- amino acid change at a strictly conserved site;
- presence in patients only;
- heteroplasmy;
- presence in phenotypically similar but ethnically different families.

However, a strict mutation-phenotype correlation in patients is not always verified. Here we propose a statistical approach intended to contribute to the estimation of pathogenic variant sites.
The analysis is based on the estimation of site-specific relative variability in a set of homologous sequences, through the application of the SiteVarProt [3] and SiteVariability [4] software, in order to infer a correlation between site variability and the pathogenicity of a given mutation.

Methods

Site-specific variability indexes have been calculated from nucleotide and amino acid multiple alignments, through the application of the SiteVariability and SiteVarProt software respectively. The site-specific relative variability value for each i-th site (ν_i) of a dataset of N sequences has been estimated according to the following formula:

ν_i =

where, for nucleotide sequences, δ is a parameter taking value 1 when a variation is present at position i of the j-th sequence pair and 0 otherwise, and K_j is the mean genetic distance calculated for the j-th pair over the entire alignment with the GTR model [5] [6]; for amino acid sequences, δ is a Blosum-like index (giving the level of similarity between two amino acids) for position i of the j-th sequence pair, and K_j is the mean genetic distance calculated with the Kimura model. In both cases the values of ν_i are normalized with respect to the maximum variability value (ν_max) calculated for that particular dataset, yielding a new value γ_i, so that site variability indexes are comparable between two or more datasets of sequences:

γ_i = ν_i / ν_max

Sample

Two datasets have been used. The nucleotide site variability estimate has been performed on mtDNA sequences of the 13 mitochondrial genes coding for OXPHOS proteins, belonging to 687 human subjects of different geographic origins. Most of the sequences have been retrieved from the literature, except for 7 belonging to West New Guinea individuals, which have been sequenced in our laboratory as part of a broader complete mitochondrial genome sequencing project.
The amino acid site variability estimate has been calculated on the same set of human sequences, plus mtDNA sequences of the 13 mitochondrial genes coding for OXPHOS proteins from 60 different mammalian species retrieved from the AMmtDB database [7].
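
As an illustration of the nucleotide index described in Methods, the following minimal sketch computes ν_i and the normalized γ_i for a small alignment. It is not the SiteVariability implementation and rests on several assumptions: ν_i is taken to be the average of δ/K_j over all sequence pairs, which is one plausible reading of the definitions of δ and K_j given above; a simple proportion of differing sites (p-distance) stands in for the GTR distance; and identical pairs (K_j = 0) are skipped. The function names `p_distance` and `site_variability` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: used here as a simple stand-in for the GTR distance K_j.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def site_variability(alignment):
    """Return (nu, gamma) for a list of aligned, equal-length sequences.

    Assumption: nu_i is the mean over sequence pairs j of delta_ij / K_j,
    where delta_ij is 1 if the pair differs at site i and 0 otherwise.
    gamma_i = nu_i / nu_max, as stated in Methods.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")

    nu = [0.0] * length
    n_pairs = 0
    for a, b in combinations(alignment, 2):
        k = p_distance(a, b)
        if k == 0:
            continue  # assumption: identical pairs are skipped (K_j = 0)
        n_pairs += 1
        for i in range(length):
            if a[i] != b[i]:  # delta_ij = 1
                nu[i] += 1.0 / k

    if n_pairs:
        nu = [v / n_pairs for v in nu]

    nu_max = max(nu)
    gamma = [v / nu_max if nu_max > 0 else 0.0 for v in nu]
    return nu, gamma


if __name__ == "__main__":
    nu, gamma = site_variability(["ACGT", "ACGA", "ACGC", "ATGC"])
    print(gamma)
```

For this toy alignment the fourth site varies in more pairs, and in more closely related pairs, than the second, so it receives γ = 1.0 while the second site receives γ = 0.5 and the invariant sites receive 0. The amino acid case would replace the 0/1 indicator with a Blosum-like similarity score and the distance with the Kimura model; this is not covered by the sketch.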