RESEARCH ARTICLE

Using diverse U.S. beef cattle genomes to identify missense mutations in EPAS1, a gene associated with high-altitude pulmonary hypertension [version 1; peer review: 2 approved]

Michael P. Heaton 1, Timothy P.L. Smith 1, Jacky K. Carnahan 1, Veronica Basnayake 2, Jiansheng Qiu 2, Barry Simpson 2, Theodore S. Kalbfleisch 3

1 U.S. Meat Animal Research Center (USMARC), Clay Center, USA
2 GeneSeek, a Neogen Company, Lincoln, USA
3 Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, USA

Abstract

The availability of whole genome sequence (WGS) data has made it possible to discover protein variants in silico. However, existing bovine WGS databases do not show data in a form conducive to protein variant analysis, and tend to underrepresent the breadth of genetic diversity in U.S. beef cattle. Thus, our first aim was to use 96 beef sires, sharing minimal pedigree relationships, to create a searchable and publicly viewable set of mapped genomes relevant for 19 popular breeds of U.S. cattle. Our second aim was to identify protein variants encoded by the bovine endothelial PAS domain-containing protein 1 gene (EPAS1), a gene associated with high-altitude pulmonary hypertension in Angus cattle. The identity and quality of genomic sequences were verified by comparing WGS genotypes to those derived from other methods. The average read depth, genotype scoring rate, and genotype accuracy exceeded 14, 99%, and 99%, respectively. The 96 genomes were used to discover four amino acid variants encoded by EPAS1 (E270Q, P362L, A671G, and L701F) and confirm two variants previously associated with disease (A606T and G610S).
The six EPAS1 missense mutations were verified with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry assays, and their frequencies were estimated in a separate collection of 1154 U.S. cattle representing 46 breeds. A rooted phylogenetic tree of eight polypeptide sequences provided a framework for evaluating the likely order of mutations and the potential impact of EPAS1 alleles on the adaptive response to chronic hypoxia in U.S. cattle. This public, whole genome resource facilitates in silico identification of protein variants in diverse types of U.S. beef cattle, and provides a means of translating WGS data into a practical biological and evolutionary context for generating and testing hypotheses.

Keywords

Beef cattle, Whole genome sequence, EPAS1, HIF2A, Pulmonary hypertension, Brisket disease

Reviewer Status

Invited Reviewers:
1. Joseph M. Neary, Texas Tech University, Lubbock, USA
2. Matthew C. McClure, Irish Cattle Breeding Federation, Bandon, Ireland

Version 1: 16 Aug 2016
Version 2 (revision): 05 Oct 2016

First published: 16 Aug 2016, 5:2003 (https://doi.org/10.12688/f1000research.9254.1)
Latest published: 05 Oct 2016, 5:2003 (https://doi.org/10.12688/f1000research.9254.2)

F1000Research 2016, 5:2003 Last updated: 14 APR 2020