Peer Reviewer List available upon request
March 2019

Y-DNA Phylogeny Reconstruction using likelihood-weighted phenetic and cladistic data – the SAPP Program

J. David Vance
Affiliations: ISOGG. Contact: davevance01@gmail.com

Abstract

Conventional approaches in modern genetic genealogy to reconstructing the phylogeny of agnatic (male-line) ancestors for a group of Y-DNA-tested men have used either Y-STR or Y-SNP data only. This creates an occasional dilemma over which analysis, Y-STR or Y-SNP, more accurately reflects the phylogenetic tree of the group. The dilemma is unnecessary, since both sets of data are products of the same historical agnatic lines of descent and should therefore be complementary. Y-STRs and Y-SNPs also have different strengths, which can be used in concert to partially offset their separate weaknesses. An approach is presented that weighs phenetic and cladistic characteristics from the available sources of data (Y-STR and Y-SNP), as well as from traditional genealogy information, according to likelihood, in order to reconstruct an agnatic phylogenetic tree. The approach reaches 100% accuracy at maximum data availability while exploiting the strengths of each available data source. It has also been made publicly available as the free online software program Still Another Phylogeny Program (SAPP, at http://www.jdvtools.com/SAPP).

1. Report

1.1. Introduction

The major value of commercial Y-DNA testing to the field of genealogy lies in the opportunity for the consumer, through aggregate data collected from one or more Y-DNA tests (collectively here called their “kit”, although it may include test results from several companies), to match other tested men and gain more insight into their shared agnatic (male-line) ancestry.
Discovering matches is therefore a key objective in the pursuit of genetic genealogy, and as affordable testing has improved and databases of matches have grown larger, this objective has moved from a focus on “Who do I match?” to “How are we related?”. This second question has driven many approaches to reconstructing the phylogenetic tree of agnatic ancestors for a group of kits representing tested men, although until now most approaches have used only one type of available data from the Y chromosome, usually either Short Tandem Repeats (Y-STRs) or Single Nucleotide Polymorphisms (Y-SNPs), often paired with knowledge from traditional genealogy research. Conventionally, kits have been grouped into predicted or confirmed haplogroups by differing manual approaches and then further sorted within those haplogroups by genetic distance based on Y-STR marker allele differences. That sorting is often then further improved through more sophisticated analyses like Y-STR signature (motif) matching. At the most sophisticated level, manual Y-STR mutation history trees can be created to map at least partial agnatic phylogenies for a group of men. In parallel, Y-SNP haplotrees have also become a common structure for representing a group’s phylogenetic tree, especially as Y-SNP testing has gained in affordability and popularity. At the current state of Y-DNA testing, any smaller haplogroup typically comprises many kits at varying levels of Y-STR testing (often at Y12, Y37, Y67, or Y111 levels, though in some cases up to 561 Y-STRs) and of Y-SNP testing, which even at its most extensive has typically uncovered a branching Y-SNP no more frequently than every 3-4 generations. In such cases one set of data may help determine branching in one subset of the phylogeny while another set of data carries more information about a different subset.
This also creates an occasional dilemma about whether Y-STR or Y-SNP analysis more closely reflects the actual phylogenetic tree of the group; a dilemma which is unnecessary since the approaches are complementary and can be combined along with further insights from the group’s traditional genealogy research to recreate as much of the full likely