VARUS: Sampling Complementary RNA Reads from the Sequence Read Archive

Supplementary Materials

Mario Stanke 1,2, Willy Bruhn 1, Felix Becker 1,2, and Katharina J. Hoff 1,2

1 Institute for Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Str. 47, 17489 Greifswald, Germany
2 Center for Functional Genomics of Microbes, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany

April 13, 2019

Contents

1 Supplementary Methods
  1.1 Assembly & Reference Annotation Processing
  1.2 Running VARUS
  1.3 Downloading Manually Selected RNA-Seq Libraries
  1.4 Aligning Manually Selected RNA-Seq Libraries
  1.5 Running BRAKER
2 Supplementary Figures
3 Supplementary Tables

1 Supplementary Methods

1.1 Assembly & Reference Annotation Processing

The names of genome FASTA file entries downloaded from NCBI are long and complex. For our purposes, unique sequence IDs in the header are sufficient. We trimmed FASTA headers and replaced dots in sequence names by underscores:

    cat original.fa | perl -pe 's/(>\S*)\.(\d+)\s.*/$1_$2/g;' > genome.fa

Dots in sequence names were replaced by underscores in the corresponding annotation files:

    cat original.gff | perl -pe 's/\./_/' > annot.gff3

BRAKER is by design unable to predict genes with frameshift errors or genes that have parts located on both strands.
Reference annotation files were checked for genes with such issues using GenomeTools [Gremme et al., 2013]:

    gt gff3 -force -tidy -o annot_tidy_by_GenomeTools.gff3 \
        -retainids -sort annot.gff3 2> errors_by_GenomeTools
    cat errors_by_GenomeTools | grep -v gbunit | cut -f6 -d' ' | sort | uniq -c | wc -l

Reference annotation files in GFF3 format were converted to GTF format using GenomeTools. This also removed numerous file entries that are not related to the structures of protein-coding genes, e.g. rRNA features:

    gt gff3_to_gtf -force -o annot_by_GenomeTools.gtf annot_tidy_by_GenomeTools.gff3