De novo species identification using 16S rRNA gene nanopore sequencing

Inga Leena Angell 1, Morten Nilsen 1, Karin C. Lødrup Carlsen 2,3, Kai-Håkon Carlsen 2,3, Gunilla Hedlin 4,5, Christine M. Jonassen 1,6, Benjamin Marsland 7, Björn Nordlund 4,5, Eva Maria Rehbinder 3,8, Carina Saunders 2,3, Håvard Ove Skjerven 2,3, Anne Cathrine Staff 3,9, Cilla Söderhäll 4,5, Riyas Vettukattil 2,3 and Knut Rudi 1

1 Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
2 Division of Paediatric and Adolescent Medicine, Oslo University Hospital, Oslo, Norway
3 Faculty of Medicine, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
4 Astrid Lindgren Children’s Hospital, Karolinska University Hospital, Stockholm, Sweden
5 Department of Women’s and Children’s Health, Karolinska Institutet, Stockholm, Sweden
6 Genetic Unit, Centre for Laboratory Medicine, Østfold Hospital Trust, Kalnes, Norway
7 Department of Immunology and Pathology, Central Clinical School, Monash University, Melbourne, Victoria, Australia
8 Department of Dermatology, Oslo University Hospital, Oslo, Norway
9 Division of Obstetrics and Gynaecology, Oslo University Hospital, Oslo, Norway

Submitted 13 March 2020
Accepted 3 September 2020
Published 21 October 2020
Corresponding author: Knut Rudi, knut.rudi@nmbu.no
Academic editor: Joseph Gillespie
Additional Information and Declarations can be found on page 9
DOI 10.7717/peerj.10029
Copyright 2020 Angell et al.
Distributed under Creative Commons CC-BY 4.0
OPEN ACCESS

ABSTRACT
Nanopore sequencing is rapidly becoming more popular for use in various microbiota-based applications. Major limitations of current approaches are that they do not enable de novo species identification and that they cannot be used to verify species assignments. This severely limits applicability of the nanopore sequencing technology in taxonomic applications.
Here, we demonstrate the possibility of de novo species identification and verification using hexamer frequencies in combination with k-means clustering for nanopore sequencing data. The approach was tested on the gut microbiota of 3-month-old human infants. Using the hexamer k-means approach, we identified two new low-abundance species associated with vaginal delivery. In addition, we confirmed both the vaginal delivery association for two previously identified species and the overall high levels of bifidobacteria. Taxonomic assignments were further verified by mock community analyses. Therefore, we believe our de novo species identification approach will have widespread application in analyzing microbial communities in the future.

Subjects Bioinformatics, Ecology, Microbiology, Molecular Biology
Keywords Nanopore, 16S rRNA, Infant gut, Microbiota

How to cite this article Angell IL, Nilsen M, Carlsen KCL, Carlsen K-H, Hedlin G, Jonassen CM, Marsland B, Nordlund B, Rehbinder EM, Saunders C, Skjerven HO, Staff AC, Söderhäll C, Vettukattil R, Rudi K. 2020. De novo species identification using 16S rRNA gene nanopore sequencing. PeerJ 8:e10029 http://doi.org/10.7717/peerj.10029

INTRODUCTION
Third generation nanopore sequencing has revolutionized the field of analyzing microbial communities, with the promise of on-site high throughput analyses (Acharya et al., 2019). However, despite several recent advances in nanopore sequencing, the error rates are too high for de novo species identification (Shin et al., 2016). Therefore, all current approaches are based on some kind of reference, or black-box systems for species identification (Winand