Analysis of Multitag Pyrosequence Data from Human Cervical Lavage Samples

by Ammar Naqvi a),b), Huzefa Rangwala a),b),c), Greg Spear d), and Patrick Gillevet* a),b),e)

a) Microbiome Analysis Center, Manassas, VA 20110, USA (phone: 703-993-1057; e-mail: pgilleve@gmu.edu)
b) Bioinformatics and Computational Biology Department, Manassas, VA 20110, USA
c) Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
d) Departments of Immunology, Microbiology, and Medicine, Rush University Medical Center, Chicago, IL 60612, USA
e) Department of Environmental Science and Policy, George Mason University, Fairfax, VA 22030, USA

We have been using the Roche GS-FLX sequencing platform to produce tens of thousands of sequencing reads from samples of both bacterial communities (microbiome) and fungal communities (mycobiome) of stool, gut mucosa, vaginal washes, and oral washes from a large number of subjects. This vast volume of data from diverse sources has necessitated the development of an analysis pipeline to systematically and rapidly identify the taxa within the samples and to correlate the sample data with clinical and environmental features. Specifically, we have developed automated analytical tools for data tracking, taxonomical analysis, and feature clustering of bacteria in the human microbiome, and we demonstrate the pipeline using Cervical Vaginal Lavage (CVL) samples. This analysis pipeline will not only provide insight into our specific CVL dataset, but is also applicable to other microbiome samples and will ultimately broaden our understanding of how the microbiome influences human health.

Introduction. – The human microbiome is the microbial community in a particular ecological niche on or in the human body. The human body contains one of the most densely populated microbial ecosystems known on Earth.
Over 1 × 10^14 microbial cells interact with the human host [1] [2], indicating the ubiquity and potentially critical importance of such interactions in our bodies. Functional interactions between the microbes in the protective mucosal gut biofilm and the gut epithelial and immune cells are critical to human health. We define these interactions of a human microbiome with the host metabolism and immune systems as the Metabiome. These interactions are involved in the immune system and its responses, metabolic regulation, quorum sensing, and digestion. In diseased states, the normal microbiome composition can shift (dysbiosis), altering these interactions and the functionality of the Metabiome.

We report the development and validation of a practical analytical pipeline that calculates the taxonomic distribution across samples and addresses major obstacles encountered in the analysis of microbiome samples, such as data set size and taxonomic identification. The result is a quick and efficient method to characterize samples based on taxonomy and abundance. We evaluated our tool on a data set related to the lower genital tract microbiome sampled by Cervical Vaginal Lavage (CVL). Our subjects were healthy females and females with HIV with and without Bacterial Vaginosis (BV).

CHEMISTRY & BIODIVERSITY – Vol. 7 (2010) 1076. © 2010 Verlag Helvetica Chimica Acta AG, Zürich

A healthy vaginal