Development and application of a T-RFLP data analysis method using correlation coefficient matrices

Yoshio Nakano, Toru Takeshita, Noriaki Kamio, Susumu Shiota, Yukie Shibata, Masaki Yasui, Yoshihisa Yamashita

Department of Preventive Dentistry, Faculty of Dental Science, Kyushu University, Japan

Article info

Article history: Received 27 June 2008; received in revised form 4 August 2008; accepted 4 August 2008; available online 24 September 2008

Keywords: Microflora; Bacterial diversity; 16S rRNA

Abstract

Environmental microbiology studies commonly use terminal restriction fragment length polymorphism (T-RFLP) of 16S rRNA genes, for example, to analyze changes in community structure in relation to changing physicochemical and biological conditions over space and time. Although T-RFLP is most useful for comparing samples from different environments, a large number of samples makes effective analysis difficult using the Web-based tools that are currently available. To resolve this dilemma, we used a new approach for calculating data from multiple T-RFLP samples by estimating terminal fragment combinations, then applying a correlation analysis using two different fluorescent dyes generated simultaneously from all samples. This calculation was based on the expectation that the proportions of two terminal fragments from one full-length polymerase chain reaction fragment would be nearly the same in each analysis. Using this program, the oral microflora in 73 human saliva samples were analyzed, and 24 bacterial groups, with peak areas of at least 0.5% and correlation coefficients of 0.55 or greater, were identified from the T-RFs within 40 s.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Terminal restriction fragment length polymorphisms (T-RFLPs) targeted at the 16S rRNA gene provide an effective tool for analyzing bacterial communities, including unculturable species.
Community analysis using T-RFLP with fluorescently labeled primers offers a compromise between high sample throughput and phylogenetic resolution (DeLong and Pace, 2001; Liu et al., 1997; Marsh, 2005). The gene of interest is amplified from bacterial chromosomal DNA using PCR techniques with one or two fluorescently labeled primers, and the amplicon mixture is then digested by one or more restriction enzymes to generate fragments of different sizes. The labeled DNA fragments are then separated using capillary electrophoresis and detected by a laser reader, which generates a profile based on fragment lengths. Users can predict the bacterial species by comparing the observed fragment lengths with the lengths calculated from known DNA sequences.

A major problem in T-RFLP analysis is the deviation in peak retention times (in capillary electrophoresis) from the values that are calculated based on the lengths of the nucleotide sequences, although individual retention times are highly reproducible. These deviations make it difficult to identify the origin of each fragment (Marsh, 2005; Liu et al., 1997; Sakamoto et al., 2003). One approach to solving this problem is to minimize the deviation, and we have reported major improvements in the accuracy of peak identification (Takeshita et al., 2007). In the present study, another approach was taken to identify each fragment with its origin in a calculated database of 16S rRNA sequences by using a correlation matrix of retention times and peak areas of the fragments. This strategy was based on the expectation that the proportions of two terminal fragments from one full-length PCR fragment would be nearly equal in each analysis.
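
The expectation stated above, that the two terminal fragments of one amplicon keep nearly equal proportions in every sample, can be illustrated with a minimal Python sketch. It is not the authors' program. The function names (`pearson`, `pair_trfs`), the fragment lengths, and the peak areas are hypothetical values invented for illustration. The thresholds (peak area of at least 0.5%, correlation coefficient of 0.55 or greater) are taken from the abstract, but applying the area threshold to the mean across samples is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def pair_trfs(fwd, rev, min_area=0.5, min_r=0.55):
    """Pair forward- and reverse-labeled T-RFs by cross-sample correlation.

    fwd, rev: dicts mapping a T-RF length (bp) to a list of relative peak
    areas (%), one value per sample, in the same sample order.
    Returns (fwd_length, rev_length, r) tuples, highest r first.
    """
    def mean(values):
        return sum(values) / len(values)

    pairs = []
    for f_len, f_areas in fwd.items():
        # Assumption: the 0.5% area threshold is applied to the sample mean.
        if mean(f_areas) < min_area:
            continue
        for r_len, r_areas in rev.items():
            if mean(r_areas) < min_area:
                continue
            r = pearson(f_areas, r_areas)
            if r is not None and r >= min_r:
                pairs.append((f_len, r_len, r))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy data for six samples (invented, not taken from the study).
fwd = {
    85: [5, 10, 15, 20, 25, 30],
    150: [20, 5, 30, 15, 10, 25],
    290: [25, 30, 5, 10, 20, 15],
    412: [0.1, 0.2, 0.1, 0.3, 0.2, 0.1],  # minor peak, below the area cutoff
}
rev = {
    120: [25.5, 29.5, 5.5, 10.5, 19.5, 15.5],
    233: [5.5, 9.5, 15.5, 19.5, 25.5, 29.5],
    310: [19.5, 5.5, 29.5, 15.5, 10.5, 24.5],
}

pairs = pair_trfs(fwd, rev)
for f_len, r_len, r in pairs:
    print(f"forward {f_len} bp <-> reverse {r_len} bp (r = {r:.3f})")
```

In this sketch every forward/reverse combination above the thresholds is reported. A real analysis would also have to decide how to resolve one fragment that correlates with several partners, which the text here does not specify.
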
We hypothesized that predicting pairs of terminal restriction fragments (T-RFs), one carrying each fluorescently labeled primer, from a matrix of correlation coefficients, rather than by narrowing down the intersection or subset of a combination of 16S rRNA fragments generated by digestion with several restriction enzymes, could overcome the difficulty of estimating the origin of each terminal fragment. Moreover, the proportional composition of bacterial species in all samples can be estimated. Consequently, the combination of T-RFs could be used to predict the bacterial species by comparing the observed fragment lengths with the lengths calculated from known DNA sequences.

T-RFLP analysis is typically used to compare bacterial communities in multiple samples from various environments, to determine changes in bacterial diversity over space and time. However, when the number of samples to be analyzed exceeds the capacity of the laboratory, the resulting inefficiency prevents high-throughput T-RFLP analysis. The goal of this study was to develop a high-throughput T-RFLP tool based on a new approach for the simultaneous processing of multiple samples. We describe a new method of using T-RFLP analysis as a high-throughput technique for the phylogenetic analysis of multiple samples.

Journal of Microbiological Methods 75 (2008) 501–505
Corresponding author. Department of Preventive Dentistry, Faculty of Dental Science, 3-1-1 Maidashi, Higashi-ku, Fukuoka-shi, Fukuoka 812-8582, Japan. Fax: +81 92 642 6354. E-mail address: yosh@dent.kyushu-u.ac.jp (Y. Nakano).
doi:10.1016/j.mimet.2008.08.002