Development and application of a T-RFLP data analysis method using correlation
coefficient matrices
Yoshio Nakano ⁎, Toru Takeshita, Noriaki Kamio, Susumu Shiota, Yukie Shibata,
Masaki Yasui, Yoshihisa Yamashita
Department of Preventive Dentistry, Faculty of Dental Science, Kyushu University, Japan
Article info
Article history:
Received 27 June 2008
Received in revised form 4 August 2008
Accepted 4 August 2008
Available online 24 September 2008
Keywords:
Microflora
Bacterial diversity
16S rRNA
Abstract
Environmental microbiology studies commonly use terminal restriction fragment length polymorphism (T-
RFLP) of 16S rRNA genes, for example, to analyze changes in community structure in relation to changing
physicochemical and biological conditions over space and time. Although T-RFLP is most useful for
comparing samples from different environments, a large number of samples is difficult to analyze effectively
with the Web-based tools that are currently available. To resolve this problem, we developed a new approach that
processes the data from all T-RFLP samples simultaneously: terminal fragment combinations are estimated by
applying a correlation analysis to the fragments labeled with two different fluorescent dyes. This
calculation was based on the expectation that the proportions of two terminal fragments from one full-length
polymerase chain reaction (PCR) fragment would be nearly the same in each analysis. Using this program,
the oral microflora in 73 human saliva samples were analyzed, and 24 bacterial groups, with peak areas of at
least 0.5% and correlation coefficients of 0.55 or greater, were identified from the T-RFs within 40 s.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Terminal restriction fragment length polymorphisms (T-RFLPs)
targeted at the 16S rRNA gene provide an effective tool for analyzing
bacterial communities, including unculturable species. Community
analysis using T-RFLP with fluorescently labeled primers offers a
compromise between high sample throughput and phylogenetic
resolution (DeLong and Pace, 2001; Liu et al., 1997; Marsh, 2005). The
gene of interest is amplified from bacterial chromosomal DNA using PCR
techniques with one or two fluorescently labeled primers, and the
amplicon mixture is then digested by one or more restriction enzymes to
generate fragments of different sizes. The labeled DNA fragments are
then separated using capillary electrophoresis and detected by a laser
reader, which generates a profile based on fragment lengths. Users can
predict the bacterial species by comparing the observed fragment
lengths with the lengths calculated from known DNA sequences.
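The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' program: the function names, the toy sequences, the taxon labels, and the matching tolerance are all hypothetical, and the enzyme is assumed to be a blunt cutter (recognition site GGCC, cut after the second base) so that both strands are cut at the same position.

```python
def terminal_fragment_lengths(amplicon, site, cut_offset):
    """Predict the lengths of the two labeled terminal fragments.

    Returns (forward_len, reverse_len): the fragment that keeps the
    forward-primer label (left end) and the fragment that keeps the
    reverse-primer label (right end). Assumes a blunt-cutting enzyme.
    """
    seq = amplicon.upper()
    first = seq.find(site)
    if first == -1:
        # No site: the uncut amplicon carries both labels.
        return len(seq), len(seq)
    last = seq.rfind(site)
    forward_len = first + cut_offset
    reverse_len = len(seq) - (last + cut_offset)
    return forward_len, reverse_len


def match_candidates(observed, predicted, tolerance=2.0):
    """List reference entries whose predicted fragment pair lies within
    `tolerance` bases of the observed (forward, reverse) lengths."""
    obs_f, obs_r = observed
    return [
        name
        for name, (pred_f, pred_r) in predicted.items()
        if abs(pred_f - obs_f) <= tolerance and abs(pred_r - obs_r) <= tolerance
    ]


# Toy reference sequences (hypothetical, far shorter than a real 16S amplicon).
reference = {
    "taxon_A": "AAAAGGCCTTTTTTGGCCAAA",
    "taxon_B": "AAAAAAAAAAGGCCTT",
}
predicted = {
    name: terminal_fragment_lengths(seq, "GGCC", 2)
    for name, seq in reference.items()
}
```

Calling `match_candidates((6.8, 4.5), predicted, tolerance=1.0)` returns the entries compatible with an observed pair of peak sizes. The tolerance stands in for the deviation between observed and calculated fragment sizes discussed in the next paragraph.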
A major problem in T-RFLP analysis is the deviation in peak retention
times (in capillary electrophoresis) from the values that are calculated
based on the lengths of the nucleotide sequences, although individual
retention times are highly reproducible. These deviations make it
difficult to identify the origin of each fragment (Marsh, 2005; Liu et al.,
1997; Sakamoto et al., 2003). One approach to solving this problem is to
minimize the deviation, and we have reported major improvements in
the accuracy of peak identification (Takeshita et al., 2007).
In the present study, another approach was taken to identify each
fragment with its origin in a calculated database of 16S rRNA
sequences by using a correlation matrix of retention times and peak
areas of the fragments. This strategy was based on the expectation
that the proportions of two terminal fragments from one full-length
PCR fragment would be nearly equal in each analysis. We hypothesized
that pairs of terminal restriction fragments (T-RFs), one fragment
carrying each of the two fluorescence-labeled primers, could be
predicted using a matrix of correlation coefficients, rather than by
narrowing down the intersection or subset of a combination of 16S rRNA
fragments generated by digestion with several restriction enzymes, and
that this could overcome the difficulty of estimating the origin of
each terminal fragment. Moreover, the proportional composition of
bacterial species in all samples can be estimated. The bacterial
species could then be predicted from the estimated combination of T-RFs
by comparing the observed fragment lengths with the lengths calculated
from known DNA sequences.
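The pairing idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published implementation: the data layout (relative peak areas in percent, one list per peak, one value per sample), the function names, and the toy values are hypothetical. The default thresholds of 0.55 for the correlation coefficient and 0.5% for the peak area are taken from the abstract, but applying the area threshold to the mean across samples is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def pair_fragments(forward, reverse, min_r=0.55, min_area=0.5):
    """Pair forward-dye and reverse-dye peaks whose relative areas
    co-vary across samples.

    `forward` and `reverse` map a peak size to its relative areas (%),
    one value per sample, in the same sample order.
    """
    pairs = []
    for f_size, f_areas in forward.items():
        if sum(f_areas) / len(f_areas) < min_area:
            continue  # skip minor peaks (assumed: mean area threshold)
        for r_size, r_areas in reverse.items():
            if sum(r_areas) / len(r_areas) < min_area:
                continue
            r = pearson(f_areas, r_areas)
            if r >= min_r:
                pairs.append((f_size, r_size, r))
    # Strongest correlations first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy data for five samples (hypothetical values).
forward = {120: [10, 20, 30, 40, 50], 250: [5, 1, 4, 2, 3]}
reverse = {
    330: [11, 19, 31, 39, 52],
    410: [3, 2, 4, 1, 5],
    90: [0.1, 0.2, 0.1, 0.3, 0.2],
}
pairs = pair_fragments(forward, reverse)
```

In this toy data only the 120/330 pair passes both thresholds, because its two area profiles rise and fall together across the samples, as expected for two ends of the same amplicon. Each retained pair could then be compared with the fragment lengths calculated from known sequences.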
T-RFLP analysis is typically used to compare bacterial communities in
multiple samples from various environments, to determine changes in
bacterial diversity over space and time. However, when the number of
samples to be analyzed exceeds the capacity of the laboratory,
high-throughput T-RFLP analysis is not feasible. The goal of this
study was to develop a high-throughput T-RFLP tool based on a new
approach for the simultaneous processing of multiple samples, and we
describe its use for the phylogenetic analysis of multiple samples.
Journal of Microbiological Methods 75 (2008) 501–505
⁎ Corresponding author. Department of Preventive Dentistry, Faculty of Dental
Science, 3-1-1 Maidashi, Higashi-ku, Fukuoka-shi, Fukuoka 812-8582, Japan. Fax: +81 92
642 6354.
E-mail address: yosh@dent.kyushu-u.ac.jp (Y. Nakano).
0167-7012/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.mimet.2008.08.002