International Journal of Mass Spectrometry 258 (2006) 58–73
SpectraMiner, an interactive data mining and visualization software for
single particle mass spectroscopy: A laboratory test case
Alla Zelenyuk a,∗, Dan Imre b, Yong Cai a, Klaus Mueller c, Yiping Han c, Peter Imrich c
a Pacific Northwest National Laboratory, Richland, WA 99354, USA
b Imre Consulting, Richland, WA 99352, USA
c State University of New York at Stony Brook, Stony Brook, NY 11794, USA
Received 20 April 2006; received in revised form 19 June 2006; accepted 21 June 2006
Available online 26 July 2006
Abstract
Single particle mass spectrometers are sophisticated instruments designed to measure the sizes and compositions of a wide range of individual
particles in situ, in real time. They characterize hundreds of thousands or millions of particles, generating vast amounts of rich and complex data,
the proper mining of which requires dedicated state-of-the-art tools. The analysis of individual particle mass spectra is particularly difficult because
of their high dimensionality: each data point, representing a single particle, includes 450 mass spectral peak intensities, particle size, and time
of detection. The first step is to organize the data, a process typically accomplished by grouping particles with similar attributes. Since the common
assumption is that the data should be reduced to become manageable, they are typically classified into a small number of clusters (∼10), each of
which is represented by an average/representative spectrum. Our approach is quite different. We have developed a data mining and visualization
software package we call SpectraMiner that makes it possible to handle hundreds of clusters, limiting loss of information and thus overcoming
the boundaries set by traditional statistical data analysis approaches. Data, which often include over 1 million particle spectra, are organized using
the K-means clustering algorithm. The clusters are merged into nodes by sequentially combining similar clusters. The final structure is displayed in a
hierarchical dynamical tree or circular dendrogram. This interactive dendrogram is the visual interface that allows for real-time data exploration and
mining. Clicking on any of the clusters/nodes in the dendrogram reveals detailed information about the particles that reside at that position. At
each step the scientist is in control of the level of detail and the visualization format, rapidly switching between them while running the program
on a PC.
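The two-stage organization described above (K-means clustering followed by sequential merging of similar clusters into nodes) can be illustrated with a minimal sketch. It is not SpectraMiner's implementation: the Euclidean distance, the size-weighted centroid merging, the toy 4-peak "spectra", and the names `kmeans` and `merge_clusters` are all assumptions made for illustration.

```python
import random
from math import dist


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each spectrum goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def merge_clusters(centroids, sizes):
    """Repeatedly merge the two closest nodes (assumed Euclidean distance).

    Returns the merge history as (child_a, child_b, parent) triples,
    which is the information a dendrogram is drawn from.
    """
    nodes = {i: (list(c), n) for i, (c, n) in enumerate(zip(centroids, sizes))}
    history = []
    next_id = len(nodes)
    while len(nodes) > 1:
        a, b = min(((i, j) for i in nodes for j in nodes if i < j),
                   key=lambda ij: dist(nodes[ij[0]][0], nodes[ij[1]][0]))
        (ca, na), (cb, nb) = nodes.pop(a), nodes.pop(b)
        # Parent centroid is the size-weighted mean of its two children.
        merged = [(x * na + y * nb) / (na + nb) for x, y in zip(ca, cb)]
        nodes[next_id] = (merged, na + nb)
        history.append((a, b, next_id))
        next_id += 1
    return history


# Toy data: three hypothetical particle types, 4 peak intensities each
# (the paper's data points carry 450 peaks plus size and detection time).
rng = random.Random(1)
prototypes = [[1.0, 0.0, 0.0, 0.2], [0.0, 1.0, 0.0, 0.2], [0.0, 0.0, 1.0, 0.2]]
spectra = [[v + rng.gauss(0, 0.05) for v in proto]
           for proto in prototypes for _ in range(30)]

# Deliberately use more clusters than particle types, then merge upward.
centroids, labels = kmeans(spectra, k=6)
sizes = [labels.count(c) for c in range(len(centroids))]
kept = [(c, n) for c, n in zip(centroids, sizes) if n > 0]  # drop empty clusters
history = merge_clusters([c for c, _ in kept], [n for _, n in kept])
for a, b, parent in history:
    print(f"merge nodes {a} and {b} -> node {parent}")
```

Starting with more clusters than expected particle types and merging afterwards mirrors the idea of retaining detail at the leaves while still offering a coarse view near the root.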
Here we present a study that puts the classification aspect of SpectraMiner to the test. Twelve types of laboratory-generated particles are carefully
chosen to test some of the difficult aspects of single particle mass spectroscopy. We quantify the degree of particle identification and separation at
a number of levels and demonstrate how the visualization tools that SpectraMiner provides can be used to refine, steer and control the data mining
process.
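One simple way to quantify how well known particle types are separated is cluster purity: the fraction of particles in each cluster that belong to that cluster's dominant type. The sketch below is only an assumed stand-in, since this section does not state which metrics the study uses; the type names and the function `cluster_purity` are hypothetical.

```python
from collections import Counter


def cluster_purity(true_types, cluster_ids):
    """Return ({cluster: purity}, overall purity) for labeled particles."""
    by_cluster = {}
    for particle_type, cluster in zip(true_types, cluster_ids):
        by_cluster.setdefault(cluster, Counter())[particle_type] += 1
    # Purity of a cluster: share of its particles that are the dominant type.
    per_cluster = {c: max(cnt.values()) / sum(cnt.values())
                   for c, cnt in by_cluster.items()}
    # Overall purity: particles matching their cluster's dominant type,
    # divided by the total number of particles.
    overall = sum(max(cnt.values()) for cnt in by_cluster.values()) / len(true_types)
    return per_cluster, overall


# Hypothetical example: 8 particles of two known types, assigned to 2 clusters.
true_types = ["type_A"] * 4 + ["type_B"] * 4
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 1]  # one type_A particle lands in cluster 1
per_cluster, overall = cluster_purity(true_types, cluster_ids)
print(per_cluster, overall)
```

Because the laboratory particles have known identities, a measure of this kind can be evaluated at any level of the hierarchy, from individual clusters up to merged nodes.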
© 2006 Elsevier B.V. All rights reserved.
Keywords: Single particle mass spectrometer; Data classification; Data visualization
∗ Corresponding author. Tel.: +1 5093767696.
E-mail address: alla.zelenyuk@pnl.gov (A. Zelenyuk).
1. Introduction
Single particle mass spectrometers (SPMSs) are now
widely used to provide real-time, in situ information on the
sizes and compositions of individual aerosol particles. The path
from instrument design and construction to data acquisition and
analysis is long and demanding. The goal is to use SPMSs to
generate high-quality, reproducible, easy-to-assign individual
particle mass spectra (IPMS). In reality, IPMS generated by
laser ablation tend to exhibit very large particle-to-particle
variations, making the data mining process a daunting
task. The steady drive to improve the instrumental aspects of
SPMSs poses great challenges and remains at the center
of a significant research and development effort in the field. It
is important to realize that the immensity and complexity of
the rich data produced by these sophisticated instruments
require comparable, dedicated state-of-the-art analytical
tools that afford the user the opportunity to extract as much
knowledge as the data can offer. The focus of this paper is on
the approach we have developed to analyze the vast amounts
1387-3806/$ – see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.ijms.2006.06.015