A Minimum Description Length Technique
for Semi-Supervised Time Series Classification
Nurjahan Begum, Bing Hu, Thanawin Rakthanmanon and Eamonn Keogh
Abstract In recent years the plunging costs of sensors and storage have made it possible
to obtain vast amounts of medical telemetry, both in clinical settings and, more
recently, even in patients' own homes. However, for this data to be useful, it must be
annotated. This annotation, which requires the attention of medical experts, is very
expensive and time-consuming, and remains the critical bottleneck in medical analysis.
Semi-supervised learning is the obvious way to reduce the need for human labor;
however, most such algorithms are designed for intrinsically discrete
objects such as graphs or strings, and do not work well in this domain, which requires
the ability to deal with real-valued objects arriving in a streaming fashion. In this
work we make two contributions. First, we demonstrate that in many cases a
surprisingly small set of human-annotated examples is sufficient to perform accurate
classification. Second, we devise a novel parameter-free stopping criterion for
semi-supervised learning. We evaluate our work with a comprehensive set of experiments
on diverse medical data sources including electrocardiograms. Our experimental
results suggest that our approach can typically construct accurate classifiers even if
given only a single annotated instance.
Keywords MDL · Semi-supervised learning · Stopping criterion · Time series
N. Begum (✉) · B. Hu (✉) · T. Rakthanmanon · E. Keogh
Department of Computer Science and Engineering, Kasetsart University, University of California,
Riverside, CA, USA
e-mail: nbegu001@ucr.edu
B. Hu
e-mail: bhu002@ucr.edu
E. Keogh
e-mail: eamonn@cs.ucr.edu
T. Bouabana-Tebibel and S. H. Rubin (eds.), Integration of Reusable Systems, 171
Advances in Intelligent Systems and Computing 263, DOI: 10.1007/978-3-319-04717-1_8,
© Springer International Publishing Switzerland 2014