P.K. Rai et al, International Journal of Computer Science and Mobile Computing, Vol.3 Issue.10, October- 2014, pg. 595-604
© 2014, IJCSMC All Rights Reserved
Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IJCSMC, Vol. 3, Issue. 10, October 2014, pg.595 – 604
RESEARCH ARTICLE
Unsupervised Learning on Cosmic Ray Daily Harmonic Variations
Roopesh K. Dwivedi, P.K. Rai*
A.P.S. University, Rewa (M.P.), India
* pkrapsu@gmail.com
Abstract: Clustering is the division of data into groups of similar objects. From a machine learning perspective,
clusters correspond to hidden patterns. In unsupervised learning, we find clusters that represent a data concept.
Since scientific organizations also generate large volumes of data, the challenge is to analyze these data using
recent data mining techniques so as to arrive at meaningful conclusions. As a real-life application, we have
used hourly cosmic ray intensity data from 1965 to 2006 to first derive, for each day, the amplitude and phase
of the harmonics of the daily variation ($r_1$, $\phi_1$, $r_2$, and $\phi_2$). We have applied the k-means partitioning algorithm, the
agglomerative hierarchical clustering algorithm BIRCH, and the density-based partitioning algorithm DBSCAN
to the above set of daily data containing $r_1$, $\phi_1$, $r_2$, and $\phi_2$ for each day. Many interesting clusters have been
identified. The cluster analysis indicates a clear 10-11 year periodicity in the harmonics
dataset even when all four attributes are considered together. Moreover, similar characteristics recur
after a gap of 10-11 years, and many of the outlier years occur in pairs in two of the four sets (each set spanning
about 10-11 years). The years 1996 and 1997 stand out in particular as outliers. These results are
similar to those reported in the literature, which were obtained by statistical methods and by considering only $r_1$ and $\phi_1$ rather than all
four attributes taken together. As such, the results illustrate the strength of the mining technique in real-life
situations.
Key Words: Clustering, Data mining, K-means, BIRCH, DBSCAN, Cosmic ray harmonics
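The abstract refers to deriving, for each day, the amplitude and phase of the harmonics of the daily variation from hourly intensity values. A minimal sketch of one common way to do this is given below. It assumes a standard Fourier model, I(t) = mean + sum over n of r_n cos(2 pi n t / 24 - phi_n), with t in hours and phi_n in radians, restricted to the first two harmonics. The function name `daily_harmonics`, the phase convention, and the synthetic test day are illustrative assumptions and are not taken from this paper.

```python
import math


def daily_harmonics(hourly):
    """Return (r1, phi1, r2, phi2) for one day of 24 hourly intensities.

    Assumed model (not specified in the paper):
        I(t) = mean + sum_n r_n * cos(2*pi*n*t/24 - phi_n),  n = 1, 2
    Amplitudes are in the units of the input; phases are in radians
    in the range [0, 2*pi).
    """
    n_hours = len(hourly)
    if n_hours != 24:
        raise ValueError("expected 24 hourly values for one day")

    result = []
    for n in (1, 2):
        # Fourier cosine and sine coefficients of the n-th harmonic.
        a = (2.0 / n_hours) * sum(
            x * math.cos(2 * math.pi * n * t / n_hours)
            for t, x in enumerate(hourly)
        )
        b = (2.0 / n_hours) * sum(
            x * math.sin(2 * math.pi * n * t / n_hours)
            for t, x in enumerate(hourly)
        )
        # a = r*cos(phi) and b = r*sin(phi) under the assumed model.
        r = math.hypot(a, b)
        phi = math.atan2(b, a) % (2 * math.pi)
        result.extend([r, phi])
    return tuple(result)


# Synthetic day (hypothetical values) built from known harmonics.
synthetic_day = [
    100.0
    + 0.4 * math.cos(2 * math.pi * t / 24 - 1.0)
    + 0.1 * math.cos(2 * math.pi * 2 * t / 24 - 2.5)
    for t in range(24)
]
r1, phi1, r2, phi2 = daily_harmonics(synthetic_day)
print(r1, phi1, r2, phi2)  # recovers the inputs up to rounding error
```

Under this convention, the local time of maximum of the n-th harmonic is 24 phi_n / (2 pi n) hours. Each day then yields one four-attribute record, which is the kind of input the clustering algorithms named in the abstract would operate on.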
1. Introduction
The process of grouping a set of physical or abstract objects is called clustering [JMF99].
A cluster is a collection of data objects that are similar to one another within the same cluster and
are dissimilar to the objects in other clusters [D93] [E93]. As a branch of statistics, cluster
analysis has been studied extensively for many years, with the main focus on
distance-based cluster analysis [M96]. Many statistical analysis software packages or systems
have built-in features for cluster analysis and are used as cluster analysis tools. These