Detecting naturally arising structures
in data is central to knowledge extraction. In most applications, the main
challenge is in the choice of the appropriate model for exploring the data
features. The choice is generally poorly understood and any tentative choice
may be too restrictive. Growing
volumes of data, disparate data sources and modelling techniques entail the need for model
optimization via adaptability rather than comparability. We propose a novel
two-stage algorithm for modelling continuous data. It consists of an unsupervised stage, in which the algorithm
searches the data for optimal parameter values, and a supervised stage that adapts those parameters for
predictive modelling. The
method is applied to sunspots data with inherently Gaussian
distributional properties and assumed bi-modality. Optimal values separating
high from low cycles are obtained via multiple simulations. Early patterns for
each recorded cycle reveal that the first 3 years provide a sufficient basis
for predicting the peak. Multiple Support Vector Machine runs using repeatedly
improved data parameters show that the approach yields greater accuracy and
reliability than conventional approaches and provides a good basis for model
selection. Model reliability is established via multiple simulations of this
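
As a rough illustration of the unsupervised stage described above, the sketch below fits a two-component Gaussian mixture by EM and derives a cut-off separating low from high cycle peaks. It is a minimal sketch under stated assumptions, not the authors' implementation: the use of EM, the function names `fit_two_gaussians` and `decision_threshold`, and the synthetic peak values are all assumptions made for illustration (the data are simulated, not real sunspot numbers). The resulting labels are the kind of input a supervised learner such as a Support Vector Machine could then be trained on.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, iters=200):
    """EM for a two-component 1-D Gaussian mixture (hypothetical helper).

    Returns (weights, means, sigmas); component 0 is initialised as the lower one.
    """
    xs = sorted(xs)
    n = len(xs)
    half = n // 2
    # Crude start: means of the lower and upper halves of the sorted sample.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    sigma = [max((xs[-1] - xs[0]) / 4.0, 1e-3)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = w[1] * normal_pdf(x, mu[1], sigma[1])
            resp.append(p1 / (p0 + p1 + 1e-300))
        # M-step: re-estimate weights, means and standard deviations.
        n1 = max(sum(resp), 1e-9)
        n0 = max(n - n1, 1e-9)
        mu = [
            sum((1.0 - r) * x for r, x in zip(resp, xs)) / n0,
            sum(r * x for r, x in zip(resp, xs)) / n1,
        ]
        var0 = sum((1.0 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1
        sigma = [max(math.sqrt(var0), 1e-3), max(math.sqrt(var1), 1e-3)]
        w = [n0 / n, n1 / n]
    return w, mu, sigma


def decision_threshold(w, mu, sigma, tol=1e-6):
    """Bisect between the two means for the point where both components are equally likely.

    Assumes mu[0] < mu[1] and a single crossing between them.
    """
    lo, hi = mu[0], mu[1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        diff = (w[1] * normal_pdf(mid, mu[1], sigma[1])
                - w[0] * normal_pdf(mid, mu[0], sigma[0]))
        if diff < 0:
            lo = mid  # lower component still dominates; move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic cycle peaks (illustrative values only, not real sunspot data).
random.seed(0)
peaks = ([random.gauss(80, 15) for _ in range(200)]
         + [random.gauss(160, 20) for _ in range(200)])

weights, means, sigmas = fit_two_gaussians(peaks)
cut = decision_threshold(weights, means, sigmas)

# Labels (0 = low cycle, 1 = high cycle) that a supervised stage could consume.
labels = [1 if p > cut else 0 for p in peaks]
```

Repeating the fit over resampled or re-simulated data and tracking how `cut` varies is one plausible way to mirror the "multiple simulations" mentioned in the abstract; the paper does not specify the procedure here, so that reading is an assumption.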
Cite this paper
K. Mwitondi, J. Bugrien and K. Wang, "Using Optimized Distributional Parameters as Inputs in a Sequential Unsupervised and Supervised Modeling of Sunspots Data," Journal of Software Engineering and Applications, Vol. 6, No. 7, 2013, pp. 34-41. doi: 10.4236/jsea.2013.67B007