OJS Vol.4 No.5, August 2014
Multivariate Modality Inference Using Gaussian Kernel
Abstract: The number of modes (also known as the modality) of a kernel density estimator (KDE) has attracted considerable interest and is important in practice. In this paper, we develop an inference framework for the modality of a KDE in the multivariate setting using a Gaussian kernel. We apply the modal clustering method proposed by [1] for mode hunting. A test statistic and its asymptotic distribution are derived to assess the significance of each mode. The inference procedure is applied to both simulated and real data sets.
Cite this paper: Cheng, Y. and Ray, S. (2014) Multivariate Modality Inference Using Gaussian Kernel. Open Journal of Statistics, 4, 419-434. doi: 10.4236/ojs.2014.45041.
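
As a rough illustration of the mode-hunting step mentioned in the abstract, the sketch below climbs a Gaussian KDE from each data point by repeatedly replacing the current location with a kernel-weighted mean of the data, then groups the points by the mode they reach. This is only a minimal sketch in the spirit of the modal clustering idea of [1], not the authors' implementation: the function names `ascend_to_mode` and `find_modes`, the single isotropic bandwidth `h`, and the tolerances are all assumptions made for illustration, and the significance test described in the abstract is not reproduced here.

```python
import numpy as np


def ascend_to_mode(x0, data, h, tol=1e-8, max_iter=500):
    """Climb the Gaussian KDE from x0 to a nearby local maximum.

    Assumes an isotropic bandwidth h (an illustrative simplification).
    Each step moves x to the kernel-weighted mean of the data.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        sq = np.sum((data - x) ** 2, axis=1)
        # Subtract the minimum before exponentiating for numerical stability;
        # the shift cancels when the weights are normalized.
        w = np.exp(-(sq - sq.min()) / (2.0 * h ** 2))
        w /= w.sum()
        x_new = w @ data
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def find_modes(data, h, merge_tol=1e-3):
    """Ascend from every data point and merge endpoints closer than merge_tol.

    Returns the distinct modes and, for each point, the index of its mode.
    """
    data = np.asarray(data, dtype=float)
    modes = []
    labels = np.empty(len(data), dtype=int)
    for i, xi in enumerate(data):
        m = ascend_to_mode(xi, data, h)
        for k, existing in enumerate(modes):
            if np.linalg.norm(m - existing) < merge_tol:
                labels[i] = k
                break
        else:
            # No existing mode is close enough, so record a new one.
            modes.append(m)
            labels[i] = len(modes) - 1
    return np.array(modes), labels
```

Calling `find_modes(data, h)` on an `(n, d)` array returns the estimated mode locations together with a cluster label per observation. The number of modes found depends strongly on the bandwidth `h`, which is why a formal assessment of each mode's significance is needed.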

[1]   Li, J., Ray, S. and Lindsay, B.G. (2007) A Nonparametric Statistical Approach to Clustering via Mode Identification. Journal of Machine Learning Research, 8, 1687-1723.

[2]   Tibshirani, R., Walther, G. and Hastie, T. (2001) Estimating the Number of Clusters in a Data Set via the Gap Statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 411-423.

[3]   McLachlan, G. and Peel, D. (2004) Finite Mixture Models. Wiley, Hoboken.

[4]   Lloyd, S. (1982) Least Squares Quantization in PCM. IEEE Transactions on Information Theory, 28, 129-137.

[5]   Fraley, C. and Raftery, A.E. (2002) Model-Based Clustering, Discriminant Analysis, and Density Estimation. Journal of the American Statistical Association, 97, 611-631.

[6]   Silverman, B.W. (1981) Using Kernel Density Estimates to Investigate Multimodality. Journal of the Royal Statistical Society, Series B (Methodological), 43, 97-99.

[7]   Efron, B. (1979) Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7, 1-26.

[8]   Minnotte, M.C. (1997) Nonparametric Testing of the Existence of Modes. The Annals of Statistics, 25, 1646-1660.

[9]   Burman, P. and Polonik, W. (2009) Multivariate Mode Hunting: Data Analytic Tools with Measures of Significance. Journal of Multivariate Analysis, 100, 1198-1218.

[10]   Fukunaga, K. (1990) Introduction to Statistical Pattern Recognition. Academic Press, Waltham.

[11]   Scott, D.W. (1992) Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley, New York.

[12]   Li, Q. and Racine, J.S. (2011) Nonparametric Econometrics: Theory and Practice. Princeton University Press, Princeton.

[13]   Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1-38.

[14]   Ray, S. and Lindsay, B.G. (2005) The Topography of Multivariate Normal Mixtures. The Annals of Statistics, 33, 2042-2065.

[15]   Dmitrienko, A., Tamhane, A.C. and Bretz, F. (2010) Multiple Testing Problems in Pharmaceutical Statistics. CRC Press, Boca Raton.

[16]   Ray, S. and Pyne, S. (2012) A Computational Framework to Emulate the Human Perspective in Flow Cytometric Data Analysis. PLoS ONE, 7, Article ID: e35693.

[17]   Flury, B. and Riedwyl, H. (1988) Multivariate Statistics: A Practical Approach. Chapman & Hall, Ltd., London.

[18]   Lindsay, B.G., Markatou, M., Ray, S., Yang, K. and Chen, S.C. (2008) Quadratic Distances on Probabilities: A Unified Foundation. The Annals of Statistics, 36, 983-1006.