OJS Vol. 4 No. 8, September 2014
A Fully Bayesian Sparse Probit Model for Text Categorization
Abstract: A common problem when processing data sets with a large number of covariates relative to the sample size (fat data sets) is estimating the parameters associated with each covariate. When the number of covariates far exceeds the number of samples, parameter estimation becomes very difficult. Researchers in many fields, such as text categorization, face the burden of finding and estimating important covariates without overfitting the model. In this study, we developed a Sparse Probit Bayesian Model (SPBM) based on Gibbs sampling, which uses a double exponential prior to induce shrinkage and reduce the number of covariates in the model. The method was evaluated on ten domains, such as mathematics, whose corpora were downloaded from Wikipedia. From the downloaded corpora, we created the TF-IDF matrix corresponding to all domains and divided the whole data set randomly into training and testing groups of size 300. To make the model more robust, we repeated the random selection of training and test groups 50 times. The model was implemented in R; the Gibbs sampler ran for 60,000 iterations, of which the first 20,000 were discarded as burn-in. We performed classification on the training and test groups by calculating P(yi = 1) and, following [1] [2], used a threshold of 0.5 as the decision rule. Our model's performance was compared to Support Vector Machines (SVM) using average sensitivity and specificity across the 50 runs. The SPBM achieved high classification accuracy and outperformed SVM in almost all domains analyzed.
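The decision rule and the evaluation metrics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R implementation. It assumes that P(yi = 1) is estimated by averaging the probit probability Φ(xi·β) over retained posterior draws of β, and that ties at exactly 0.5 are assigned to class 0; the abstract states neither detail. The function names, the toy TF-IDF rows, and the coefficient draws are all hypothetical.

```python
import math


def probit_probability(x, beta):
    """Probit link: P(y = 1 | x, beta) = Phi(x . beta), Phi = standard normal CDF."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def posterior_probability(x, beta_draws):
    """Average the probit probability over posterior draws (assumed estimator)."""
    return sum(probit_probability(x, b) for b in beta_draws) / len(beta_draws)


def classify(X, beta_draws, threshold=0.5):
    """Assign class 1 when the estimated P(y = 1) exceeds the threshold."""
    return [1 if posterior_probability(x, beta_draws) > threshold else 0 for x in X]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical data: four documents, three TF-IDF features.
X = [
    [0.3, 0.8, 0.1],
    [0.5, 0.1, 0.6],
    [0.2, 0.5, 0.2],
    [0.9, 0.0, 0.4],
]
y_true = [1, 0, 1, 1]

# Hypothetical post-burn-in draws; the first coefficient is shrunk to zero,
# mimicking the sparsity a double exponential prior is meant to induce.
beta_draws = [
    [0.0, 1.5, -2.0],
    [0.0, 1.4, -2.1],
]

y_pred = classify(X, beta_draws)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(y_pred, sens, spec)
```

In a resampling study such as the one described, these two metrics would be computed for each of the 50 train/test splits and then averaged.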
Cite this paper: Madahian, B. and Faghihi, U. (2014) A Fully Bayesian Sparse Probit Model for Text Categorization. Open Journal of Statistics, 4, 611-619. doi: 10.4236/ojs.2014.48057.

[1]   Pike, M., et al. (1980) Bias and Efficiency in Logistic Analyses of Stratified Case-Control Studies. International Journal of Epidemiology, 9, 89-95.

[2]   Genkin, A., Lewis, D.D. and Madigan, D. (2007) Large-Scale Bayesian Logistic Regression for Text Categorization. Technometrics, 49, 291-304.

[3]   Cao, J. and Zhang, S. (2010) Measuring Statistical Significance for Full Bayesian Methods in Microarray Analyses. Bayesian Analysis, 5, 413-427.

[4]   Li, J., et al. (2011) The Bayesian Lasso for Genome-Wide Association Studies. Bioinformatics, 27, 516-523.

[5]   Bae, K. and Mallick, B.K. (2004) Gene Selection Using a Two-Level Hierarchical Bayesian Model. Bioinformatics, 20, 3423-3430.

[6]   Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society Series B, 58, 267-288.

[7]   Madahian, B., Deng, L.Y. and Homayouni, R. (2014) Application of Sparse Bayesian Generalized Linear Model to Gene Expression Data for Classification of Prostate Cancer Subtypes. Open Journal of Statistics, 4, 518-526.

[8]   Wu, T.T., et al. (2009) Genome-Wide Association Analysis by Lasso Penalized Logistic Regression. Bioinformatics, 25, 714-721.

[9]   Yang, J., et al. (2010) Common SNPs Explain a Large Proportion of the Heritability for Human Height. Nature Genetics, 42, 565-569.

[10]   Madsen, H. and Thyregod, P. (2011) Introduction to General and Generalized Linear Models. Chapman & Hall/CRC, Boca Raton.

[11]   Gelfand, A. and Smith, A.F.M. (1990) Sampling-Based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association, 85, 398-409.

[12]   Gilks, W.R., Richardson, S. and Spiegelhalter, D. (1995) Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC, London.

[13]   Leopold, E. and Kindermann, J. (2002) Text Categorization with Support Vector Machines. How to Represent Texts in Input Space? Machine Learning, 46, 423-444.

[14]   Kim, H., Howland, P. and Park, H. (2005) Dimension Reduction in Text Classification with Support Vector Machines. Journal of Machine Learning Research, 6, 37-53.

[15]   Joachims, T. (1998) Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Springer, Berlin Heidelberg.

[16]   Guyon, I., et al. (2002) Gene Selection for Cancer Classification Using Support Vector Machines. Machine Learning, 46, 389-422.

[17]   Weston, J., et al. (2002) Feature Selection for SVMs. Advances in Neural Information Processing Systems. MIT Press, Cambridge.

[18]   Blei, D.M. (2012) Probabilistic Topic Models. Communications of the ACM, 55, 77-84.

[19]   Blei, D.M., Ng, A.Y. and Jordan, M.I. (2003) Latent Dirichlet Allocation. The Journal of Machine Learning Research, 3, 993-1022.

[20]   Schmidt, B. (2013) Sapping Attention: Keeping the Words in Topic Models.

[21]   Weingart, S.B. (2012) Topic Modeling for Humanists: A Guided Tour.

[22]   Wedderburn, R.W.M. (1974) Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss-Newton Method. Biometrika, 61, 439-447.

[23]   Jennrich, R.I. and Sampson, P.F. (1976) Newton-Raphson and Related Algorithms for Maximum Likelihood Variance Component Estimation. Technometrics, 18, 11-17.

[24]   Hastie, T., Tibshirani, R. and Friedman, J. (2009) Linear Methods for Regression. Springer, New York.

[25]   Hoerl, A.E. and Kennard, R.W. (1970) Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12, 55-67.

[26]   Li, Z. and Sillanpää, M.J. (2012) Overview of LASSO-Related Penalized Regression Methods for Quantitative Trait Mapping and Genomic Selection. Theoretical and Applied Genetics, 125, 419-435.

[27]   Knight, K. and Fu, W. (2000) Asymptotics for Lasso-Type Estimators. The Annals of Statistics, 28, 1356-1378.

[28]   Yuan, M. and Lin, Y. (2005) Efficient Empirical Bayes Variable Selection and Estimation in Linear Models. Journal of the American Statistical Association, 100, 1215-1225.

[29]   Zou, H. (2006) The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101, 1418-1429.

[30]   Zou, H. and Li, R. (2008) One-Step Sparse Estimates in Non-Concave Penalized Likelihood Models. The Annals of Statistics, 36, 1509-1533.

[31]   Park, T. and Casella, G. (2008) The Bayesian Lasso. Journal of the American Statistical Association, 103, 681-686.

[32]   Hans, C. (2009) Bayesian Lasso Regression. Biometrika, 96, 835-845.

[33]   Griffin, J.E. and Brown, P.J. (2007) Bayesian Adaptive Lassos with Non-Convex Penalization. Technical Report, IMSAS, University of Kent, Canterbury.

[34]   Albert, J. and Chib, S. (1993) Bayesian Analysis of Binary and Polychotomous Response Data. Journal of the American Statistical Association, 88, 669-679.

[35]   Bae, K. and Mallick, B.K. (2004) Gene Selection Using a Two-Level Hierarchical Bayesian Model. Bioinformatics, 20, 3423-3430.

[36]   Chen, J., et al. (2006) Decision Threshold Adjustment in Class Prediction. SAR and QSAR in Environmental Research, 17, 337-352.

[37]   Altman, D.G. and Bland, J.M. (1994) Diagnostic Tests 1: Sensitivity and Specificity. British Medical Journal, 308, 1552.

[38]   Karatzoglou, A., Meyer, D. and Hornik, K. (2005) Support Vector Machines in R. Journal of Statistical Software, 15, 1-28.

[39]   Karatzoglou, A., et al. (2004) Kernlab—An S4 Package for Kernel Methods in R. Journal of Statistical Software, 11, 1-20.

[40]   Williams, P.M. (1995) Bayesian Regularization and Pruning Using a Laplace Prior. Neural Computation, 7, 117-143.