PSYCH Vol.5 No.18, November 2014
Predicting Academic Achievement of High-School Students Using Machine Learning
ABSTRACT
This paper presents a relatively new non-linear method for predicting the academic achievement of high-school students, integrating the fields of psychometrics and machine learning. A sample of 135 high-school students (10th grade, 50.34% boys), aged between 14 and 19 years (M = 15.44, SD = 1.09), completed three psychological instruments: the Inductive Reasoning Developmental Test (TDRI), the Metacognitive Control Test (TCM) and the Brazilian Learning Approaches Scale (BLAS-Deep Approach). The first two tests each have a self-appraisal scale attached, giving five independent variables in total. The students' responses to each test/scale were analyzed using the Rasch model. A subset of the original sample was created in order to separate the students into two balanced classes, high achievement (n = 41) and low achievement (n = 47), using grades from nine school subjects. To predict class membership, a non-linear machine learning model, Random Forest, was used. The subset with the two classes was randomly split into two sets (training and testing) for cross-validation. In the training set, the Random Forest showed an overall accuracy of 75%, a specificity of 73.69% and a sensitivity of 68%. In the testing set, the overall accuracy was 68.18%, with a specificity of 63.63% and a sensitivity of 72.72%. The most important variable in the prediction was the TDRI. Finally, implications of the present study for the field of educational psychology are discussed.
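
The accuracy, sensitivity and specificity figures quoted in the abstract all follow from a two-class confusion matrix. The arithmetic can be sketched in Python as below. This is an illustration only, under stated assumptions: the abstract does not report the confusion matrix, so the counts are hypothetical, chosen because 8/11, 7/11 and 15/22 are consistent with the reported testing-set percentages. Treating high achievement as the positive class is also an assumption, and the names ConfusionCounts and classification_metrics are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a two-class confusion matrix."""
    true_pos: int   # positive cases predicted as positive
    false_neg: int  # positive cases predicted as negative
    true_neg: int   # negative cases predicted as negative
    false_pos: int  # negative cases predicted as positive


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity and specificity as proportions in [0, 1]."""
    total = c.true_pos + c.false_neg + c.true_neg + c.false_pos
    return {
        # share of all cases that were classified correctly
        "accuracy": (c.true_pos + c.true_neg) / total,
        # share of positive cases that were detected
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # share of negative cases that were correctly rejected
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
    }


# Hypothetical counts (assumption): 22 test cases, 11 per class.
# "Positive" is taken to mean high achievement, which the abstract does not state.
example = ConfusionCounts(true_pos=8, false_neg=3, true_neg=7, false_pos=4)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With balanced classes, as in this hypothetical split, the overall accuracy is the mean of sensitivity and specificity, which is why the three reported testing-set values sit so close together.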

Cite this paper
Golino, H., Gomes, C., & Andrade, D. (2014). Predicting Academic Achievement of High-School Students Using Machine Learning. Psychology, 5, 2046-2057. doi: 10.4236/psych.2014.518207.
 
 