AM Vol.13 No.2, February 2022
Construction and Update of an Online Ensemble Score Involving Linear Discriminant Analysis and Logistic Regression
Abstract: The aim of this work is to update, in an online setting and upon arrival of new learning data, the parameters of a score constructed with an ensemble method involving linear discriminant analysis and logistic regression, without the need to store all previously obtained data. Poisson bootstrap and stochastic approximation processes, whose convergence has been established theoretically, were used with online standardized data to avoid numerical explosions. The empirical convergence of the online ensemble scores to a reference “batch” score was then studied on five datasets from which data streams were simulated, comparing six processes for constructing the online scores. For each score, 50 replications using a total of 10N observations (N being the size of the dataset) were performed to assess the convergence and stability of the method, computing the mean and standard deviation of a convergence criterion. A complementary study using 100N observations was also performed. All tested processes converged after N iterations on all datasets, except for one process on one dataset. The best processes were averaged processes using online standardized data and a piecewise constant step size.
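The online update described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions. Only logistic-regression base learners are shown (the linear discriminant analysis component is omitted). The Poisson bootstrap follows Oza and Russell's online bagging, where each base learner processes each new observation k ~ Poisson(1) times. The piecewise constant step size is assumed to have the hypothetical form c / (1 + ⌊t/m⌋)^α, with illustrative values c = 0.5, m = 100 and α = 2/3. Online standardization uses running means and population variances that are updated before each new observation is standardized. The ensemble score is taken to be the mean of the base learners' averaged linear predictors. All class and function names are hypothetical.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's multiplication method."""
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class OnlineStandardizer:
    """Running per-feature mean and population variance (Welford's algorithm)."""

    def __init__(self, dim):
        self.n, self.mean, self.m2 = 0, [0.0] * dim, [0.0] * dim

    def update(self, x):
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def transform(self, x):
        z = []
        for j, v in enumerate(x):
            var = self.m2[j] / self.n if self.n > 0 else 0.0
            sd = math.sqrt(var) if var > 0 else 1.0  # avoid division by zero early on
            z.append((v - self.mean[j]) / sd)
        return z + [1.0]  # constant component for the intercept


class AveragedLogisticSGD:
    """Stochastic gradient process for logistic regression plus its running average."""

    def __init__(self, dim, c=0.5, block=100, alpha=2 / 3):
        self.theta, self.theta_bar = [0.0] * dim, [0.0] * dim
        self.t, self.c, self.block, self.alpha = 0, c, block, alpha

    def step_size(self):
        # Assumed (hypothetical) piecewise constant form: the step only
        # decreases once every `block` updates.
        return self.c / (1 + self.t // self.block) ** self.alpha

    def update(self, z, y):
        a = self.step_size()
        p = sigmoid(sum(th * v for th, v in zip(self.theta, z)))
        # Gradient step on the logistic loss: theta <- theta - a * (p - y) * z
        self.theta = [th - a * (p - y) * v for th, v in zip(self.theta, z)]
        self.t += 1
        # Averaged process: theta_bar is the mean of theta_1, ..., theta_t.
        self.theta_bar = [tb + (th - tb) / self.t for tb, th in zip(self.theta_bar, self.theta)]

    def linear_score(self, z):
        return sum(tb * v for tb, v in zip(self.theta_bar, z))


class OnlineBaggedScore:
    """Online bagging: each learner sees each new observation Poisson(1) times."""

    def __init__(self, n_features, n_learners=10, seed=0):
        self.rng = random.Random(seed)
        self.std = OnlineStandardizer(n_features)
        self.learners = [AveragedLogisticSGD(n_features + 1) for _ in range(n_learners)]

    def partial_fit(self, x, y):
        self.std.update(x)  # only running moments are kept, never past observations
        z = self.std.transform(x)
        for learner in self.learners:
            for _ in range(poisson1(self.rng)):
                learner.update(z, y)

    def score(self, x):
        # Assumption: ensemble score = mean of the base learners' linear scores.
        z = self.std.transform(x)
        return sum(learner.linear_score(z) for learner in self.learners) / len(self.learners)


def simulate_stream(model, n, seed=1):
    """Feed n synthetic observations with P(y = 1 | x) = sigmoid(2*x1 - x2)."""
    gen = random.Random(seed)
    for _ in range(n):
        x = [gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)]
        y = 1 if gen.random() < sigmoid(2 * x[0] - x[1]) else 0
        model.partial_fit(x, y)
    return model
```

For instance, `simulate_stream(OnlineBaggedScore(2), 5000)` feeds a synthetic stream whose true log-odds are 2*x1 - x2; the model keeps only running moments and parameter vectors, which mirrors the constraint of not storing previously obtained data. Because the coefficients live in the standardized space, they are not directly comparable to those of a model fitted on raw data.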
Cite this paper: Lalloué, B., Monnez, J. and Albuisson, E. (2022) Construction and Update of an Online Ensemble Score Involving Linear Discriminant Analysis and Logistic Regression. Applied Mathematics, 13, 228-242. doi: 10.4236/am.2022.132018.
