Notes on statistics, R and coding: Fitting a single prediction rule ensemble on multiple datasets (e.g., Multiple Imputation)

Multiple imputation (Schafer & Graham, 2002) is the current state of the art for dealing with missing values. It requires creating several imputed versions of the dataset, analysing each of them, and combining the results.

Prediction rule ensembles (PREs) are a recent statistical learning method. The method takes a boosted, bagged or random-forest decision tree ensemble and selects only the nodes that contribute most to predictive accuracy. This yields predictive accuracy competitive with state-of-the-art statistical learning methods, while the result is generally much easier to interpret (Fokkema, 2020; Fokkema & Strobl, 2020). PREs can be fitted using the R package pre.
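As a point of reference, here is a minimal sketch of fitting a single PRE on one complete dataset. The choice of the built-in airquality data, the seed, and the reduced number of trees (ntrees = 100) are assumptions made only to keep the illustration small and quick.

```r
library(pre)

# Illustration only: use the complete cases of the built-in airquality data
airq <- airquality[complete.cases(airquality), ]

set.seed(42)  # arbitrary seed, so the sketch is reproducible
# ntrees = 100 is an arbitrary choice to keep the example fast
airq_ens <- pre(Ozone ~ ., data = airq, ntrees = 100)

# Selected rules and linear terms with their coefficients
print(airq_ens)
airq_coefs <- coef(airq_ens)

# Predictions for the first few observations
airq_preds <- as.numeric(predict(airq_ens, newdata = airq[1:5, ]))
```

The printed ensemble lists only the rules and linear terms with non-zero coefficients, which is what makes the final model comparatively easy to read.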

A single prediction rule ensemble can be fitted on multiple datasets at once by aggregating the PREs fitted on each of the datasets. Here are functions to fit the PREs and aggregate the results on multiply-imputed data.
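A minimal sketch of such functions could look as follows. It assumes that "aggregating" means averaging over the ensembles fitted to each dataset: predictions are averaged, and coefficients are averaged per term, counting a term as zero in any ensemble that did not select it. The helper names fit_pre_list, predict_pre_list and coef_pre_list are hypothetical, not part of package pre, and terms are matched across ensembles by their description string, which is also an assumption. The imputed datasets are created here with package mice; any list of data frames would work.

```r
library(pre)
library(mice)

# Hypothetical helper: fit one PRE per dataset and return the fits in a list
fit_pre_list <- function(formula, datasets, ...) {
  lapply(datasets, function(d) pre(formula, data = d, ...))
}

# Hypothetical helper: average the predictions of all fitted PREs
predict_pre_list <- function(fits, newdata, ...) {
  preds <- sapply(fits, function(f) {
    as.numeric(predict(f, newdata = newdata, ...))
  })
  # matrix() guards against sapply() returning a vector for one-row newdata
  rowMeans(matrix(preds, nrow = nrow(newdata)))
}

# Hypothetical helper: average coefficients over fits, matching terms by
# description; a term absent from a fit contributes a coefficient of zero
coef_pre_list <- function(fits) {
  m <- length(fits)
  all_coefs <- do.call(rbind, lapply(fits, function(f) {
    cf <- coef(f)
    cf[cf$coefficient != 0, c("description", "coefficient")]
  }))
  agg <- aggregate(coefficient ~ description, data = all_coefs, FUN = sum)
  agg$coefficient <- agg$coefficient / m
  agg[order(abs(agg$coefficient), decreasing = TRUE), ]
}

# Illustration: three imputed versions of the built-in airquality data
imp <- mice(airquality, m = 3, printFlag = FALSE, seed = 42)
datasets <- mice::complete(imp, action = "all")

set.seed(42)  # arbitrary seed; ntrees = 50 only keeps the example fast
fits <- fit_pre_list(Ozone ~ ., datasets, ntrees = 50)

agg_coefs <- coef_pre_list(fits)
agg_preds <- predict_pre_list(fits, newdata = datasets[[1]][1:5, ])
```

Averaging predictions this way is equivalent to treating the m fitted ensembles as one larger ensemble in which every term's coefficient is divided by m, so the aggregate remains a single set of rules and linear terms, although usually a longer one than any individual fit.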

References

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.

Fokkema, M. (2020). Fitting prediction rule ensembles with R package pre. Journal of Statistical Software, 92(12), 1-30. http://doi.org/10.18637/jss.v092.i12

Fokkema, M., & Strobl, C. (2020). Fitting prediction rule ensembles to psychological research data: An introduction and tutorial. Psychological Methods, 25(5), 636–652. http://doi.org/10.1037/met0000256 https://arxiv.org/abs/1907.05302

Schafer, J. L., & Graham, J. W. (2002). Missing data: our view of the state of the art. Psychological Methods, 7(2), 147.


Sunday, February 28, 2021
