Notes on statistics, R and coding: Small sample df's with multiple imputation

When multiple imputation (MI) is used in SPSS, the pooled output of subsequent analyses may report very large degrees of freedom (df), far larger than the complete-data df would allow.
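As a rough sketch of how this happens, assume the pooled output uses the classical repeated-imputation df of Rubin (1987), (m - 1) / gamma^2, where m is the number of imputations and gamma is the estimated fraction of missing information. The function name and the numbers below are hypothetical and serve only as an illustration in Python.

```python
def rubin_df(m, gamma):
    """Classical repeated-imputation df (Rubin, 1987): (m - 1) / gamma**2.

    Assumption: gamma is the estimated fraction of missing information,
    strictly between 0 and 1 (or equal to 1).
    """
    if m < 2:
        raise ValueError("need at least two imputations")
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    return (m - 1) / gamma ** 2


# Hypothetical example: 5 imputations and 5% missing information.
# The sample size does not enter the formula at all.
print(rubin_df(5, 0.05))
```

With few imputations and little missing information the result is on the order of a thousand or more, however small the sample is.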

Barnard and Rubin (1999) proposed an adjusted df for this situation. From their abstract:

"An appealing feature of multiple imputation is the simplicity of the rules for combining the multiple complete-data inferences into a final inference, the repeated-imputation inference (Rubin, 1987). This inference is based on a t distribution and is derived from a Bayesian paradigm under the assumption that the complete-data degrees of freedom, \nu_{com}, are infinite, but the number of imputations, m, is finite. When \nu_{com} is small and there is only a modest proportion of missing data, the calculated repeated-imputation degrees of freedom, \nu_{m}, for the t reference distribution can be much larger than \nu_{com}, which is clearly inappropriate. Following the Bayesian paradigm, we derive an adjusted degrees of freedom, \tilde{\nu_{m}}, with the following three properties: for fixed m and estimated fraction of missing information, \tilde{\nu_{m}} monotonically increases in \nu_{com}; \tilde{\nu_{m}} is always less than or equal to \nu_{com}; and \tilde{\nu_{m}} equals \nu_{m} when \nu_{com} is infinite. A small simulation study demonstrates the superior frequentist performance when using \tilde{\nu_{m}}, rather than \nu_{m}."

Formulae

\nu_{com} is the complete-data df

m is the number of imputations

\nu_{m} is the repeated-imputation df

\tilde{\nu_{m}} is the adjusted df; it is always less than or equal to \nu_{com}

\tilde{\nu_{m}} = \nu_{m} (1 + \nu_{m} / \hat{\nu_{obs}})^{-1} = (1 / \nu_{m} + 1 / \hat{\nu_{obs}})^{-1}

in which:

\hat{\nu_{obs}} = \lambda(\nu_{com}) \nu_{com} (1 - \hat{\gamma_{m}})

\lambda(\nu) = (\nu + 1) / (\nu + 3)

\hat{\gamma_{m}} is approximately the Bayesian fraction of missing information for the unknown quantity of interest. It is hard to calculate by hand; an SPSS macro can be found here.
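The calculation can also be sketched in a few lines of Python. This is a minimal illustration, not the SPSS macro: it assumes the usual Rubin's rules quantities, namely the within-imputation variance (mean of the squared standard errors), the between-imputation variance B, the total variance T, \hat{\gamma_{m}} = (1 + 1/m) B / T, and \nu_{m} = (m - 1) / \hat{\gamma_{m}}^2. The function names and the example data are hypothetical.

```python
import math
from statistics import mean, variance


def fraction_missing_info(estimates, variances):
    """Estimate gamma_hat_m from m point estimates and their squared SEs.

    Assumes gamma_hat_m = (1 + 1/m) * B / T (standard Rubin's rules).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    u_bar = mean(variances)       # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b   # total variance
    return (1 + 1 / m) * b / t


def barnard_rubin_df(m, nu_com, gamma):
    """Adjusted df: (1/nu_m + 1/nu_obs)^(-1), following the formulae above."""
    if m < 2 or nu_com <= 0:
        raise ValueError("need m >= 2 and nu_com > 0")
    if not 0 <= gamma < 1:
        raise ValueError("gamma must lie in [0, 1)")
    # Assumed classical df; infinite when there is no between-imputation variance.
    nu_m = math.inf if gamma == 0 else (m - 1) / gamma ** 2
    lam = (nu_com + 1) / (nu_com + 3)
    nu_obs = lam * nu_com * (1 - gamma)
    return 1 / (1 / nu_m + 1 / nu_obs)


# Hypothetical pooled results from m = 5 imputed datasets, complete-data df = 18.
estimates = [1.0, 1.2, 0.9, 1.1, 1.0]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]
gamma_hat = fraction_missing_info(estimates, variances)
print(gamma_hat, barnard_rubin_df(len(estimates), 18, gamma_hat))
```

Because the adjusted value is a harmonic-type combination of \nu_{m} and \hat{\nu_{obs}}, it can never exceed the smaller of the two, which is what keeps it at or below \nu_{com}.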

Barnard, J., & Rubin, D. B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika, 86(4), 948-955.

See also: Van Ginkel, J. R., & Van der Ark, L. A. (2005). SPSS syntax for missing value imputation in test and questionnaire data. Applied Psychological Measurement, 29, 152-153.


Tuesday, August 2, 2011
