I generated 100 datasets of 6 items each, measuring two first-order factors that both load on a single second-order (general) factor:
library(lavaan)

set.seed(642)
rdata <- list()
for (i in 1:100) {
  # generate the second-order (general) factor:
  general_factor <- rnorm(580)
  # generate the two first-order factors:
  factor1 <- general_factor + rnorm(580, sd = 1)
  factor2 <- general_factor + rnorm(580, sd = 1)
  # generate the observed data, three indicators per first-order factor:
  rdata[[i]] <- data.frame(X1 = factor1 + rnorm(580, sd = .75),
                           X2 = factor1 + rnorm(580, sd = .75),
                           X3 = factor1 + rnorm(580, sd = .75),
                           X4 = factor2 + rnorm(580, sd = .75),
                           X5 = factor2 + rnorm(580, sd = .75),
                           X6 = factor2 + rnorm(580, sd = .75))
}
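Under this generating model, items of the same first-order factor should correlate about 2/2.5625 ≈ .78 and items of different factors about 1/2.5625 ≈ .39. A quick sanity check (not in the original post) on the first dataset:

# correlations within blocks X1-X3 and X4-X6 should be around .78,
# correlations between the blocks around .39:
round(cor(rdata[[1]]), 2)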
Then I created a unidimensional model, a model with two correlated factors, a second-order factor model, and two bifactor models to fit to the datasets:
library(semPlot)

# unidimensional (one-factor) model:
F1.mod <- ' F =~ X1 + X2 + X3 + X4 + X5 + X6 '
semPaths(F1.mod)

# two correlated factors:
F2.mod <- ' F1 =~ X1 + X2 + X3
            F2 =~ X4 + X5 + X6 '
semPaths(F2.mod)

# second-order factor model; the two second-order loadings share the label a,
# so they are constrained equal to identify the higher-order part with only
# two first-order factors:
SO.mod <- ' F1 =~ X1 + X2 + X3
            F2 =~ X4 + X5 + X6
            F  =~ a*F1 + a*F2 '
semPaths(SO.mod)
# bifactor model: a general factor plus two specific factors:
BF.mod <- ' G  =~ X1 + X2 + X3 + X4 + X5 + X6
            F1 =~ X1 + X2 + X3
            F2 =~ X4 + X5 + X6 '
semPaths(BF.mod)
# restricted bifactor model: only one specific factor is retained:
BF2.mod <- ' G  =~ X1 + X2 + X3 + X4 + X5 + X6
            F1 =~ X1 + X2 + X3 '
semPaths(BF2.mod)

Then I fitted the models to each of the datasets and calculated the p-values of the chi-square statistic, the CFIs, and the RMSEAs:
# containers for the fitted models:
F1.fit <- list()
F2.fit <- list()
SO.fit <- list()
BF.fit <- list()
BF2.fit <- list()

# data frame to collect the fit measures, five models per dataset:
results <- data.frame(dataset = rep(1:100, each = 5),
                      pval = NA, cfi = NA, rmsea = NA,
                      mod = rep(1:5, times = 100))
fit.ind <- c("pvalue", "cfi", "rmsea")

for (i in 1:100) {
  F1.fit[[i]]  <- cfa(F1.mod, data = rdata[[i]], std.lv = TRUE)
  F2.fit[[i]]  <- cfa(F2.mod, data = rdata[[i]], std.lv = TRUE)
  SO.fit[[i]]  <- cfa(SO.mod, data = rdata[[i]], std.lv = TRUE)
  # bifactor models: specific factors orthogonal to the general factor
  BF.fit[[i]]  <- cfa(BF.mod, data = rdata[[i]], orthogonal = TRUE, std.lv = TRUE)
  BF2.fit[[i]] <- cfa(BF2.mod, data = rdata[[i]], orthogonal = TRUE, std.lv = TRUE)
  results[(i - 1) * 5 + 1, 2:4] <- fitmeasures(F1.fit[[i]], fit.ind)
  results[(i - 1) * 5 + 2, 2:4] <- fitmeasures(F2.fit[[i]], fit.ind)
  results[(i - 1) * 5 + 3, 2:4] <- fitmeasures(SO.fit[[i]], fit.ind)
  results[(i - 1) * 5 + 4, 2:4] <- fitmeasures(BF.fit[[i]], fit.ind)
  results[(i - 1) * 5 + 5, 2:4] <- fitmeasures(BF2.fit[[i]], fit.ind)
}

results$mod <- rep(c("one-factor", "two-factor", "second-order",
                     "bifactor", "bifactor2"), times = 100)
head(results)
# average fit measures per model across the 100 datasets:
aggregate(results[2:4], by = list(results$mod), FUN = mean, na.rm = TRUE)
Not surprisingly, there are a lot of lavaan warnings for the first (full) bifactor model, because with only two specific factors of three indicators each it may not be identified. We should take the fit indices for that model with a grain of salt.
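One way to see how widespread the problem is (a sketch I am adding here, assuming the BF.fit list created above) is to check convergence and lavaan's post-estimation checks across the 100 fits:

# did each full-bifactor fit converge, and did it pass lavaan's post-fitting
# admissibility check (e.g. no negative variance estimates)?
converged  <- sapply(BF.fit, lavInspect, what = "converged")
admissible <- sapply(BF.fit, function(f) suppressWarnings(lavInspect(f, "post.check")))
table(converged)
table(admissible)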
On average, the p-values, RMSEA, and CFI indicate bad fit for the one-factor model, as they should. The two correlated-factor and second-order models fit the data well on average, according to the p-value, CFI, and RMSEA. Note that these two models are equivalent here (with only two first-order factors, the second-order part merely reparameterizes the positive factor correlation), so they should fit the data identically.
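As a quick check of this equivalence (not in the original post), the chi-square statistics of the two fits can be compared for, say, the first dataset; they should be (numerically) identical:

round(fitmeasures(F2.fit[[1]], c("chisq", "df", "pvalue")), 4)
round(fitmeasures(SO.fit[[1]], c("chisq", "df", "pvalue")), 4)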
The most striking finding is that the second bifactor model always seems to fit the data better than the second-order factor model, which was the model that generated the data!
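To check how consistent this is, here is a quick tally (my addition, assuming the results data frame built above) of how often the restricted bifactor model beats the second-order model on RMSEA and CFI:

so  <- subset(results, mod == "second-order")
bf2 <- subset(results, mod == "bifactor2")
# proportion of the 100 datasets in which bifactor2 shows lower RMSEA / higher CFI:
mean(bf2$rmsea < so$rmsea)
mean(bf2$cfi > so$cfi)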
How about the variances of the fit indices: do they differ much between models?
aggregate(results[2:4], by = list(results$mod), FUN = sd, na.rm = TRUE)
They don't seem to differ much. The variance of the p-values is comparable between the bifactor and second-order factor models; the variance of the CFI is somewhat lower for the bifactor models, while the variance of the RMSEA is somewhat lower for the second-order factor model.
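For a visual impression of the spread (again assuming the results data frame from above), simple boxplots per model would do:

boxplot(rmsea ~ mod, data = results, ylab = "RMSEA")
boxplot(cfi ~ mod, data = results, ylab = "CFI")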
Conclusion
There is not much sense in comparing the fit of a second-order and a bifactor model if we want to know which model generated the data: a bifactor model will often fit the data better, even when the data were generated by a second-order factor model. I am curious whether this is also the case with more indicators per factor.
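As a starting point for that follow-up, here is a minimal sketch (hypothetical, not run in this post) of the same data generation with the number of indicators per first-order factor as an argument:

# generate one dataset from the second-order structure used above,
# with n_ind indicators per first-order factor:
gen_data <- function(n = 580, n_ind = 3, sd_fo = 1, sd_e = .75) {
  general <- rnorm(n)
  f1 <- general + rnorm(n, sd = sd_fo)
  f2 <- general + rnorm(n, sd = sd_fo)
  dat <- cbind(replicate(n_ind, f1 + rnorm(n, sd = sd_e)),
               replicate(n_ind, f2 + rnorm(n, sd = sd_e)))
  colnames(dat) <- paste0("X", seq_len(2 * n_ind))
  as.data.frame(dat)
}
# e.g. gen_data(n_ind = 6) gives 12 items, six per first-order factor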