Step 1: Explore your dataset to generate hypotheses
For example, we may have a dataset of 100 observations on 10 variables that are actually uncorrelated. To illustrate, let’s generate such a dataset:
set.seed(1)
data <- matrix(rnorm(1000), nrow = 100, ncol = 10)  # 100 x 10 draws of pure noise
colnames(data) <- paste("x", 1:10, sep = "")
library(corrplot)
corrplot(cor(data), method = "number", type = "upper")
We got at least one meaningful correlation there: x4 and x9 show a correlation \(> .2\). Now let’s perform some statistical tests to see if any of these correlations are significant:
library(Hmisc)
pvalues <- rcorr(data)$P  # p-values for all pairwise correlations
diag(pvalues) <- 1        # diagonal is NA by default; set to 1 for plotting
corrplot(1 - pvalues, type = "upper", method = "number", main = "1 minus p-value")
Two out of 45 correlation coefficients are significant. Surely there must be a publication in these data! Also, if we decide to test one-sided instead of two-sided, we get 3 significant associations (a sketch of this move follows below). But what would be even better than 3 significant correlations? Yes, indeed: a mediation effect!
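The one-sided shortcut, as a minimal sketch: halve the two-sided p-values, under the post-hoc assumption that each observed sign was the predicted direction all along:
pvalues_onesided <- pvalues / 2                            # each observed sign becomes the "hypothesis"
sum(pvalues_onesided[upper.tri(pvalues_onesided)] < .05)   # count among the 45 pairs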
Step 2: Perform confirmatory analyses on the same dataset
Example 1: Mediation Analysis
Obviously, we hypothesized before gathering our data that x10 is a causal variable with a partially indirect effect on x9, via x4.
lm1 <- lm(x9 ~ x10, data = data.frame(data))
summary(lm1)
## 
## Call:
## lm(formula = x9 ~ x10, data = data.frame(data))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.46497 -0.74698  0.03043  0.64968  2.86824 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 0.009241   0.107194   0.086   0.9315  
## x10         0.205740   0.102135   2.014   0.0467 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.072 on 98 degrees of freedom
## Multiple R-squared:  0.03976,    Adjusted R-squared:  0.02996 
## F-statistic: 4.058 on 1 and 98 DF,  p-value: 0.04671
lm2 <- lm(x9 ~ x4 + x10, data = data.frame(data))
summary(lm2)
## 
## Call:
## lm(formula = x9 ~ x4 + x10, data = data.frame(data))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.36641 -0.62453  0.04468  0.62737  2.84705 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.001822   0.105719  -0.017   0.9863  
## x4           0.216306   0.107765   2.007   0.0475 *
## x10          0.181501   0.101315   1.791   0.0763 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared:  0.07805,    Adjusted R-squared:  0.05904 
## F-statistic: 4.106 on 2 and 97 DF,  p-value: 0.01942
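With x4 in the model, the effect of x10 is no longer significant while x4 is: partial mediation, exactly as "hypothesized". For completeness, a minimal sketch of a Sobel test of the indirect effect; the a-path model lm_a below is added here for illustration and was not part of the analysis above:
lm_a <- lm(x4 ~ x10, data = data.frame(data))   # a-path: x10 -> x4
a  <- coef(summary(lm_a))["x10", "Estimate"]
sa <- coef(summary(lm_a))["x10", "Std. Error"]
b  <- coef(summary(lm2))["x4", "Estimate"]      # b-path: x4 -> x9, controlling for x10
sb <- coef(summary(lm2))["x4", "Std. Error"]
z  <- a * b / sqrt(b^2 * sa^2 + a^2 * sb^2)     # Sobel (1982) z statistic
2 * pnorm(-abs(z))                              # two-sided p-value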
Example 2: Factor Analysis
Confirmatory factor analysis (CFA) offers a great way to generate a lot of outcomes at once: there will always be a parameter estimate, significance test, or fit index that you can use to back up whatever interpretation you prefer. As we have no clue what we are looking for in our data, let’s first explore them using PCA, which will provide the hypotheses we need to subsequently fit a CFA on the same data:
library(psych)
VSS.scree(data)
The scree plot indicates that five components have an eigenvalue \(> 1\), providing justification for a 5-factor solution. However, such a solution is not very parsimonious for 10 items, so let’s use the elbow criterion instead and hypothesize that there are 2 factors in our data.
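For those who prefer numbers over plots, the eigenvalues behind the scree can be checked directly in base R:
ev <- eigen(cor(data))$values  # eigenvalues of the correlation matrix
sum(ev > 1)                    # Kaiser criterion: components with eigenvalue > 1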
Of course, a better way to explore the factorial structure of the data would have been parallel analysis, which shows that simulated or resampled data produce a very similar scree plot:
fa.parallel(data)
## Parallel analysis suggests that the number of factors =  0  and the number of components =  0
Zero factors, zero components. Let’s ignore that and fit our 2-component solution anyway:
principal(data, nfactors = 2, rotate = "varimax")
## Principal Components Analysis
## Call: principal(r = data, nfactors = 2, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##       RC1   RC2    h2   u2 com
## x1   0.07  0.48 0.235 0.76 1.0
## x2  -0.07  0.28 0.082 0.92 1.1
## x3  -0.08 -0.27 0.080 0.92 1.2
## x4   0.64 -0.05 0.408 0.59 1.0
## x5  -0.05  0.52 0.272 0.73 1.0
## x6   0.02 -0.49 0.244 0.76 1.0
## x7  -0.31  0.28 0.176 0.82 2.0
## x8   0.00  0.62 0.382 0.62 1.0
## x9   0.68  0.00 0.456 0.54 1.0
## x10  0.65  0.20 0.461 0.54 1.2
## 
##                        RC1  RC2
## SS loadings           1.40 1.40
## Proportion Var        0.14 0.14
## Cumulative Var        0.14 0.28
## Proportion Explained  0.50 0.50
## Cumulative Proportion 0.50 1.00
## 
## Mean item complexity =  1.2
## Test of the hypothesis that 2 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.13 
##  with the empirical chi square  153.01  with prob <  5.9e-20 
## 
## Fit based upon off diagonal values = -0.88
This is exactly the structure we expected! So let’s use this structure to perform the CFA:
library(lavaan)
model <- '
  F1 =~ x4 + x9 + x10 + x7
  F2 =~ x1 + x5 + x6 + x8 + x7
'
fit <- cfa(model, data)
fitmeasures(fit, c("chisq", "df", "pvalue", "cfi", "srmr", "rmsea"))
##  chisq     df pvalue    cfi   srmr  rmsea 
## 13.777 18.000  0.744  1.000  0.053  0.000
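An excellent fit. And had it been poor, modification indices would have offered an endless menu of post-hoc tweaks; a minimal sketch using lavaan's modindices():
mi <- modindices(fit)                            # all candidate model "improvements"
head(mi[order(mi$mi, decreasing = TRUE), ], 5)   # the five most tempting ones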
summary(fit, standardized = TRUE)
## lavaan (0.5-23.1097) converged normally after  61 iterations
## 
##   Number of observations                           100
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               13.777
##   Degrees of freedom                                18
##   P-value (Chi-square)                           0.744
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   F1 =~                                                                 
##     x4                1.000                               0.427    0.433
##     x9                1.226    0.861    1.423    0.155    0.524    0.484
##     x10               0.898    0.589    1.523    0.128    0.384    0.366
##     x7               -0.418    0.450   -0.929    0.353   -0.179   -0.166
##   F2 =~                                                                 
##     x1                1.000                               0.294    0.329
##     x5                1.426    1.229    1.160    0.246    0.419    0.360
##     x6               -0.997    0.882   -1.130    0.258   -0.293   -0.305
##     x8                1.146    1.011    1.134    0.257    0.337    0.309
##     x7                0.742    0.805    0.922    0.356    0.218    0.203
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   F1 ~~                                                                 
##     F2                0.012    0.035    0.334    0.738    0.094    0.094
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .x4                0.791    0.176    4.483    0.000    0.791    0.812
##    .x9                0.898    0.240    3.746    0.000    0.898    0.766
##    .x10               0.954    0.176    5.425    0.000    0.954    0.866
##    .x7                1.084    0.174    6.223    0.000    1.084    0.938
##    .x1                0.712    0.133    5.362    0.000    0.712    0.892
##    .x5                1.179    0.237    4.969    0.000    1.179    0.870
##    .x6                0.838    0.149    5.634    0.000    0.838    0.907
##    .x8                1.078    0.193    5.594    0.000    1.078    0.905
##     F1                0.183    0.158    1.154    0.249    1.000    1.000
##     F2                0.086    0.101    0.860    0.390    1.000    1.000 
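Loadings, z-values, p-values: plenty to narrate. As a closing check on how reliably Step 1 "works", a minimal sketch (assuming the same 100 x 10 pure-noise setup as above) that repeats the exploratory correlation hunt over many datasets:
set.seed(2)  # arbitrary seed for reproducibility
n_sig <- replicate(1000, {
  d <- matrix(rnorm(1000), nrow = 100, ncol = 10)
  p <- Hmisc::rcorr(d)$P
  sum(p[upper.tri(p)] < .05)  # "significant" correlations among the 45 pairs
})
mean(n_sig >= 1)  # proportion of noise datasets offering at least one "finding"
mean(n_sig)       # on average about .05 * 45 = 2.25 spurious correlations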