1 Introduction

In this hands-on, we are going to further validate our model based on the EFA findings. The same data set, “Attitude_Statistics v3.sav” will be used.

The evidence of internal structure will be provided by

  1. Confirmatory factor analysis

    • Model fit
    • Factor loadings
    • Factor correlations (no multicollinearity)
  2. Composite reliability

    • Omega coeffient

2 Preliminaries

2.1 Load libraries

We are going to use psych (Revelle, 2023), lavaan version 0.6.15 (Rosseel, Jorgensen, & Rockwood, 2023), semTools version 0.5.6 (Jorgensen, Pornprasertmanit, Schoemann, & Rosseel, 2022) and semPlot version 1.1.6 (Epskamp, 2022) in our analysis. These packages must be installed from downloaded packages from CRAN at https://cran.r-project.org/.

Make sure you already installed all of them before loading the packages.

library(foreign)
library(psych)
library(lavaan)  # for CFA
library(semTools)  # for additional functions in SEM
library(semPlot)  # for path diagram

2.2 Load data set

We include only good items from PA1 and PA2 in data.cfa data frame.

data = read.spss("Attitude_Statistics v3.sav", F, T)  # Shortform
# Include selected items from PA1 & PA2 in "data.cfa"
data.cfa = data[c("Q4","Q5","Q6","Q7","Q8","Q9","Q10","Q11")]
str(data.cfa)
## 'data.frame':    150 obs. of  8 variables:
##  $ Q4 : num  3 3 1 4 2 3 3 2 4 4 ...
##   ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
##   .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
##  $ Q5 : num  4 4 1 3 5 4 4 3 3 4 ...
##   ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
##   .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
##  $ Q6 : num  4 4 1 2 1 4 3 3 4 5 ...
##   ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
##   .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
##  $ Q7 : num  3 4 1 2 4 4 4 2 3 4 ...
##   ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
##   .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
##  $ Q8 : num  3 3 4 2 5 3 3 3 3 4 ...
##   ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
##   .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
##  $ Q9 : num  3 3 4 2 5 4 3 4 5 4 ...
##   ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
##   .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
##  $ Q10: num  3 3 5 2 3 4 3 4 4 4 ...
##   ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
##   .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
##  $ Q11: num  4 4 1 3 4 4 3 3 4 4 ...
##   ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
##   .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
dim(data.cfa)
## [1] 150   8
names(data.cfa)
## [1] "Q4"  "Q5"  "Q6"  "Q7"  "Q8"  "Q9"  "Q10" "Q11"
head(data.cfa)
##   Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11
## 1  3  4  4  3  3  3   3   4
## 2  3  4  4  4  3  3   3   4
## 3  1  1  1  1  4  4   5   1
## 4  4  3  2  2  2  2   2   3
## 5  2  5  1  4  5  5   3   4
## 6  3  4  4  4  3  4   4   4

3 Confirmatory factor analysis

3.1 Preliminary steps

Descriptive statistics

Check minimum/maximum values per item, and screen for any missing values,

describe(data.cfa)
##     vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
## Q4     1 150 2.81 1.17      3    2.78 1.48   1   5     4  0.19    -0.81 0.10
## Q5     2 150 3.31 1.01      3    3.32 1.48   1   5     4 -0.22    -0.48 0.08
## Q6     3 150 3.05 1.09      3    3.05 1.48   1   5     4 -0.04    -0.71 0.09
## Q7     4 150 2.92 1.19      3    2.92 1.48   1   5     4 -0.04    -1.06 0.10
## Q8     5 150 3.33 1.00      3    3.34 1.48   1   5     4 -0.08    -0.12 0.08
## Q9     6 150 3.44 1.05      3    3.48 1.48   1   5     4 -0.21    -0.32 0.09
## Q10    7 150 3.31 1.10      3    3.36 1.48   1   5     4 -0.22    -0.39 0.09
## Q11    8 150 3.35 0.94      3    3.37 1.48   1   5     4 -0.31    -0.33 0.08

Note that all n = 150, no missing values. minmax cover the whole range of response options.

% of response to options per item,

response.frequencies(data.cfa)
##         1     2    3    4     5 miss
## Q4  0.140 0.280 0.30 0.19 0.093    0
## Q5  0.040 0.167 0.35 0.33 0.113    0
## Q6  0.080 0.233 0.33 0.26 0.093    0
## Q7  0.133 0.267 0.23 0.29 0.080    0
## Q8  0.047 0.100 0.48 0.23 0.147    0
## Q9  0.047 0.093 0.42 0.25 0.187    0
## Q10 0.073 0.107 0.42 0.23 0.167    0
## Q11 0.027 0.153 0.35 0.39 0.087    0

All response options are used, and there are no missing values.

Multivariate normality

This is done to check the multivariate normality of the data. If the data are normally distributed, we may use maximum likelihood (ML) estimation method for the CFA. In lavaan, we have a number of alternative estimation methods (the full list is available at http://lavaan.ugent.be/tutorial/est.html or by typing ?lavOptions). Two common alternatives are:

  1. MLR (robust ML), suitable for complete and incomplete, non-normal data (Rosseel et al., 2023).
  2. WLSMV (robust weighted least squares), suitable for categorical response options (e.g. dichotomous, polytomous, ordinal (Brown, 2015))
mardia(data.cfa)
## Call: mardia(x = data.cfa)
## 
## Mardia tests of multivariate skew and kurtosis
## Use describe(x) the to get univariate tests
## n.obs = 150   num.vars =  8 
## b1p =  11.37   skew =  284.27  with probability  <=  2.3e-15
##  small sample skew =  291.24  with probability <=  2.9e-16
## b2p =  96.9   kurtosis =  8.18  with probability <=  2.2e-16

the data are not multivariate normal (kurtosis > 5, P < 0.05). We will use MLR in our analysis.

3.2 Step 1

Specify the measurement model

Specify the measurement model according to lavaan syntax.

model = "
PA1 =~ Q4 + Q5 + Q6 + Q7 + Q11
PA2 =~ Q8 + Q9 + Q10
"

=~ indicates “measured by”, thus the items represent the factor.

By default, lavaan will correlate PA1 and PA2 (i.e. PA1 ~~ PA2), somewhat similar to oblique rotation in EFA. ~~ means “correlation”. We will use ~~ when we add correlated errors later.

3.3 Step 2

Fit the model

Here, we fit the specified model. By default, marker indicator variable approach1 is used in lavaan to scale a factor2. We use MLR as the estimation method.

cfa.model = cfa(model, data = data.cfa, estimator = "MLR")
# cfa.model = cfa(model, data = data.cfa, std.lv = 1)  # factor variance = 1
summary(cfa.model, fit.measures = T, standardized = T)
## lavaan 0.6.15 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        17
## 
##   Number of observations                           150
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                                37.063      27.373
##   Degrees of freedom                                19          19
##   P-value (Chi-square)                           0.008       0.096
##   Scaling correction factor                                  1.354
##     Yuan-Bentler correction (Mplus variant)                       
## 
## Model Test Baseline Model:
## 
##   Test statistic                               453.795     325.195
##   Degrees of freedom                                28          28
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.395
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.958       0.972
##   Tucker-Lewis Index (TLI)                       0.937       0.958
##                                                                   
##   Robust Comparative Fit Index (CFI)                         0.973
##   Robust Tucker-Lewis Index (TLI)                            0.960
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1566.019   -1566.019
##   Scaling correction factor                                  1.140
##       for the MLR correction                                      
##   Loglikelihood unrestricted model (H1)      -1547.487   -1547.487
##   Scaling correction factor                                  1.253
##       for the MLR correction                                      
##                                                                   
##   Akaike (AIC)                                3166.037    3166.037
##   Bayesian (BIC)                              3217.218    3217.218
##   Sample-size adjusted Bayesian (SABIC)       3163.416    3163.416
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.080       0.054
##   90 Percent confidence interval - lower         0.040       0.000
##   90 Percent confidence interval - upper         0.118       0.091
##   P-value H_0: RMSEA <= 0.050                    0.098       0.397
##   P-value H_0: RMSEA >= 0.080                    0.527       0.133
##                                                                   
##   Robust RMSEA                                               0.063
##   90 Percent confidence interval - lower                     0.000
##   90 Percent confidence interval - upper                     0.112
##   P-value H_0: Robust RMSEA <= 0.050                         0.312
##   P-value H_0: Robust RMSEA >= 0.080                         0.320
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.079       0.079
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   PA1 =~                                                                
##     Q4                1.000                               0.952    0.814
##     Q5                0.660    0.092    7.218    0.000    0.629    0.624
##     Q6                0.810    0.090    9.010    0.000    0.771    0.708
##     Q7                0.916    0.086   10.641    0.000    0.872    0.735
##     Q11               0.533    0.093    5.719    0.000    0.507    0.544
##   PA2 =~                                                                
##     Q8                1.000                               0.653    0.655
##     Q9                1.347    0.156    8.654    0.000    0.880    0.844
##     Q10               1.436    0.199    7.206    0.000    0.938    0.856
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   PA1 ~~                                                                
##     PA2               0.077    0.075    1.035    0.301    0.124    0.124
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Q4                0.460    0.089    5.182    0.000    0.460    0.337
##    .Q5                0.620    0.086    7.178    0.000    0.620    0.611
##    .Q6                0.590    0.101    5.836    0.000    0.590    0.498
##    .Q7                0.647    0.125    5.181    0.000    0.647    0.460
##    .Q11               0.611    0.077    7.934    0.000    0.611    0.704
##    .Q8                0.567    0.101    5.628    0.000    0.567    0.570
##    .Q9                0.312    0.094    3.325    0.001    0.312    0.287
##    .Q10               0.321    0.106    3.046    0.002    0.321    0.267
##     PA1               0.906    0.137    6.587    0.000    1.000    1.000
##     PA2               0.427    0.106    4.009    0.000    1.000    1.000

Results

Read the results marked as Scaled. These represent the results of MLR.

To interpret the results, we must looks at

  1. Overall model fit - by fit indices
  2. Localized areas of misfit
    • Residuals
    • Modification indices
  3. Parameter estimates
    • Factor loadings (Std.all column under Latent Variables table)
    • Factor correlations (Std.all column under Covariances table)

1. Fit indices

The following are a number of selected fit indices and the recommended cut-off values (Brown, 2015; Schreiber, Nora, Stage, Barlow, & King, 2006),

Category Fit index Cut-off
Absolute fit \(\chi^2\) \(P>0.05\)
Standardized root mean square (SRMR) \(\leq 0.08\)
Parsimony correction Root mean square error of and its 90% CI \(\leq 0.08\),
approximation (RMSEA) CFit \(P>0.05\) (\(H_0\leq 0.05\))
Comparative fit Comparative fit index (CFI) \(\geq 0.95\)
Tucker-Lewis index (TLI)

2. Localized areas of misfit (Brown, 2015)

  • Residuals

Residuals are the difference between the values in the sample and model-implied variance-covariance matrices.

Standardized residuals (SRs) > |2.58| indicate the standardized discrepancy between the matrices.

  • Modification indices (MIs)

A modification index indicates the expected parameter change if we include a particular specification in the model (i.e. a constrained/fixed parameter is freely estimated, e.g. by correlating between errors of Q1 and Q2).

Specifications with MIs > 3.84 should be investigated.

3. Parameter estimates

  • Factor loadings (FLs) (Std.all column under Latent Variables table).

The guideline for EFA is applicable also to CFA. For example, FLs \(\geq\) 0.5 are practically significant. In addition, the P-values of the FLs must be significant (at \(\alpha\) = 0.05).

Also look for out-of-range values. FLs should be in range of 0 to 1 (absolute values), thus values > 1 are called Heywood cases or offending estimates (Brown, 2015)

  • Factor correlations (Std.all column under Covariances table).

Similar to EFA, a factor correlation must be < 0.85, which indicates that the factors are distinct. A correlation > 0.85 indicates multicollinearity problem. Also look for out-of-range values. Factor correlations should be in range of 0 to 1 (absolute values).

In addition, when a model has Heywood cases, the solution is not acceptable. The variance-covariance matrix (of our data) could be non-positive definite i.e. the matrix is not invertible for the analysis.

In our output:

Fit indices,

##   Estimator                                         ML 
##   Optimization method                           NLMINB 
##   Number of model parameters                        17 
##  
##   Number of observations                           150 
##  Model Test User Model: 
##                                               Standard      Scaled 
##   Test Statistic                                37.063      27.373 
##   Degrees of freedom                                19          19 
##   P-value (Chi-square)                           0.008       0.096 
##   Scaling correction factor                                  1.354 
##     Yuan-Bentler correction (Mplus variant)                        
##  Model Test Baseline Model:  
##   Test statistic                               453.795     325.195 
##   Degrees of freedom                                28          28 
##   P-value                                        0.000       0.000 
##   Scaling correction factor                                  1.395 
##  
## User Model versus Baseline Model: 
##  
##   Comparative Fit Index (CFI)                    0.958       0.972 
##   Tucker-Lewis Index (TLI)                       0.937       0.958 
##                                                                    
##   Robust Comparative Fit Index (CFI)                         0.973 
##   Robust Tucker-Lewis Index (TLI)                            0.960 
##  
## Loglikelihood and Information Criteria: 
##  
##   Loglikelihood user model (H0)              -1566.019   -1566.019 
##   Scaling correction factor                                  1.140 
##       for the MLR correction                                       
##   Loglikelihood unrestricted model (H1)      -1547.487   -1547.487 
##   Scaling correction factor                                  1.253 
##       for the MLR correction                                       
##                                                                    
##   Akaike (AIC)                                3166.037    3166.037 
##   Bayesian (BIC)                              3217.218    3217.218 
##   Sample-size adjusted Bayesian (SABIC)       3163.416    3163.416 
##  
## Root Mean Square Error of Approximation: 
##  
##   RMSEA                                          0.080       0.054 
##   90 Percent confidence interval - lower         0.040       0.000 
##   90 Percent confidence interval - upper         0.118       0.091 
##   P-value H_0: RMSEA <= 0.050                    0.098       0.397 
##   P-value H_0: RMSEA >= 0.080                    0.527       0.133 
##                                                                    
##   Robust RMSEA                                               0.063 
##   90 Percent confidence interval - lower                     0.000 
##   90 Percent confidence interval - upper                     0.112 
##   P-value H_0: Robust RMSEA <= 0.050                         0.312 
##   P-value H_0: Robust RMSEA >= 0.080                         0.320 
##  
## Standardized Root Mean Square Residual: 
##  
##   SRMR                                           0.079       0.079

The model has good model fit based on all indices, with the exception of the upper 90% CI of robust RMSEA = 0.112. CFit P-value for robust RMSEA = 0.312, which means we cannot reject \(H_0 \leq 0.050\).

FLs and factor correlation,

##  Parameter Estimates:  
##   Standard errors                             Sandwich 
##   Information bread                           Observed 
##   Observed information based on                Hessian 
##  Latent Variables: 
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all 
##   PA1 =~                                                                 
##     Q4                1.000                               0.952    0.814 
##     Q5                0.660    0.092    7.218    0.000    0.629    0.624 
##     Q6                0.810    0.090    9.010    0.000    0.771    0.708 
##     Q7                0.916    0.086   10.641    0.000    0.872    0.735 
##     Q11               0.533    0.093    5.719    0.000    0.507    0.544 
##   PA2 =~                                                                 
##     Q8                1.000                               0.653    0.655 
##     Q9                1.347    0.156    8.654    0.000    0.880    0.844 
##     Q10               1.436    0.199    7.206    0.000    0.938    0.856 
##  Covariances: 
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all 
##   PA1 ~~                                                                 
##     PA2               0.077    0.075    1.035    0.301    0.124    0.124 
##  Variances: 
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all 
##    .Q4                0.460    0.089    5.182    0.000    0.460    0.337 
##    .Q5                0.620    0.086    7.178    0.000    0.620    0.611 
##    .Q6                0.590    0.101    5.836    0.000    0.590    0.498 
##    .Q7                0.647    0.125    5.181    0.000    0.647    0.460 
##    .Q11               0.611    0.077    7.934    0.000    0.611    0.704 
##    .Q8                0.56