In this practical session, we are going to confirm our model based on the EFA findings. The same data set, “Attitude_Statistics v3.sav”, will be used. Refer to the original questionnaire in “Attitude_Statistics v3 Questions.pdf”.
We will focus on confirmatory factor analysis (CFA) and the composite reliability of the final model.
We are going to use psych (Revelle, 2023), lavaan version 0.6.16 (Rosseel, Jorgensen, & Rockwood, 2023), semTools version 0.5.6 (Jorgensen, Pornprasertmanit, Schoemann, & Rosseel, 2022) and semPlot version 1.1.6 (Epskamp, 2022) in our analysis. These packages must be installed from CRAN at https://cran.r-project.org/. Make sure all of them are already installed before loading the packages.
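For example, they can all be installed in one step (run this once, before loading the packages),
install.packages(c("foreign", "psych", "lavaan", "semTools", "semPlot")) # install from CRAN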
library(foreign) # for reading the SPSS data file
library(psych) # for descriptive statistics
library(lavaan) # for CFA
library(semTools) # for additional functions in SEM
library(semPlot) # for path diagram
We include only the good items from PA1 and PA2 in the data.cfa data frame.
data = read.spss("Attitude_Statistics v3.sav", F, T) # short form of use.value.labels = FALSE, to.data.frame = TRUE
# Include selected items from PA1 & PA2 in "data.cfa"
data.cfa = data[c("Q4","Q5","Q6","Q7","Q8","Q9","Q10","Q11")]
str(data.cfa)
'data.frame': 150 obs. of 8 variables:
$ Q4 : num 3 3 1 4 2 3 3 2 4 4 ...
..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
.. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
$ Q5 : num 4 4 1 3 5 4 4 3 3 4 ...
..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
.. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
$ Q6 : num 4 4 1 2 1 4 3 3 4 5 ...
..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
.. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
$ Q7 : num 3 4 1 2 4 4 4 2 3 4 ...
..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
.. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
$ Q8 : num 3 3 4 2 5 3 3 3 3 4 ...
..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
.. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
$ Q9 : num 3 3 4 2 5 4 3 4 5 4 ...
..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
.. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
$ Q10: num 3 3 5 2 3 4 3 4 4 4 ...
..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
.. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
$ Q11: num 4 4 1 3 4 4 3 3 4 4 ...
..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
.. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
dim(data.cfa)
[1] 150 8
names(data.cfa)
[1] "Q4" "Q5" "Q6" "Q7" "Q8" "Q9" "Q10" "Q11"
head(data.cfa)
Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11
1 3 4 4 3 3 3 3 4
2 3 4 4 4 3 3 3 4
3 1 1 1 1 4 4 5 1
4 4 3 2 2 2 2 2 3
5 2 5 1 4 5 5 3 4
6 3 4 4 4 3 4 4 4
Descriptive statistics
Check minimum/maximum values per item, and screen for any missing values,
describe(data.cfa)
vars n mean sd median trimmed mad min max range skew kurtosis se
Q4 1 150 2.81 1.17 3 2.78 1.48 1 5 4 0.19 -0.81 0.10
Q5 2 150 3.31 1.01 3 3.32 1.48 1 5 4 -0.22 -0.48 0.08
Q6 3 150 3.05 1.09 3 3.05 1.48 1 5 4 -0.04 -0.71 0.09
Q7 4 150 2.92 1.19 3 2.92 1.48 1 5 4 -0.04 -1.06 0.10
Q8 5 150 3.33 1.00 3 3.34 1.48 1 5 4 -0.08 -0.12 0.08
Q9 6 150 3.44 1.05 3 3.48 1.48 1 5 4 -0.21 -0.32 0.09
Q10 7 150 3.31 1.10 3 3.36 1.48 1 5 4 -0.22 -0.39 0.09
Q11 8 150 3.35 0.94 3 3.37 1.48 1 5 4 -0.31 -0.33 0.08
Note that all n = 150 (no missing values), and min–max cover the whole range of the response options.
Check the percentage of responses to each option per item,
response.frequencies(data.cfa)
1 2 3 4 5 miss
Q4 0.140 0.280 0.30 0.19 0.093 0
Q5 0.040 0.167 0.35 0.33 0.113 0
Q6 0.080 0.233 0.33 0.26 0.093 0
Q7 0.133 0.267 0.23 0.29 0.080 0
Q8 0.047 0.100 0.48 0.23 0.147 0
Q9 0.047 0.093 0.42 0.25 0.187 0
Q10 0.073 0.107 0.42 0.23 0.167 0
Q11 0.027 0.153 0.35 0.39 0.087 0
All response options are used, and there are no missing values.
Multivariate normality
This is done to check the multivariate normality of the data. If the data are multivariate normal, we may use the maximum likelihood (ML) estimation method for the CFA. In lavaan, we have a number of alternative estimation methods (the full list is available at http://lavaan.ugent.be/tutorial/est.html or by typing ?lavOptions). Two common alternatives are:
- MLM: ML with robust standard errors and the Satorra–Bentler scaled test statistic.
- MLR: ML with robust standard errors and the Yuan–Bentler scaled test statistic.
mardia(data.cfa)
Call: mardia(x = data.cfa)
Mardia tests of multivariate skew and kurtosis
Use describe(x) the to get univariate tests
n.obs = 150 num.vars = 8
b1p = 11.37 skew = 284.27 with probability <= 2.3e-15
small sample skew = 291.24 with probability <= 2.9e-16
b2p = 96.9 kurtosis = 8.18 with probability <= 2.2e-16
The data are not multivariate normal (kurtosis > 5, P < 0.05). We will use MLR in our analysis.
Specify the measurement model
Specify the measurement model according to the lavaan syntax.
model = "
PA1 =~ Q4 + Q5 + Q6 + Q7 + Q11
PA2 =~ Q8 + Q9 + Q10
"
=~ indicates “measured by”, thus the items represent the factor.
By default, lavaan will correlate PA1 and PA2 (i.e. PA1 ~~ PA2), which is somewhat similar to oblique rotation in EFA. ~~ means “correlation”; we will use ~~ again when we add correlated errors later.
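For illustration only (not part of our analysis; model.example is a hypothetical name), the syntax below writes the default factor correlation out explicitly, while the commented line shows how the factors could instead be forced to be uncorrelated via lavaan's orthogonal argument,
model.example = "
PA1 =~ Q4 + Q5 + Q6 + Q7 + Q11  # =~ : PA1 is measured by these items
PA2 =~ Q8 + Q9 + Q10
PA1 ~~ PA2                      # ~~ : factor covariance/correlation (estimated by default)
"
# cfa(model, data = data.cfa, estimator = "MLR", orthogonal = TRUE) # would fix the factor covariance at 0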
Fit the model
Here, we fit the specified model. By default, the marker indicator variable approach1 is used in lavaan to scale a factor2. We use MLR as the estimation method.
cfa.model = cfa(model, data = data.cfa, estimator = "MLR")
# cfa.model = cfa(model, data = data.cfa, estimator = "MLR", std.lv = TRUE) # alternative scaling: factor variance = 1
summary(cfa.model, fit.measures = T, standardized = T)
lavaan 0.6.16 ended normally after 21 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 17
Number of observations 150
Model Test User Model:
Standard Scaled
Test Statistic 37.063 27.373
Degrees of freedom 19 19
P-value (Chi-square) 0.008 0.096
Scaling correction factor 1.354
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 453.795 325.195
Degrees of freedom 28 28
P-value 0.000 0.000
Scaling correction factor 1.395
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.958 0.972
Tucker-Lewis Index (TLI) 0.937 0.958
Robust Comparative Fit Index (CFI) 0.973
Robust Tucker-Lewis Index (TLI) 0.960
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -1566.019 -1566.019
Scaling correction factor 1.140
for the MLR correction
Loglikelihood unrestricted model (H1) -1547.487 -1547.487
Scaling correction factor 1.253
for the MLR correction
Akaike (AIC) 3166.037 3166.037
Bayesian (BIC) 3217.218 3217.218
Sample-size adjusted Bayesian (SABIC) 3163.416 3163.416
Root Mean Square Error of Approximation:
RMSEA 0.080 0.054
90 Percent confidence interval - lower 0.040 0.000
90 Percent confidence interval - upper 0.118 0.091
P-value H_0: RMSEA <= 0.050 0.098 0.397
P-value H_0: RMSEA >= 0.080 0.527 0.133
Robust RMSEA 0.063
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.112
P-value H_0: Robust RMSEA <= 0.050 0.312
P-value H_0: Robust RMSEA >= 0.080 0.320
Standardized Root Mean Square Residual:
SRMR 0.079 0.079
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
PA1 =~
Q4 1.000 0.952 0.814
Q5 0.660 0.092 7.218 0.000 0.629 0.624
Q6 0.810 0.090 9.010 0.000 0.771 0.708
Q7 0.916 0.086 10.641 0.000 0.872 0.735
Q11 0.533 0.093 5.719 0.000 0.507 0.544
PA2 =~
Q8 1.000 0.653 0.655
Q9 1.347 0.156 8.654 0.000 0.880 0.844
Q10 1.436 0.199 7.206 0.000 0.938 0.856
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
PA1 ~~
PA2 0.077 0.075 1.035 0.301 0.124 0.124
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Q4 0.460 0.089 5.182 0.000 0.460 0.337
.Q5 0.620 0.086 7.178 0.000 0.620 0.611
.Q6 0.590 0.101 5.836 0.000 0.590 0.498
.Q7 0.647 0.125 5.181 0.000 0.647 0.460
.Q11 0.611 0.077 7.934 0.000 0.611 0.704
.Q8 0.567 0.101 5.628 0.000 0.567 0.570
.Q9 0.312 0.094 3.325 0.001 0.312 0.287
.Q10 0.321 0.106 3.046 0.002 0.321 0.267
PA1 0.906 0.137 6.587 0.000 1.000 1.000
PA2 0.427 0.106 4.009 0.000 1.000 1.000
Results
Read the results marked as Scaled; these represent the results of MLR.
To interpret the results, we must look at:
1. The fit indices.
2. The localized areas of misfit.
3. The parameter estimates: the factor loadings (Std.all column under the Latent Variables table) and the factor correlation (Std.all column under the Covariances table).
1. Fit indices
The following are a number of selected fit indices and the recommended cut-off values (Brown, 2015; Schreiber, Nora, Stage, Barlow, & King, 2006),
Category | Fit index | Cut-off |
---|---|---|
Absolute fit | \(\chi^2\) | \(P > 0.05\) |
| Standardized root mean square residual (SRMR) | \(\leq 0.08\) |
Parsimony correction | Root mean square error of approximation (RMSEA) | \(\leq 0.08\), its 90% CI \(\leq 0.08\), and CFit \(P > 0.05\) (\(H_0\): RMSEA \(\leq 0.05\)) |
Comparative fit | Comparative fit index (CFI) | \(\geq 0.95\) |
| Tucker-Lewis index (TLI) | \(\geq 0.95\) |
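If needed, these indices can also be extracted directly from the fitted object with lavaan's fitMeasures(); a minimal sketch (the particular index names requested here, including the .robust variants, are our selection and may vary slightly across lavaan versions),
# selected scaled/robust fit indices from the fitted model
fitMeasures(cfa.model, c("chisq.scaled", "df.scaled", "pvalue.scaled",
                         "cfi.robust", "tli.robust", "rmsea.robust",
                         "rmsea.ci.upper.robust", "srmr"))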
2. Localized areas of misfit (Brown, 2015)
Residuals are the differences between the values in the sample and the model-implied variance-covariance matrices.
Standardized residuals (SRs) > |2.58| indicate a notable standardized discrepancy between the matrices.
A modification index (MI) indicates the expected improvement in model fit if we include a particular specification in the model (i.e. a constrained/fixed parameter is freely estimated, e.g. by correlating the errors of Q1 and Q2).
Specifications with MIs > 3.84 should be investigated.
3. Parameter estimates
Factor loadings (FLs; Std.all column under the Latent Variables table). The guideline for EFA is also applicable to CFA; for example, FLs \(\geq\) 0.5 are practically significant. In addition, the P-values of the FLs must be significant (at \(\alpha\) = 0.05). Also look for out-of-range values: FLs should be in the range of 0 to 1 (in absolute value), thus values > 1 are called Heywood cases or offending estimates (Brown, 2015).
Factor correlation (Std.all column under the Covariances table). Similar to EFA, a factor correlation must be < 0.85, which indicates that the factors are distinct; a correlation > 0.85 indicates a multicollinearity problem. Also look for out-of-range values: factor correlations should be in the range of 0 to 1 (in absolute value).
In addition, when a model has Heywood cases, the solution is not acceptable. The variance-covariance matrix (of our data) could be non-positive definite, i.e. the matrix cannot be inverted for the analysis.
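As a quick optional check (a sketch, not part of the original analysis), the eigenvalues of the sample covariance matrix of the items can be inspected; any eigenvalue at or below zero indicates a non-positive definite matrix,
# eigenvalues of the sample covariance matrix; all should be positive
eigen(cov(data.cfa), only.values = TRUE)$values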
In our output:
Fit indices,
Estimator ML
Optimization method NLMINB
Number of model parameters 17
Number of observations 150
Model Test User Model:
Standard Scaled
Test Statistic 37.063 27.373
Degrees of freedom 19 19
P-value (Chi-square) 0.008 0.096
Scaling correction factor 1.354
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 453.795 325.195
Degrees of freedom 28 28
P-value 0.000 0.000
Scaling correction factor 1.395
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.958 0.972
Tucker-Lewis Index (TLI) 0.937 0.958
Robust Comparative Fit Index (CFI) 0.973
Robust Tucker-Lewis Index (TLI) 0.960
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -1566.019 -1566.019
Scaling correction factor 1.140
for the MLR correction
Loglikelihood unrestricted model (H1) -1547.487 -1547.487
Scaling correction factor 1.253
for the MLR correction
Akaike (AIC) 3166.037 3166.037
Bayesian (BIC) 3217.218 3217.218
Sample-size adjusted Bayesian (SABIC) 3163.416 3163.416
Root Mean Square Error of Approximation:
RMSEA 0.080 0.054
90 Percent confidence interval - lower 0.040 0.000
90 Percent confidence interval - upper 0.118 0.091
P-value H_0: RMSEA <= 0.050 0.098 0.397
P-value H_0: RMSEA >= 0.080 0.527 0.133
Robust RMSEA 0.063
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.112
P-value H_0: Robust RMSEA <= 0.050 0.312
P-value H_0: Robust RMSEA >= 0.080 0.320
Standardized Root Mean Square Residual:
SRMR 0.079 0.079
The model shows good fit based on all indices, with the exception of the upper 90% CI of the robust RMSEA = 0.112. However, the CFit P-value for the robust RMSEA = 0.312, which means we cannot reject \(H_0\): RMSEA \(\leq 0.050\).
FLs and factor correlation,
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
PA1 =~
Q4 1.000 0.952 0.814
Q5 0.660 0.092 7.218 0.000 0.629 0.624
Q6 0.810 0.090 9.010 0.000 0.771 0.708
Q7 0.916 0.086 10.641 0.000 0.872 0.735
Q11 0.533 0.093 5.719 0.000 0.507 0.544
PA2 =~
Q8 1.000 0.653 0.655
Q9 1.347 0.156 8.654 0.000 0.880 0.844
Q10 1.436 0.199 7.206 0.000 0.938 0.856
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
PA1 ~~
PA2 0.077 0.075 1.035 0.301 0.124 0.124
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Q4 0.460 0.089 5.182 0.000 0.460 0.337
.Q5 0.620 0.086 7.178 0.000 0.620 0.611
.Q6 0.590 0.101 5.836 0.000 0.590 0.498
.Q7 0.647 0.125 5.181 0.000 0.647 0.460
.Q11 0.611 0.077 7.934 0.000 0.611 0.704
.Q8 0.567 0.101 5.628 0.000 0.567 0.570
.Q9 0.312 0.094 3.325 0.001 0.312 0.287
.Q10 0.321 0.106 3.046 0.002 0.321 0.267
PA1 0.906 0.137 6.587 0.000 1.000 1.000
PA2 0.427 0.106 4.009 0.000 1.000 1.000
Remember to read the results down the Std.all column.
All FLs are > 0.5 and the factor correlation is < 0.85. There is no problem with the item quality, and the factors are distinct.
In addition, to obtain the standardized results with SE,
standardizedSolution(cfa.model) # standardized, to view the SE of FL
lhs op rhs est.std se z pvalue ci.lower ci.upper
1 PA1 =~ Q4 0.814 0.041 20.017 0.000 0.735 0.894
2 PA1 =~ Q5 0.624 0.068 9.117 0.000 0.490 0.758
3 PA1 =~ Q6 0.708 0.060 11.867 0.000 0.591 0.825
4 PA1 =~ Q7 0.735 0.059 12.523 0.000 0.620 0.850
5 PA1 =~ Q11 0.544 0.080 6.781 0.000 0.387 0.702
6 PA2 =~ Q8 0.655 0.067 9.712 0.000 0.523 0.788
7 PA2 =~ Q9 0.844 0.048 17.646 0.000 0.751 0.938
8 PA2 =~ Q10 0.856 0.048 17.670 0.000 0.761 0.951
9 Q4 ~~ Q4 0.337 0.066 5.078 0.000 0.207 0.467
10 Q5 ~~ Q5 0.611 0.085 7.156 0.000 0.444 0.778
11 Q6 ~~ Q6 0.498 0.085 5.894 0.000 0.333 0.664
12 Q7 ~~ Q7 0.460 0.086 5.324 0.000 0.290 0.629
13 Q11 ~~ Q11 0.704 0.087 8.049 0.000 0.532 0.875
14 Q8 ~~ Q8 0.570 0.088 6.446 0.000 0.397 0.744
15 Q9 ~~ Q9 0.287 0.081 3.550 0.000 0.128 0.445
16 Q10 ~~ Q10 0.267 0.083 3.226 0.001 0.105 0.430
17 PA1 ~~ PA1 1.000 0.000 NA NA 1.000 1.000
18 PA2 ~~ PA2 1.000 0.000 NA NA 1.000 1.000
19 PA1 ~~ PA2 0.124 0.113 1.099 0.272 -0.098 0.346
and, to obtain the unstandardized results with 95% CI,
parameterEstimates(cfa.model) # unstandardized, to view the 95% CI
lhs op rhs est se z pvalue ci.lower ci.upper
1 PA1 =~ Q4 1.000 0.000 NA NA 1.000 1.000
2 PA1 =~ Q5 0.660 0.092 7.218 0.000 0.481 0.840
3 PA1 =~ Q6 0.810 0.090 9.010 0.000 0.634 0.986
4 PA1 =~ Q7 0.916 0.086 10.641 0.000 0.748 1.085
5 PA1 =~ Q11 0.533 0.093 5.719 0.000 0.350 0.716
6 PA2 =~ Q8 1.000 0.000 NA NA 1.000 1.000
7 PA2 =~ Q9 1.347 0.156 8.654 0.000 1.042 1.652
8 PA2 =~ Q10 1.436 0.199 7.206 0.000 1.046 1.827
9 Q4 ~~ Q4 0.460 0.089 5.182 0.000 0.286 0.633
10 Q5 ~~ Q5 0.620 0.086 7.178 0.000 0.451 0.789
11 Q6 ~~ Q6 0.590 0.101 5.836 0.000 0.392 0.788
12 Q7 ~~ Q7 0.647 0.125 5.181 0.000 0.402 0.891
13 Q11 ~~ Q11 0.611 0.077 7.934 0.000 0.460 0.762
14 Q8 ~~ Q8 0.567 0.101 5.628 0.000 0.369 0.764
15 Q9 ~~ Q9 0.312 0.094 3.325 0.001 0.128 0.495
16 Q10 ~~ Q10 0.321 0.106 3.046 0.002 0.115 0.528
17 PA1 ~~ PA1 0.906 0.137 6.587 0.000 0.636 1.175
18 PA2 ~~ PA2 0.427 0.106 4.009 0.000 0.218 0.635
19 PA1 ~~ PA2 0.077 0.075 1.035 0.301 -0.069 0.224
Localized areas of misfit,
modificationIndices(cfa.model, sort = TRUE, minimum.value = 3.84) # find mi > |3.84|
lhs op rhs mi epc sepc.lv sepc.all sepc.nox
20 PA1 =~ Q8 10.264 0.244 0.232 0.233 0.233
55 Q9 ~~ Q10 10.264 2.325 2.325 7.346 7.346
24 PA2 =~ Q5 8.359 0.332 0.217 0.215 0.215
37 Q5 ~~ Q11 6.301 0.144 0.144 0.234 0.234
residuals(cfa.model, type="standardized") # find sr > |2.58|
$type
[1] "standardized"
$cov
Q4 Q5 Q6 Q7 Q11 Q8 Q9 Q10
Q4 0.000
Q5 0.007 0.000
Q6 -0.168 -0.369 0.000
Q7 0.238 -1.097 0.710 0.000
Q11 -0.079 1.924 -0.945 -0.628 0.000
Q8 1.918 3.455 1.638 1.513 1.197 0.000
Q9 -0.972 2.224 0.141 -1.508 1.092 -0.131 0.000
Q10 -1.263 1.691 -1.250 -1.586 0.400 -0.047 0.063 0.000
There are four suggested specifications with MIs > 3.84. We may ignore PA1 =~ Q8 and PA2 =~ Q5 based on content, because it is not justifiable to allow these two items to load on the other factor. Q9 ~~ Q10 is justifiable based on the wording “is important”, but Q5 ~~ Q11 is not justifiable.
Q5 also has an SR > 2.58 with Q8 (SR = 3.455), so we may focus on Q5. Avoid removing Q8, because there are only three items in its factor and the FL of Q8 is larger than that of Q5.
Whenever the model does not fit well, we must revise the model. To do so, we must look for the causes of the poor fit to the data. In CFA, the causes are usually problematic items (e.g. low FLs) and correlated errors between items (flagged by large MIs and SRs).
These problems might not surface if a proper EFA is done in the first place and the model is theoretically sound.
Model-to-model comparison following revision is done based on:
- the \(\chi^2\) difference test, for nested models3.
- AIC and BIC, for non-nested models (e.g. when an item is removed); smaller values indicate a better model.
Model revision
Revision 1: Based on MI, add Q9 ~~ Q10?
Both items are from PA2, and the correlated error is reasonable given the wording of the questions.
model1 = "
PA1 =~ Q4 + Q5 + Q6 + Q7 + Q11
PA2 =~ Q8 + Q9 + Q10
Q9 ~~ Q10
"
cfa.model1 = cfa(model1, data = data.cfa, estimator = "MLR")
Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov variances are
negative
summary(cfa.model1, fit.measures=T, standardized=T)
Take note of the warning message above! It points to problem(s) in our model: here we have a negative variance. Remember that a variance is the square of a standard deviation, thus it is impossible to have a negative variance!
Estimator ML
Optimization method NLMINB
Number of model parameters 18
Number of observations 150
Model Test User Model:
Standard Scaled
Test Statistic 26.487 19.771
Degrees of freedom 18 18
P-value (Chi-square) 0.089 0.346
Scaling correction factor 1.340
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 453.795 325.195
Degrees of freedom 28 28
P-value 0.000 0.000
Scaling correction factor 1.395
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.980 0.994
Tucker-Lewis Index (TLI) 0.969 0.991
Robust Comparative Fit Index (CFI) 0.994
Robust Tucker-Lewis Index (TLI) 0.991
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -1560.730 -1560.730
Scaling correction factor 1.166
for the MLR correction
Loglikelihood unrestricted model (H1) -1547.487 -1547.487
Scaling correction factor 1.253
for the MLR correction
Akaike (AIC) 3157.461 3157.461
Bayesian (BIC) 3211.652 3211.652
Sample-size adjusted Bayesian (SABIC) 3154.685 3154.685
Root Mean Square Error of Approximation:
RMSEA 0.056 0.026
90 Percent confidence interval - lower 0.000 0.000
90 Percent confidence interval - upper 0.099 0.073
P-value H_0: RMSEA <= 0.050 0.376 0.754
P-value H_0: RMSEA >= 0.080 0.200 0.024
Robust RMSEA 0.030
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.091
P-value H_0: Robust RMSEA <= 0.050 0.637
P-value H_0: Robust RMSEA >= 0.080 0.105
Standardized Root Mean Square Residual:
SRMR 0.058 0.058
The upper 90% CI of RMSEA is smaller, but
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
PA1 =~
Q4 1.000 0.950 0.813
Q5 0.663 0.089 7.470 0.000 0.630 0.626
Q6 0.811 0.091 8.876 0.000 0.771 0.708
Q7 0.923 0.089 10.409 0.000 0.877 0.739
Q11 0.528 0.092 5.729 0.000 0.502 0.538
PA2 =~
Q8 1.000 1.517 1.522
Q9 0.247 0.314 0.786 0.432 0.374 0.359
Q10 0.267 0.332 0.803 0.422 0.404 0.369
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Q9 ~~
.Q10 0.678 0.198 3.419 0.001 0.678 0.683
PA1 ~~
PA2 0.267 0.096 2.787 0.005 0.185 0.185
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Q4 0.462 0.087 5.329 0.000 0.462 0.339
.Q5 0.618 0.084 7.331 0.000 0.618 0.609
.Q6 0.590 0.104 5.682 0.000 0.590 0.498
.Q7 0.639 0.125 5.101 0.000 0.639 0.454
.Q11 0.617 0.077 8.008 0.000 0.617 0.710
.Q8 -1.307 2.801 -0.467 0.641 -1.307 -1.316
.Q9 0.947 0.198 4.782 0.000 0.947 0.871
.Q10 1.038 0.229 4.531 0.000 1.038 0.864
PA1 0.903 0.138 6.556 0.000 1.000 1.000
PA2 2.300 2.779 0.828 0.408 1.000 1.000
we have a serious Heywood case here! The standardized FL of Q8 = 1.522, thus this solution is not acceptable.
Revision 2: Remove Q5?
model2 = "
PA1 =~ Q4 + Q6 + Q7 + Q11
PA2 =~ Q8 + Q9 + Q10
"
cfa.model2 = cfa(model2, data = data.cfa, estimator = "MLR")
summary(cfa.model2, fit.measures=T, standardized=T)
Estimator ML
Optimization method NLMINB
Number of model parameters 15
Number of observations 150
Model Test User Model:
Standard Scaled
Test Statistic 20.451 14.467
Degrees of freedom 13 13
P-value (Chi-square) 0.085 0.342
Scaling correction factor 1.414
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 380.263 260.751
Degrees of freedom 21 21
P-value 0.000 0.000
Scaling correction factor 1.458
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.979 0.994
Tucker-Lewis Index (TLI) 0.966 0.990
Robust Comparative Fit Index (CFI) 0.994
Robust Tucker-Lewis Index (TLI) 0.990
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -1380.510 -1380.510
Scaling correction factor 1.167
for the MLR correction
Loglikelihood unrestricted model (H1) -1370.284 -1370.284
Scaling correction factor 1.281
for the MLR correction
Akaike (AIC) 2791.020 2791.020
Bayesian (BIC) 2836.179 2836.179
Sample-size adjusted Bayesian (SABIC) 2788.707 2788.707
Root Mean Square Error of Approximation:
RMSEA 0.062 0.027
90 Percent confidence interval - lower 0.000 0.000
90 Percent confidence interval - upper 0.111 0.079
P-value H_0: RMSEA <= 0.050 0.314 0.704
P-value H_0: RMSEA >= 0.080 0.306 0.048
Robust RMSEA 0.033
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.104
P-value H_0: Robust RMSEA <= 0.050 0.579
P-value H_0: Robust RMSEA >= 0.080 0.173
Standardized Root Mean Square Residual:
SRMR 0.063 0.063
The upper 90% CI of the robust RMSEA is reduced from 0.112 (model) to 0.104 (model2).
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
PA1 =~
Q4 1.000 0.941 0.806
Q6 0.830 0.099 8.391 0.000 0.781 0.718
Q7 0.960 0.100 9.597 0.000 0.904 0.762
Q11 0.504 0.091 5.547 0.000 0.474 0.509
PA2 =~
Q8 1.000 0.651 0.653
Q9 1.351 0.155 8.692 0.000 0.880 0.844
Q10 1.444 0.202 7.143 0.000 0.940 0.858
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
PA1 ~~
PA2 0.048 0.068 0.705 0.481 0.078 0.078
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Q4 0.479 0.101 4.734 0.000 0.479 0.351
.Q6 0.574 0.106 5.418 0.000 0.574 0.485
.Q7 0.591 0.134 4.403 0.000 0.591 0.420
.Q11 0.644 0.079 8.180 0.000 0.644 0.741
.Q8 0.569 0.101 5.657 0.000 0.569 0.573
.Q9 0.313 0.096 3.266 0.001 0.313 0.288
.Q10 0.318 0.108 2.940 0.003 0.318 0.264
PA1 0.886 0.146 6.052 0.000 1.000 1.000
PA2 0.424 0.106 4.006 0.000 1.000 1.000
The FLs and the factor correlation are acceptable, and there is no Heywood case.
modificationIndices(cfa.model2, sort = TRUE, minimum.value = 3.84) # find mi > |3.84|
lhs op rhs mi epc sepc.lv sepc.all sepc.nox
18 PA1 =~ Q8 9.707 0.241 0.227 0.228 0.228
45 Q9 ~~ Q10 9.707 3.661 3.661 11.615 11.615
residuals(cfa.model2, type="standardized") # find sr > |2.58|
$type
[1] "standardized"
$cov
Q4 Q6 Q7 Q11 Q8 Q9 Q10
Q4 0.000
Q6 -0.225 0.000
Q7 -0.088 0.429 0.000
Q11 0.628 -0.412 -0.440 0.000
Q8 2.186 1.860 1.767 1.421 0.000
Q9 -0.485 0.506 -1.207 1.419 -0.154 0.000
Q10 -0.571 -0.829 -1.159 0.693 -0.076 0.074 0.000
There is no SR > 2.58.
So we may stop at model2; although the upper 90% CI of the RMSEA is still > 0.08, there are no more localized areas of misfit by SR.
Model-to-model comparison
Because model2 is not nested within model (Q5 was removed), we compare the models by AIC and BIC,
anova(cfa.model, cfa.model2)
Warning in lavTestLRT(object = object, ..., model.names = NAMES): lavaan WARNING: some
models are based on a different set of observed variables
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan NOTE:
The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference
test is a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
cfa.model2 13 2791 2836.2 20.451
cfa.model 19 3166 3217.2 37.063 13.562 6 0.03493 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Clearly, the AIC and BIC are reduced (model2 [without Q5] vs model [with Q5]). Please ignore the \(\chi^2\) difference result, because the two models are based on different sets of observed variables and are therefore not nested.
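As an aside, semTools also provides compareFit() to tabulate the fit indices of several models side by side; a minimal sketch (nested = FALSE here because the two models are non-nested),
# compare fit indices of the two non-nested models
summary(compareFit(cfa.model, cfa.model2, nested = FALSE))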
Omega \(\omega\) coefficient
Composite reliability by the omega (\(\omega\)) coefficient is the reliability coefficient applicable to CFA; it takes correlated errors into account.
Construct reliability \(\geq\) 0.7 is acceptable (Hair, Black, Babin, & Anderson, 2010).
Obtain the omega,
compRelSEM(cfa.model2)
PA1 PA2
0.809 0.836
Composite reliability omega: PA1 = 0.809, PA2 = 0.836. Both factors are reliable.
A CFA model can be nicely presented in the form of a path diagram.
semPaths(cfa.model2, 'path', 'std', style = 'lisrel',
         edge.color = 'black') # 'path' = diagram type, 'std' = label paths with standardized estimates
In the report, you must include a number of important statements and results pertaining to the CFA,
Fit indices of the models.
Model | \(\chi^2\)(df) | \(P\) | SRMR | RMSEA | 90% CI | CFI | TLI | AIC | BIC |
---|---|---|---|---|---|---|---|---|---|
Model | 27.4(19) | 0.096 | 0.079 | 0.063 | 0.000, 0.112 | 0.973 | 0.960 | 3166 | 3217 |
Model 2 | 14.5(13) | 0.342 | 0.063 | 0.033 | 0.000, 0.104 | 0.994 | 0.990 | 2791 | 2836 |
Factor loadings and reliability of Model 2.
Factor | Item | Factor loading | Omega |
---|---|---|---|
Affinity | Q4 | 0.806 | 0.809 |
| Q6 | 0.718 | |
| Q7 | 0.762 | |
| Q11 | 0.509 | |
Importance | Q8 | 0.653 | 0.836 |
| Q9 | 0.844 | |
| Q10 | 0.858 | |
Factor correlation: Affinity \(\leftrightarrow\) Importance, \(r = 0.078\)
The path diagram of Model 2.
The regression weight of one item per factor is fixed to 1. Another approach in CFA is to fix the factor variance to 1 (Brown, 2015). Both approaches supposedly give the same model fit (Muthen, 2012).↩︎
The latent variable (factor) is an unobserved variable, thus it has to be scaled to define its metric/unit of measurement. This is done by fixing either the regression weight of one item or the factor variance to 1.↩︎
A model with the same number of items but with a different model specification, e.g. the number of factors.↩︎