In this hands-on, we are going to further validate our model based on the EFA findings. The same data set, “Attitude_Statistics v3.sav”, will be used.
The evidence of internal structure will be provided by:
1. Confirmatory factor analysis
2. Composite reliability
We are going to use psych (Revelle, 2023), lavaan version 0.6.15 (Rosseel, Jorgensen, & Rockwood, 2023), semTools version 0.5.6 (Jorgensen, Pornprasertmanit, Schoemann, & Rosseel, 2022) and semPlot version 1.1.6 (Epskamp, 2022) in our analysis. These packages must be installed from CRAN at https://cran.r-project.org/. Make sure you have already installed all of them before loading the packages.
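If any of them is not yet installed, a one-off install.packages() call will fetch them from CRAN (a minimal sketch; foreign is not listed because it usually ships with R):
install.packages(c("psych", "lavaan", "semTools", "semPlot")) # only needs to be run once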
library(foreign) # for reading the SPSS .sav file
library(psych) # for descriptive statistics & Mardia's test
library(lavaan) # for CFA
library(semTools) # for additional functions in SEM
library(semPlot) # for path diagram
We include only the good items from PA1 and PA2 in the data.cfa data frame.
data = read.spss("Attitude_Statistics v3.sav", F, T) # F = use.value.labels, T = to.data.frame (positional shorthand)
# Include selected items from PA1 & PA2 in "data.cfa"
data.cfa = data[c("Q4","Q5","Q6","Q7","Q8","Q9","Q10","Q11")]
str(data.cfa)
## 'data.frame': 150 obs. of 8 variables:
## $ Q4 : num 3 3 1 4 2 3 3 2 4 4 ...
## ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
## .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
## $ Q5 : num 4 4 1 3 5 4 4 3 3 4 ...
## ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
## .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
## $ Q6 : num 4 4 1 2 1 4 3 3 4 5 ...
## ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
## .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
## $ Q7 : num 3 4 1 2 4 4 4 2 3 4 ...
## ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
## .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
## $ Q8 : num 3 3 4 2 5 3 3 3 3 4 ...
## ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
## .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
## $ Q9 : num 3 3 4 2 5 4 3 4 5 4 ...
## ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
## .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
## $ Q10: num 3 3 5 2 3 4 3 4 4 4 ...
## ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
## .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
## $ Q11: num 4 4 1 3 4 4 3 3 4 4 ...
## ..- attr(*, "value.labels")= Named num [1:5] 5 4 3 2 1
## .. ..- attr(*, "names")= chr [1:5] "STRONGLY AGREE" "AGREE" "NEUTRAL" "DISAGREE" ...
dim(data.cfa)
## [1] 150 8
names(data.cfa)
## [1] "Q4" "Q5" "Q6" "Q7" "Q8" "Q9" "Q10" "Q11"
head(data.cfa)
## Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11
## 1 3 4 4 3 3 3 3 4
## 2 3 4 4 4 3 3 3 4
## 3 1 1 1 1 4 4 5 1
## 4 4 3 2 2 2 2 2 3
## 5 2 5 1 4 5 5 3 4
## 6 3 4 4 4 3 4 4 4
Descriptive statistics
Check minimum/maximum values per item, and screen for any missing values,
describe(data.cfa)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## Q4 1 150 2.81 1.17 3 2.78 1.48 1 5 4 0.19 -0.81 0.10
## Q5 2 150 3.31 1.01 3 3.32 1.48 1 5 4 -0.22 -0.48 0.08
## Q6 3 150 3.05 1.09 3 3.05 1.48 1 5 4 -0.04 -0.71 0.09
## Q7 4 150 2.92 1.19 3 2.92 1.48 1 5 4 -0.04 -1.06 0.10
## Q8 5 150 3.33 1.00 3 3.34 1.48 1 5 4 -0.08 -0.12 0.08
## Q9 6 150 3.44 1.05 3 3.48 1.48 1 5 4 -0.21 -0.32 0.09
## Q10 7 150 3.31 1.10 3 3.36 1.48 1 5 4 -0.22 -0.39 0.09
## Q11 8 150 3.35 0.94 3 3.37 1.48 1 5 4 -0.31 -0.33 0.08
Note that all n = 150, i.e. there are no missing values. The min and max values cover the whole range of the response options.
Percentage of responses per option for each item,
response.frequencies(data.cfa)
## 1 2 3 4 5 miss
## Q4 0.140 0.280 0.30 0.19 0.093 0
## Q5 0.040 0.167 0.35 0.33 0.113 0
## Q6 0.080 0.233 0.33 0.26 0.093 0
## Q7 0.133 0.267 0.23 0.29 0.080 0
## Q8 0.047 0.100 0.48 0.23 0.147 0
## Q9 0.047 0.093 0.42 0.25 0.187 0
## Q10 0.073 0.107 0.42 0.23 0.167 0
## Q11 0.027 0.153 0.35 0.39 0.087 0
All response options are used, and there are no missing values.
Multivariate normality
This step checks the multivariate normality of the data. If the data are multivariate normal, we may use the maximum likelihood (ML) estimation method for the CFA. In lavaan, we have a number of alternative estimation methods (the full list is available at http://lavaan.ugent.be/tutorial/est.html or by typing ?lavOptions). Two common alternatives for non-normal data are:
1. MLM: ML with robust standard errors and a Satorra-Bentler scaled test statistic (complete data only).
2. MLR: ML with robust standard errors and a Yuan-Bentler scaled test statistic (complete and incomplete data).
mardia(data.cfa)
## Call: mardia(x = data.cfa)
##
## Mardia tests of multivariate skew and kurtosis
## Use describe(x) the to get univariate tests
## n.obs = 150 num.vars = 8
## b1p = 11.37 skew = 284.27 with probability <= 2.3e-15
## small sample skew = 291.24 with probability <= 2.9e-16
## b2p = 96.9 kurtosis = 8.18 with probability <= 2.2e-16
The data are not multivariate normal (kurtosis > 5, P < 0.05). Therefore, we will use MLR in our analysis.
Specify the measurement model
Specify the measurement model according to the lavaan syntax.
model = "
PA1 =~ Q4 + Q5 + Q6 + Q7 + Q11
PA2 =~ Q8 + Q9 + Q10
"
=~ indicates “measured by”, thus the items represent the factor.
By default, lavaan will correlate PA1 and PA2 (i.e. PA1 ~~ PA2), somewhat similar to oblique rotation in EFA. ~~ means “correlation”. We will use ~~ when we add correlated errors later.
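Purely as a syntax illustration (this is not the model we fit below; the object name model.illustration and the Q4 ~~ Q5 error correlation are hypothetical), the default factor correlation can be written out explicitly, and a correlated error would also be specified with ~~:
# Syntax illustration only -- not the model fitted in this hands-on
model.illustration = "
PA1 =~ Q4 + Q5 + Q6 + Q7 + Q11
PA2 =~ Q8 + Q9 + Q10
PA1 ~~ PA2 # factor correlation (included by default anyway)
Q4 ~~ Q5 # hypothetical correlated error between Q4 and Q5
"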
Fit the model
Here, we fit the specified model. By default, the marker-indicator-variable approach is used in lavaan to scale a factor, i.e. the factor loading of the first indicator of each factor is fixed to 1. We use MLR as the estimation method.
cfa.model = cfa(model, data = data.cfa, estimator = "MLR")
# cfa.model = cfa(model, data = data.cfa, estimator = "MLR", std.lv = TRUE) # alternative: fix factor variances to 1
summary(cfa.model, fit.measures = T, standardized = T)
## lavaan 0.6.15 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 17
##
## Number of observations 150
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 37.063 27.373
## Degrees of freedom 19 19
## P-value (Chi-square) 0.008 0.096
## Scaling correction factor 1.354
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 453.795 325.195
## Degrees of freedom 28 28
## P-value 0.000 0.000
## Scaling correction factor 1.395
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.958 0.972
## Tucker-Lewis Index (TLI) 0.937 0.958
##
## Robust Comparative Fit Index (CFI) 0.973
## Robust Tucker-Lewis Index (TLI) 0.960
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1566.019 -1566.019
## Scaling correction factor 1.140
## for the MLR correction
## Loglikelihood unrestricted model (H1) -1547.487 -1547.487
## Scaling correction factor 1.253
## for the MLR correction
##
## Akaike (AIC) 3166.037 3166.037
## Bayesian (BIC) 3217.218 3217.218
## Sample-size adjusted Bayesian (SABIC) 3163.416 3163.416
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.080 0.054
## 90 Percent confidence interval - lower 0.040 0.000
## 90 Percent confidence interval - upper 0.118 0.091
## P-value H_0: RMSEA <= 0.050 0.098 0.397
## P-value H_0: RMSEA >= 0.080 0.527 0.133
##
## Robust RMSEA 0.063
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.112
## P-value H_0: Robust RMSEA <= 0.050 0.312
## P-value H_0: Robust RMSEA >= 0.080 0.320
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.079 0.079
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## PA1 =~
## Q4 1.000 0.952 0.814
## Q5 0.660 0.092 7.218 0.000 0.629 0.624
## Q6 0.810 0.090 9.010 0.000 0.771 0.708
## Q7 0.916 0.086 10.641 0.000 0.872 0.735
## Q11 0.533 0.093 5.719 0.000 0.507 0.544
## PA2 =~
## Q8 1.000 0.653 0.655
## Q9 1.347 0.156 8.654 0.000 0.880 0.844
## Q10 1.436 0.199 7.206 0.000 0.938 0.856
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## PA1 ~~
## PA2 0.077 0.075 1.035 0.301 0.124 0.124
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Q4 0.460 0.089 5.182 0.000 0.460 0.337
## .Q5 0.620 0.086 7.178 0.000 0.620 0.611
## .Q6 0.590 0.101 5.836 0.000 0.590 0.498
## .Q7 0.647 0.125 5.181 0.000 0.647 0.460
## .Q11 0.611 0.077 7.934 0.000 0.611 0.704
## .Q8 0.567 0.101 5.628 0.000 0.567 0.570
## .Q9 0.312 0.094 3.325 0.001 0.312 0.287
## .Q10 0.321 0.106 3.046 0.002 0.321 0.267
## PA1 0.906 0.137 6.587 0.000 1.000 1.000
## PA2 0.427 0.106 4.009 0.000 1.000 1.000
Results
Read the results marked as Scaled. These represent the results of MLR.
To interpret the results, we must look at: (1) the fit indices, (2) the localized areas of misfit, and (3) the parameter estimates, i.e. the factor loadings (Std.all column under the Latent Variables table) and the factor correlation (Std.all column under the Covariances table).
1. Fit indices
The following are a number of selected fit indices and the recommended cut-off values (Brown, 2015; Schreiber, Nora, Stage, Barlow, & King, 2006):

Category | Fit index | Cut-off |
---|---|---|
Absolute fit | \(\chi^2\) | \(P > 0.05\) |
Absolute fit | Standardized root mean square residual (SRMR) | \(\leq 0.08\) |
Parsimony correction | Root mean square error of approximation (RMSEA) | \(\leq 0.08\) and its 90% CI \(\leq 0.08\), CFit \(P > 0.05\) (\(H_0\): RMSEA \(\leq 0.05\)) |
Comparative fit | Comparative fit index (CFI) | \(\geq 0.95\) |
Comparative fit | Tucker-Lewis index (TLI) | \(\geq 0.95\) |
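If we only want these indices rather than the full summary, the scaled/robust versions can be extracted with lavaan's fitMeasures(); the particular set of measure names requested below is simply our choice:
# Selected scaled/robust fit indices for the fitted model
fitMeasures(cfa.model, c("chisq.scaled", "df.scaled", "pvalue.scaled",
                         "cfi.robust", "tli.robust", "rmsea.robust", "srmr"))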
2. Localized areas of misfit (Brown, 2015)
Residuals are the difference between the values in the sample and model-implied variance-covariance matrices.
Standardized residuals (SRs) quantify the standardized discrepancy between the two matrices; absolute values > 2.58 indicate localized areas of misfit.
A modification index (MI) indicates the expected drop in the model \(\chi^2\) if we include a particular specification in the model (i.e. a constrained/fixed parameter is freely estimated, e.g. by correlating between the errors of Q1 and Q2).
Specifications with MIs > 3.84 should be investigated.
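A minimal sketch of how to obtain both diagnostics in lavaan for our fitted model:
resid(cfa.model, type = "standardized") # standardized residuals; look for |SR| > 2.58
modindices(cfa.model, sort. = TRUE, minimum.value = 3.84) # modification indices above 3.84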
3. Parameter estimates
Factor loadings (FLs; Std.all column under the Latent Variables table). The guideline for EFA is also applicable to CFA; for example, FLs \(\geq\) 0.5 are practically significant. In addition, the P-values of the FLs must be significant (at \(\alpha\) = 0.05). Also look for out-of-range values: FLs should be in the range of 0 to 1 (absolute values), thus values > 1 are called Heywood cases or offending estimates (Brown, 2015).
Factor correlations (Std.all column under the Covariances table). Similar to EFA, a factor correlation must be < 0.85, which indicates that the factors are distinct. A correlation > 0.85 indicates a multicollinearity problem. Also look for out-of-range values: factor correlations should be in the range of 0 to 1 (absolute values).
In addition, when a model has Heywood cases, the solution is not acceptable. The variance-covariance matrix (of our data) could be non-positive definite, i.e. the matrix cannot be inverted for the analysis.
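To scan the standardized loadings and the factor correlation (together with standard errors and 95% confidence intervals) in one table, lavaan's standardizedSolution() offers a convenient view of the Std.all column:
standardizedSolution(cfa.model) # standardized estimates with SEs, z-values and 95% CIs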
In our output:
Fit indices,
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 17
##
## Number of observations 150
## Model Test User Model:
## Standard Scaled
## Test Statistic 37.063 27.373
## Degrees of freedom 19 19
## P-value (Chi-square) 0.008 0.096
## Scaling correction factor 1.354
## Yuan-Bentler correction (Mplus variant)
## Model Test Baseline Model:
## Test statistic 453.795 325.195
## Degrees of freedom 28 28
## P-value 0.000 0.000
## Scaling correction factor 1.395
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.958 0.972
## Tucker-Lewis Index (TLI) 0.937 0.958
##
## Robust Comparative Fit Index (CFI) 0.973
## Robust Tucker-Lewis Index (TLI) 0.960
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1566.019 -1566.019
## Scaling correction factor 1.140
## for the MLR correction
## Loglikelihood unrestricted model (H1) -1547.487 -1547.487
## Scaling correction factor 1.253
## for the MLR correction
##
## Akaike (AIC) 3166.037 3166.037
## Bayesian (BIC) 3217.218 3217.218
## Sample-size adjusted Bayesian (SABIC) 3163.416 3163.416
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.080 0.054
## 90 Percent confidence interval - lower 0.040 0.000
## 90 Percent confidence interval - upper 0.118 0.091
## P-value H_0: RMSEA <= 0.050 0.098 0.397
## P-value H_0: RMSEA >= 0.080 0.527 0.133
##
## Robust RMSEA 0.063
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.112
## P-value H_0: Robust RMSEA <= 0.050 0.312
## P-value H_0: Robust RMSEA >= 0.080 0.320
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.079 0.079
The model has good fit based on all indices, with the exception of the upper limit of the 90% CI of the robust RMSEA (0.112). However, the CFit P-value for the robust RMSEA is 0.312, which means we cannot reject \(H_0\): RMSEA \(\leq 0.050\).
FLs and factor correlation,
## Parameter Estimates:
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## PA1 =~
## Q4 1.000 0.952 0.814
## Q5 0.660 0.092 7.218 0.000 0.629 0.624
## Q6 0.810 0.090 9.010 0.000 0.771 0.708
## Q7 0.916 0.086 10.641 0.000 0.872 0.735
## Q11 0.533 0.093 5.719 0.000 0.507 0.544
## PA2 =~
## Q8 1.000 0.653 0.655
## Q9 1.347 0.156 8.654 0.000 0.880 0.844
## Q10 1.436 0.199 7.206 0.000 0.938 0.856
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## PA1 ~~
## PA2 0.077 0.075 1.035 0.301 0.124 0.124
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Q4 0.460 0.089 5.182 0.000 0.460 0.337
## .Q5 0.620 0.086 7.178 0.000 0.620 0.611
## .Q6 0.590 0.101 5.836 0.000 0.590 0.498
## .Q7 0.647 0.125 5.181 0.000 0.647 0.460
## .Q11 0.611 0.077 7.934 0.000 0.611 0.704
## .Q8 0.56