3 Exploratory factor analysis

3.1 Preliminary steps

Descriptive statistics:

Check the minimum and maximum values per item:

describe(data1)

##     vars   n mean   sd median trimmed  mad min max range  skew kurtosis
## Q1     1 150 3.13 1.10      3    3.12 1.48   1   5     4 -0.10    -0.73
## Q2     2 150 3.51 1.03      3    3.55 1.48   1   5     4 -0.14    -0.47
## Q3     3 150 3.18 1.03      3    3.17 1.48   1   5     4 -0.03    -0.42
## Q4     4 150 2.81 1.17      3    2.77 1.48   1   5     4  0.19    -0.81
## Q5     5 150 3.31 1.01      3    3.32 1.48   1   5     4 -0.22    -0.48
## Q6     6 150 3.05 1.09      3    3.05 1.48   1   5     4 -0.04    -0.71
## Q7     7 150 2.92 1.19      3    2.92 1.48   1   5     4 -0.04    -1.06
## Q8     8 150 3.33 1.00      3    3.34 1.48   1   5     4 -0.08    -0.12
## Q9     9 150 3.44 1.05      3    3.48 1.48   1   5     4 -0.21    -0.32
## Q10   10 150 3.31 1.10      3    3.36 1.48   1   5     4 -0.22    -0.39
## Q11   11 150 3.35 0.94      3    3.37 1.48   1   5     4 -0.31    -0.33
## Q12   12 150 2.83 0.98      3    2.83 1.48   1   5     4  0.09    -0.68
##       se
## Q1  0.09
## Q2  0.08
## Q3  0.08
## Q4  0.10
## Q5  0.08
## Q6  0.09
## Q7  0.10
## Q8  0.08
## Q9  0.09
## Q10 0.09
## Q11 0.08
## Q12 0.08
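
The point of the minimum-maximum check is to catch out-of-range codes such as data-entry errors. A minimal base R sketch of that check follows, assuming a 1 to 5 response scale as in the output above. `toy` is a hypothetical stand-in for `data1`, with one invalid value planted on purpose.

```r
# Hypothetical stand-in for data1: 150 respondents, 3 items scored 1-5
set.seed(1)
toy <- as.data.frame(replicate(3, sample(1:5, 150, replace = TRUE)))
names(toy) <- paste0("Q", 1:3)
toy$Q2[10] <- 9  # planted data-entry error, for illustration only

# Minimum (row 1) and maximum (row 2) per item
rng <- sapply(toy, range)

# Items with any value outside the assumed valid range 1-5
out_of_range <- names(toy)[rng[1, ] < 1 | rng[2, ] > 5]
out_of_range
```

With the real data, an empty result would mean every item stays within the expected range, which is what the `min` and `max` columns above show.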

Proportion of responses to each option per item (multiply by 100 for %):

response.frequencies(data1)

##              1          2         3         4          5 miss
## Q1  0.07333333 0.22000000 0.3200000 0.2800000 0.10666667    0
## Q2  0.03333333 0.09333333 0.4200000 0.2400000 0.21333333    0
## Q3  0.05333333 0.18000000 0.4133333 0.2400000 0.11333333    0
## Q4  0.14000000 0.28000000 0.3000000 0.1866667 0.09333333    0
## Q5  0.04000000 0.16666667 0.3466667 0.3333333 0.11333333    0
## Q6  0.08000000 0.23333333 0.3333333 0.2600000 0.09333333    0
## Q7  0.13333333 0.26666667 0.2266667 0.2933333 0.08000000    0
## Q8  0.04666667 0.10000000 0.4800000 0.2266667 0.14666667    0
## Q9  0.04666667 0.09333333 0.4200000 0.2533333 0.18666667    0
## Q10 0.07333333 0.10666667 0.4200000 0.2333333 0.16666667    0
## Q11 0.02666667 0.15333333 0.3466667 0.3866667 0.08666667    0
## Q12 0.07333333 0.32666667 0.3333333 0.2333333 0.03333333    0
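
The table above reports proportions only. If counts are wanted alongside percentages, one possible base R sketch is shown below. `toy` is again a hypothetical stand-in for `data1`, and the 1 to 5 option range is an assumption taken from the output.

```r
# Hypothetical stand-in for data1: 150 respondents, 3 items scored 1-5
set.seed(1)
toy <- as.data.frame(replicate(3, sample(1:5, 150, replace = TRUE)))
names(toy) <- paste0("Q", 1:3)

# Count each response option per item; factor() keeps options with zero count
counts <- t(sapply(toy, function(x) table(factor(x, levels = 1:5))))

# Convert the counts to percentages of all respondents
percent <- round(100 * counts / nrow(toy), 1)

counts
percent
```

Options chosen by very few respondents are worth noting at this stage, since sparse categories can distort the correlations that the factor analysis is based on.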

Normality of data

Univariate normality

Histograms

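par(mfrow = c(3, 4))  # arrange the plots in 3 rows & 4 columns
# one histogram per item, titled with the item name; invisible() hides the returned list
invisible(lapply(names(data1), function(v) hist(data1[[v]], main = v, xlab = "Response")))

par(mfrow = c(1, 1))  # reset to the default single-plot layout
# multi.hist(data1)  # psych alternative; at times it gives an error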

Shapiro-Wilk test

apply(data1, 2, shapiro.test)

## $Q1
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.91535, p-value = 1.075e-07
## 
## 
## $Q2
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.88321, p-value = 1.656e-09
## 
## 
## $Q3
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.90785, p-value = 3.76e-08
## 
## 
## $Q4
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.91347, p-value = 8.225e-08
## 
## 
## $Q5
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.90615, p-value = 2.986e-08
## 
## 
## $Q6
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.91619, p-value = 1.214e-07
## 
## 
## $Q7
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.90559, p-value = 2.768e-08
## 
## 
## $Q8
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.88115, p-value = 1.301e-09
## 
## 
## $Q9
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.88932, p-value = 3.445e-09
## 
## 
## $Q10
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.89574, p-value = 7.653e-09
## 
## 
## $Q11
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.89194, p-value = 4.758e-09
## 
## 
## $Q12
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.90097, p-value = 1.501e-08

Multivariate normality

# mardiaTest() comes from older releases of the MVN package; newer releases replace it with mvn()
mardiaTest(data1, qqplot = TRUE)

##    Mardia's Multivariate Normality Test 
## --------------------------------------- 
##    data : data1 
## 
##    g1p            : 29.99652 
##    chi.skew       : 749.9129 
##    p.value.skew   : 5.923668e-29 
## 
##    g2p            : 203.0284 
##    z.kurtosis     : 11.70215 
##    p.value.kurt   : 0 
## 
##    chi.small.skew : 767.2563 
##    p.value.small  : 6.235697e-31 
## 
##    Result          : Data are not multivariate normal. 
## ---------------------------------------
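
As a worked check, the test statistics in the output can be recomputed from the reported skewness (`g1p`) and kurtosis (`g2p`) using the standard formulas for Mardia's test. The sketch below takes n = 150 and p = 12 from the data set and copies `g1p` and `g2p` from the output. The variable names are illustrative only.

```r
n <- 150          # respondents
p <- 12           # items
g1p <- 29.99652   # multivariate skewness, copied from the output above
g2p <- 203.0284   # multivariate kurtosis, copied from the output above

# Skewness: n * g1p / 6 follows a chi-square with p(p+1)(p+2)/6 df
chi_skew <- n * g1p / 6
df_skew <- p * (p + 1) * (p + 2) / 6
p_skew <- pchisq(chi_skew, df_skew, lower.tail = FALSE)

# Small-sample correction of the skewness statistic
k <- (p + 1) * (n + 1) * (n + 3) / (n * ((n + 1) * (p + 1) - 6))
chi_small_skew <- k * chi_skew

# Kurtosis: standardised against its expected value p(p+2)
z_kurt <- (g2p - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
p_kurt <- 2 * pnorm(abs(z_kurt), lower.tail = FALSE)

c(chi_skew = chi_skew, chi_small_skew = chi_small_skew, z_kurt = z_kurt)
```

The three statistics agree with `chi.skew`, `chi.small.skew` and `z.kurtosis` in the output up to rounding. Both skewness and kurtosis are significant, hence the conclusion that the data are not multivariate normal.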

3.2 Step 1

Check suitability of data for analysis

Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy

KMO(data1)  # overall MSA = 0.76 (middling); note that the MSA of Q1 (0.34) is below 0.5

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = data1)
## Overall MSA =  0.76
## MSA for each item = 
##   Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10  Q11  Q12 
## 0.34 0.83 0.64 0.75 0.83 0.81 0.82 0.81 0.68 0.70 0.82 0.68
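
Besides the overall MSA, the per-item values deserve a look. The sketch below copies the item MSAs from the output and flags those under 0.5. That cutoff is a commonly used rule of thumb and an assumption here, not something the output itself reports.

```r
# Item MSA values copied from the KMO output above
msa <- c(Q1 = 0.34, Q2 = 0.83, Q3 = 0.64, Q4 = 0.75, Q5 = 0.83, Q6 = 0.81,
         Q7 = 0.82, Q8 = 0.81, Q9 = 0.68, Q10 = 0.70, Q11 = 0.82, Q12 = 0.68)

# Items below the assumed 0.5 cutoff
low_msa <- names(msa)[msa < 0.5]
low_msa
## [1] "Q1"
```

Q1 is the only item flagged, which is consistent with its very low communality and loadings in the factor analysis that follows.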

Bartlett’s test of sphericity

cortest.bartlett(data1)  # p < 0.05 = correlation matrix differs from an identity matrix, so the items are factorable

## R was not square, finding R from data

## $chisq
## [1] 562.3065
## 
## $p.value
## [1] 7.851736e-80
## 
## $df
## [1] 66
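
For readers who want to see where these numbers come from, the statistic is chi-square = -(n - 1 - (2p + 5)/6) * log|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix. The base R sketch below implements that formula. `bartlett_sphericity()` and `toy` are hypothetical names used for illustration, not part of psych.

```r
# Bartlett's test of sphericity from a correlation matrix R and sample size n
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Hypothetical data: 4 variables sharing one common source of variance
set.seed(1)
g <- rnorm(150)
toy <- sapply(1:4, function(i) g + rnorm(150))

res <- bartlett_sphericity(cor(toy), n = nrow(toy))
res
```

With p = 12 items the formula gives 12 * 11 / 2 = 66 degrees of freedom, matching `$df` above.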

Determine the number of factors by:

Eigenvalues
Scree plot

scree = scree(data1)  # we are only concerned with FA scree & eigenvalues

print(scree)

## Scree of eigen values 
## Call: NULL
## Eigen values of factors  [1]  2.67  1.78  0.18  0.07  0.02 -0.05 -0.09 -0.15 -0.20 -0.41 -0.46
## [12] -0.70
## Eigen values of Principal Components [1] 3.29 2.66 1.04 0.96 0.79 0.73 0.65 0.52 0.46 0.36 0.33 0.22

Parallel analysis

parallel = fa.parallel(data1, fm = "pa", fa = "fa")

## Parallel analysis suggests that the number of factors =  2  and the number of components =  NA

print(parallel)

## Call: fa.parallel(x = data1, fm = "pa", fa = "fa")
## Parallel analysis suggests that the number of factors =  2  and the number of components =  NA 
## 
##  Eigen Values of 
## 
##  eigen values of factors
##  [1]  2.67  1.78  0.18  0.07  0.02 -0.05 -0.09 -0.15 -0.20 -0.41 -0.46
## [12] -0.70
## 
##  eigen values of simulated factors
##  [1]  0.71  0.39  0.27  0.21  0.13  0.06 -0.01 -0.07 -0.14 -0.21 -0.28
## [12] -0.36
## 
##  eigen values of components 
##  [1] 3.29 2.66 1.04 0.96 0.79 0.73 0.65 0.52 0.46 0.36 0.33 0.22
## 
##  eigen values of simulated components
## [1] NA
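
The decision can be traced by hand: a factor is retained as long as its observed eigenvalue exceeds the corresponding simulated one. The sketch below applies this simplified rule to the eigenvalues printed above. It illustrates the logic and is not the exact code inside `fa.parallel()`.

```r
# Eigenvalues of factors, copied from the output above
observed <- c(2.67, 1.78, 0.18, 0.07, 0.02, -0.05,
              -0.09, -0.15, -0.20, -0.41, -0.46, -0.70)
# Eigenvalues of simulated factors, copied from the output above
simulated <- c(0.71, 0.39, 0.27, 0.21, 0.13, 0.06,
               -0.01, -0.07, -0.14, -0.21, -0.28, -0.36)

# Count leading factors whose observed eigenvalue beats the simulated one;
# cumprod() turns everything after the first failure into 0
n_factors <- sum(cumprod(observed > simulated))
n_factors
## [1] 2
```

The third observed eigenvalue (0.18) falls below its simulated counterpart (0.27), so only two factors are retained.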

3.3 Step 2

Run the EFA with the number of factors fixed at the value decided in the previous step (here, 2).
Decide on a rotation method. Here we choose an oblique rotation, Promax.

fa = fa(data1, nfactors = 2, rotate = "promax", fm = "pa")

## Loading required namespace: GPArotation

print(fa)

## Factor Analysis using method =  pa
## Call: fa(r = data1, nfactors = 2, rotate = "promax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##       PA1   PA2     h2   u2 com
## Q1  -0.03  0.05 0.0037 1.00 1.6
## Q2   0.22  0.41 0.2071 0.79 1.5
## Q3  -0.28  0.44 0.2819 0.72 1.7
## Q4   0.81 -0.02 0.6585 0.34 1.0
## Q5   0.62  0.23 0.4169 0.58 1.3
## Q6   0.72  0.00 0.5251 0.47 1.0
## Q7   0.73 -0.04 0.5327 0.47 1.0
## Q8   0.31  0.66 0.5012 0.50 1.4
## Q9   0.12  0.77 0.5983 0.40 1.0
## Q10  0.06  0.88 0.7749 0.23 1.0
## Q11  0.54  0.10 0.2977 0.70 1.1
## Q12 -0.29  0.29 0.1767 0.82 2.0
## 
##                        PA1  PA2
## SS loadings           2.68 2.30
## Proportion Var        0.22 0.19
## Cumulative Var        0.22 0.41
## Proportion Explained  0.54 0.46
## Cumulative Proportion 0.54 1.00
## 
##  With factor correlations of 
##       PA1   PA2
## PA1  1.00 -0.06
## PA2 -0.06  1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  66  and the objective function was  3.9 with Chi Square of  562.31
## The degrees of freedom for the model are 43  and the objective function was  0.44 
## 
## The root mean square of the residuals (RMSR) is  0.05 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic number of observations is  150 with the empirical chi square  41.53  with prob <  0.54 
## The total number of observations was  150  with Likelihood Chi Square =  62.26  with prob <  0.029 
## 
## Tucker Lewis Index of factoring reliability =  0.94
## RMSEA index =  0.059  and the 90 % confidence intervals are  0.018 0.083
## BIC =  -153.19
## Fit based upon off diagonal values = 0.97
## Measures of factor score adequacy             
##                                                 PA1  PA2
## Correlation of scores with factors             0.92 0.93
## Multiple R square of scores with factors       0.85 0.86
## Minimum correlation of possible factor scores  0.70 0.73

print(fa, cut = .3, digits = 3)

## Factor Analysis using method =  pa
## Call: fa(r = data1, nfactors = 2, rotate = "promax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        PA1    PA2      h2    u2  com
## Q1                0.00366 0.996 1.56
## Q2          0.413 0.20708 0.793 1.51
## Q3          0.437 0.28192 0.718 1.70
## Q4   0.810        0.65855 0.341 1.00
## Q5   0.616        0.41688 0.583 1.28
## Q6   0.725        0.52512 0.475 1.00
## Q7   0.726        0.53270 0.467 1.01
## Q8   0.306  0.655 0.50124 0.499 1.42
## Q9          0.771 0.59830 0.402 1.04
## Q10         0.882 0.77491 0.225 1.01
## Q11  0.542        0.29771 0.702 1.07
## Q12               0.17665 0.823 2.00
## 
##                         PA1   PA2
## SS loadings           2.678 2.297
## Proportion Var        0.223 0.191
## Cumulative Var        0.223 0.415
## Proportion Explained  0.538 0.462
## Cumulative Proportion 0.538 1.000
## 
##  With factor correlations of 
##        PA1    PA2
## PA1  1.000 -0.055
## PA2 -0.055  1.000
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  66  and the objective function was  3.9 with Chi Square of  562.306
## The degrees of freedom for the model are 43  and the objective function was  0.436 
## 
## The root mean square of the residuals (RMSR) is  0.046 
## The df corrected root mean square of the residuals is  0.057 
## 
## The harmonic number of observations is  150 with the empirical chi square  41.529  with prob <  0.535 
## The total number of observations was  150  with Likelihood Chi Square =  62.265  with prob <  0.0288 
## 
## Tucker Lewis Index of factoring reliability =  0.9398
## RMSEA index =  0.0585  and the 90 % confidence intervals are  0.0184 0.0832
## BIC =  -153.192
## Fit based upon off diagonal values = 0.973
## Measures of factor score adequacy             
##                                                  PA1   PA2
## Correlation of scores with factors             0.921 0.929
## Multiple R square of scores with factors       0.849 0.863
## Minimum correlation of possible factor scores  0.698 0.725

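# h2 = communality (item variance explained by the factors)
# u2 = uniqueness (1 - h2; specific variance + error variance)
# com = item complexity (not communality)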
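
To make the link between loadings and communality concrete: under an oblique rotation, the communalities are the diagonal of P Phi P', where P is the pattern matrix and Phi the factor correlation matrix. The sketch below checks this for Q4 and Q10 using the rounded values printed above, so it reproduces h2 only approximately.

```r
# Pattern coefficients of two items, copied (rounded) from the first print(fa)
P <- rbind(Q4  = c(PA1 = 0.81, PA2 = -0.02),
           Q10 = c(PA1 = 0.06, PA2 = 0.88))

# Factor correlation matrix, from the second print(fa) output
Phi <- matrix(c(1, -0.055, -0.055, 1), nrow = 2)

# Row-wise diagonal of P %*% Phi %*% t(P)
h2 <- rowSums((P %*% Phi) * P)
u2 <- 1 - h2

round(cbind(h2, u2), 3)
```

The results are close to the reported h2 of 0.6585 (Q4) and 0.7749 (Q10). The small differences come from using loadings rounded to two decimals.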

Assess the results:

Judge the quality of items. Remove poor performing items.

Communality (h2)? Q1 < Q12 < Q2 < .25
Pattern coefficient/factor loading (FL)? Q1 < .3, Q12 < .4, Q2 & Q3 < .5

Check for overlap between factors.

PA1~PA2 = -.055, |r| < .85 OK
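
The item-quality screening can also be done programmatically. The sketch below copies the loadings and communalities from the first `print(fa)` output and flags items with communality under .25 or a largest absolute loading under .5. These are the cutoffs used in this step, and the object names are illustrative.

```r
# Pattern coefficients (PA1, PA2) copied from the first print(fa) output
pattern <- matrix(c(-0.03,  0.05,   0.22,  0.41,  -0.28,  0.44,
                     0.81, -0.02,   0.62,  0.23,   0.72,  0.00,
                     0.73, -0.04,   0.31,  0.66,   0.12,  0.77,
                     0.06,  0.88,   0.54,  0.10,  -0.29,  0.29),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(paste0("Q", 1:12), c("PA1", "PA2")))

# Communalities (h2) copied from the same output
h2 <- c(0.0037, 0.2071, 0.2819, 0.6585, 0.4169, 0.5251,
        0.5327, 0.5012, 0.5983, 0.7749, 0.2977, 0.1767)

# Largest absolute loading of each item on any factor
max_fl <- apply(abs(pattern), 1, max)

# Items failing either cutoff
flagged <- rownames(pattern)[h2 < 0.25 | max_fl < 0.5]
flagged
## [1] "Q1"  "Q2"  "Q3"  "Q12"
```

These four items are the candidates for removal. Because the solution changes each time an item is dropped, they should be removed one at a time, with the model re-estimated in between.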

3.4 Step 3

Re-run the analysis as in Step 2 each time an item is removed, and judge the results before deciding on the next removal.
The analysis is finished once we have:

satisfactory number of factors.
satisfactory quality of items.

Decisions?
Remove Q1? Low communality (h2) & FL

fa1 = fa(data1[-1], nfactors = 2, rotate = "promax", fm = "pa")
print(fa1, cut = .3, digits = 3)

## Factor Analysis using method =  pa
## Call: fa(r = data1[-1], nfactors = 2, rotate = "promax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        PA1    PA2    h2    u2  com
## Q2          0.417 0.207 0.793 1.33
## Q3  -0.324  0.426 0.280 0.720 1.87
## Q4   0.811        0.658 0.342 1.00
## Q5   0.590        0.417 0.583 1.35
## Q6   0.724        0.526 0.474 1.00
## Q7   0.731        0.534 0.466 1.00
## Q8          0.660 0.499 0.501 1.25
## Q9          0.774 0.601 0.399 1.00
## Q10         0.883 0.779 0.221 1.00
## Q11  0.531        0.298 0.702 1.09
## Q12 -0.316        0.175 0.825 1.97
## 
##                         PA1   PA2
## SS loadings           2.643 2.329
## Proportion Var        0.240 0.212
## Cumulative Var        0.240 0.452
## Proportion Explained  0.532 0.468
## Cumulative Proportion 0.532 1.000
## 
##  With factor correlations of 
##       PA1   PA2
## PA1 1.000 0.022
## PA2 0.022 1.000
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  55  and the objective function was  3.86 with Chi Square of  557.828
## The degrees of freedom for the model are 34  and the objective function was  0.396 
## 
## The root mean square of the residuals (RMSR) is  0.047 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic number of observations is  150 with the empirical chi square  36.112  with prob <  0.37 
## The total number of observations was  150  with Likelihood Chi Square =  56.625  with prob <  0.00876 
## 
## Tucker Lewis Index of factoring reliability =  0.9265
## RMSEA index =  0.0702  and the 90 % confidence intervals are  0.0337 0.0967
## BIC =  -113.736
## Fit based upon off diagonal values = 0.976
## Measures of factor score adequacy             
##                                                  PA1   PA2
## Correlation of scores with factors             0.921 0.931
## Multiple R square of scores with factors       0.848 0.866
## Minimum correlation of possible factor scores  0.697 0.733

Remove Q12? Low communality (h2) & FL

fa2 = fa(data1[-c(1,12)], nfactors = 2, rotate = "promax", fm = "pa")
print(fa2, cut = .3, digits = 3)

## Factor Analysis using method =  pa
## Call: fa(r = data1[-c(1, 12)], nfactors = 2, rotate = "promax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        PA1    PA2    h2    u2  com
## Q2          0.417 0.211 0.789 1.23
## Q3  -0.319  0.417 0.239 0.761 1.87
## Q4   0.844        0.702 0.298 1.01
## Q5   0.594        0.426 0.574 1.23
## Q6   0.709        0.501 0.499 1.00
## Q7   0.735        0.531 0.469 1.02
## Q8          0.658 0.507 0.493 1.18
## Q9          0.781 0.608 0.392 1.00
## Q10         0.895 0.789 0.211 1.02
## Q11  0.529        0.298 0.702 1.05
## 
##                         PA1   PA2
## SS loadings           2.558 2.251
## Proportion Var        0.256 0.225
## Cumulative Var        0.256 0.481
## Proportion Explained  0.532 0.468
## Cumulative Proportion 0.532 1.000
## 
##  With factor correlations of 
##       PA1   PA2
## PA1 1.000 0.135
## PA2 0.135 1.000
## 
## Mean item complexity =  1.2
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  45  and the objective function was  3.603 with Chi Square of  521.821
## The degrees of freedom for the model are 26  and the objective function was  0.275 
## 
## The root mean square of the residuals (RMSR) is  0.039 
## The df corrected root mean square of the residuals is  0.052 
## 
## The harmonic number of observations is  150 with the empirical chi square  20.882  with prob <  0.748 
## The total number of observations was  150  with Likelihood Chi Square =  39.525  with prob <  0.0434 
## 
## Tucker Lewis Index of factoring reliability =  0.9504
## RMSEA index =  0.0623  and the 90 % confidence intervals are  0.0105 0.0944
## BIC =  -90.752
## Fit based upon off diagonal values = 0.985
## Measures of factor score adequacy             
##                                                  PA1   PA2
## Correlation of scores with factors             0.925 0.934
## Multiple R square of scores with factors       0.855 0.872
## Minimum correlation of possible factor scores  0.711 0.744

Remove Q2? Low communality (h2) & FL

fa3 = fa(data1[-c(1,2,12)], nfactors = 2, rotate = "promax", fm = "pa")
print(fa3, cut = .3, digits = 3)

## Factor Analysis using method =  pa
## Call: fa(r = data1[-c(1, 2, 12)], nfactors = 2, rotate = "promax", 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        PA1    PA2    h2    u2  com
## Q3          0.402 0.229 0.771 1.80
## Q4   0.841        0.705 0.295 1.01
## Q5   0.617        0.438 0.562 1.22
## Q6   0.706        0.496 0.504 1.00
## Q7   0.727        0.528 0.472 1.02
## Q8          0.621 0.465 0.535 1.31
## Q9          0.791 0.635 0.365 1.01
## Q10         0.905 0.819 0.181 1.00
## Q11  0.534        0.295 0.705 1.04
## 
##                         PA1   PA2
## SS loadings           2.552 2.059
## Proportion Var        0.284 0.229
## Cumulative Var        0.284 0.512
## Proportion Explained  0.553 0.447
## Cumulative Proportion 0.553 1.000
## 
##  With factor correlations of 
##       PA1   PA2
## PA1 1.000 0.062
## PA2 0.062 1.000
## 
## Mean item complexity =  1.2
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  36  and the objective function was  3.362 with Chi Square of  488.011
## The degrees of freedom for the model are 19  and the objective function was  0.213 
## 
## The root mean square of the residuals (RMSR) is  0.037 
## The df corrected root mean square of the residuals is  0.051 
## 
## The harmonic number of observations is  150 with the empirical chi square  14.815  with prob <  0.734 
## The total number of observations was  150  with Likelihood Chi Square =  30.579  with prob <  0.0449 
## 
## Tucker Lewis Index of factoring reliability =  0.951
## RMSEA index =  0.0669  and the 90 % confidence intervals are  0.0099 0.1042
## BIC =  -64.623
## Fit based upon off diagonal values = 0.988
## Measures of factor score adequacy             
##                                                  PA1   PA2
## Correlation of scores with factors             0.925 0.939
## Multiple R square of scores with factors       0.855 0.882
## Minimum correlation of possible factor scores  0.711 0.764

Remove Q3? Low communality (h2) & FL

fa4 = fa(data1[-c(1,2,3,12)], nfactors = 2, rotate = "promax", fm = "pa")
print(fa4, cut = .3, digits = 3)

## Factor Analysis using method =  pa
## Call: fa(r = data1[-c(1, 2, 3, 12)], nfactors = 2, rotate = "promax", 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        PA1    PA2    h2    u2  com
## Q4   0.830        0.664 0.336 1.02
## Q5   0.607        0.447 0.553 1.16
## Q6   0.712        0.494 0.506 1.01
## Q7   0.760        0.549 0.451 1.05
## Q8          0.633 0.471 0.529 1.12
## Q9          0.862 0.717 0.283 1.02
## Q10         0.876 0.733 0.267 1.04
## Q11  0.533        0.299 0.701 1.02
## 
##                         PA1   PA2
## SS loadings           2.440 1.935
## Proportion Var        0.305 0.242
## Cumulative Var        0.305 0.547
## Proportion Explained  0.558 0.442
## Cumulative Proportion 0.558 1.000
## 
##  With factor correlations of 
##       PA1   PA2
## PA1 1.000 0.235
## PA2 0.235 1.000
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  28  and the objective function was  3.025 with Chi Square of  440.182
## The degrees of freedom for the model are 13  and the objective function was  0.101 
## 
## The root mean square of the residuals (RMSR) is  0.031 
## The df corrected root mean square of the residuals is  0.045 
## 
## The harmonic number of observations is  150 with the empirical chi square  7.868  with prob <  0.852 
## The total number of observations was  150  with Likelihood Chi Square =  14.54  with prob <  0.337 
## 
## Tucker Lewis Index of factoring reliability =  0.9919
## RMSEA index =  0.0324  and the 90 % confidence intervals are  0 0.0885
## BIC =  -50.598
## Fit based upon off diagonal values = 0.993
## Measures of factor score adequacy             
##                                                  PA1   PA2
## Correlation of scores with factors             0.919 0.929
## Multiple R square of scores with factors       0.845 0.863
## Minimum correlation of possible factor scores  0.691 0.726
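
To compare the five runs side by side, the key figures from the outputs above can be gathered in one table. The sketch below copies them by hand. It also recomputes the model degrees of freedom with the usual EFA formula ((p - m)^2 - (p + m)) / 2, where p is the number of items and m = 2 factors, as a consistency check.

```r
# Figures copied from the print() output of each model
fit <- data.frame(
  model   = c("fa", "fa1", "fa2", "fa3", "fa4"),
  items   = c(12, 11, 10, 9, 8),
  cum_var = c(0.415, 0.452, 0.481, 0.512, 0.547),
  TLI     = c(0.9398, 0.9265, 0.9504, 0.9510, 0.9919),
  RMSEA   = c(0.0585, 0.0702, 0.0623, 0.0669, 0.0324),
  RMSR    = c(0.046, 0.047, 0.039, 0.037, 0.031)
)

# Model degrees of freedom for a 2-factor EFA on p items
m <- 2
fit$df_model <- ((fit$items - m)^2 - (fit$items + m)) / 2

fit
fit$model[which.max(fit$TLI)]
## [1] "fa4"
```

The recomputed degrees of freedom (43, 34, 26, 19, 13) match the outputs, and the final 8-item model has the highest TLI, the lowest RMSEA and RMSR, and the largest cumulative variance explained.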

3.5 Summary:

PA1 =~ Q4, Q5, Q6, Q7, Q11
PA2 =~ Q8, Q9, Q10

Name the factors.

Exploratory factor analysis and Cronbach’s alpha

Wan Nor Arifin

2017-07-24
