To test out the regsem package, we will first simulate data
library(regsem)
## This is lavaan 0.6-19
## lavaan is FREE software! Please report any bugs.
## Loading required package: Rcpp
## Loading required package: Rsolnp
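The data-simulation code itself is not shown above. A minimal sketch of what it could look like, using lavaan's simulateData function — note that the population model, the particular coefficient values (0, 0.2, 0.5, 0.8), the seed, and the sample size below are all illustrative assumptions, not the actual ones used in this document:

```r
library(lavaan)

# Hypothetical population model: five indicators of f1, most regression
# paths zero and a few non-zero (all values here are assumptions)
sim.mod <- "
f1 =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
f1 ~ 0*x1 + 0*x2 + 0*x3 + 0*x4 + 0*x5 + 0.2*x6 + 0.5*x7 + 0.8*x8
f1 ~~ 1*f1
"

set.seed(1)
dat.sim <- simulateData(sim.mod, sample.nobs = 100)
dim(dat.sim)  # 100 rows, 13 columns (y1-y5, x1-x8)
```

simulateData generates the exogenous x variables automatically, so the resulting data frame contains both the indicators and the predictors.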
And then run the model with lavaan so we can better see the structure.
run.mod <- "
f1 =~ NA*y1 + y2 + y3 + y4 + y5
f1 ~ c1*x1 + c2*x2 + c3*x3 + c4*x4 + c5*x5 + c6*x6 + c7*x7 + c8*x8
f1~~1*f1
"
lav.out <- sem(run.mod,dat.sim,fixed.x=FALSE)
#summary(lav.out)
parameterestimates(lav.out)[6:13,] # just look at regressions
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 6 f1 ~ x1 c1 0.068 0.118 0.582 0.561 -0.162 0.299
## 7 f1 ~ x2 c2 0.099 0.112 0.883 0.377 -0.120 0.318
## 8 f1 ~ x3 c3 -0.077 0.132 -0.583 0.560 -0.334 0.181
## 9 f1 ~ x4 c4 0.049 0.120 0.407 0.684 -0.186 0.284
## 10 f1 ~ x5 c5 -0.017 0.133 -0.129 0.897 -0.278 0.243
## 11 f1 ~ x6 c6 0.165 0.107 1.544 0.123 -0.044 0.373
## 12 f1 ~ x7 c7 0.443 0.129 3.438 0.001 0.191 0.696
## 13 f1 ~ x8 c8 0.668 0.144 4.631 0.000 0.385 0.951
One of the difficult pieces in using the cv_regsem function is that the penalty has to be calibrated for each particular problem. In running this code, I first tested the default, but this was too small given that some parameters were > 0.4. After plotting this initial run, I saw that some parameters weren't penalized enough; therefore, I increased the penalty jump to 0.04, and with 30 different values this tested a model (at the highest penalty) that had all estimates set to zero. In some cases it isn't necessary to test a sequence of penalties that would set "large" parameters to zero, as either the model may fail to converge at those penalties, or there is no uncertainty about those parameters' inclusion.
reg.out <- cv_regsem(lav.out,n.lambda=30,type="lasso",jump=0.04,
pars_pen=c("c1","c2","c3","c4","c5","c6","c7","c8"))
In specifying this model, we use the parameter labels to tell cv_regsem which of the parameters to penalize. Equivalently, we could have used the extractMatrices function to identify which parameters to penalize.
Additionally, we can specify which parameters are penalized by type, passing pars_pen either "regressions", "loadings", or both with c("regressions", "loadings"). Note though that this penalizes all parameters of that type. If you want to penalize only a subset of parameters, use either the parameter-label or parameter-number format for pars_pen.
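As a sketch of these alternatives — the small toy model below is a standalone example so it runs on its own, not the model fit above; extractMatrices is part of regsem, but the parameter numbering it reveals is model-dependent:

```r
library(regsem)  # also loads lavaan

# Standalone toy model (illustrative only, not the model from the document)
set.seed(1)
toy.dat <- simulateData("f1 =~ 1*y1 + 1*y2 + 1*y3\nf1 ~ 0.5*x1 + 0*x2",
                        sample.nobs = 200)
toy.fit <- sem("f1 =~ NA*y1 + y2 + y3\nf1 ~ x1 + x2\nf1 ~~ 1*f1",
               toy.dat, fixed.x = FALSE)

# Alternative 1: penalize all regressions by type instead of by label
# cv_regsem(toy.fit, type = "lasso", pars_pen = "regressions")

# Alternative 2: inspect the RAM matrices to find parameter numbers,
# which can then be passed directly to pars_pen
mats <- extractMatrices(toy.fit)
mats$A  # non-zero integer entries are the parameter numbers
```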
Next, we can get a summary of the models tested.
summary(reg.out)
## CV regsem Object
## Number of parameters regularized: 8
## Lambda ranging from 0 to 0.88
## Lowest Fit Lambda: 0.2
## Metric: BIC
## Number Converged: 23
Here we can see that we used a large enough penalty to set all of the parameter estimates to zero. However, the best fitting model came early in the penalty sequence, when only a couple of the parameters were zero.
regsem defaults to using the BIC to choose a final model. This shows up in the final_pars object as well as the lines in the plot. This can be changed with the metric argument.
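For instance, to select the final model with the AIC instead, pass metric = "AIC". A sketch using a standalone toy model so it runs on its own (the toy model, its values, and the lambda sequence are illustrative assumptions, not those from the document):

```r
library(regsem)  # also loads lavaan

# Standalone toy example showing the metric argument
set.seed(1)
toy.dat <- simulateData("f1 =~ 1*y1 + 1*y2 + 1*y3\nf1 ~ 0.5*x1 + 0*x2",
                        sample.nobs = 200)
toy.fit <- sem("f1 =~ NA*y1 + y2 + y3\nf1 ~ x1 + x2\nf1 ~~ 1*f1",
               toy.dat, fixed.x = FALSE)

# Select the final model by AIC rather than the BIC default
reg.aic <- cv_regsem(toy.fit, n.lambda = 10, jump = 0.05, type = "lasso",
                     metric = "AIC", pars_pen = "regressions")
reg.aic$final_pars
```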
To better understand what went on with the fit across the penalty sequence, we can examine the fits matrix stored in the cv_regsem output.
reg.out$fits[1:10,]
## lambda conv rmsea BIC chisq
## [1,] 0.00 0 0.07885 3969.101 59.77295
## [2,] 0.04 0 0.07688 3964.960 60.23719
## [3,] 0.08 0 0.07599 3961.415 61.29751
## [4,] 0.12 0 0.07570 3958.203 62.69114
## [5,] 0.16 0 0.07570 3955.167 64.26013
## [6,] 0.20 0 0.07634 3952.532 66.23046
## [7,] 0.24 0 0.07868 3954.042 67.74022
## [8,] 0.28 0 0.08178 3956.110 69.80762
## [9,] 0.32 0 0.08543 3958.650 72.34768
## [10,] 0.36 0 0.08915 3961.348 75.04648
One thing to check is the “conv” column. This refers to convergence, with 0 meaning the model converged.
And see what the best fitting parameter estimates are.
reg.out$final_pars
## f1 -> y1 f1 -> y2 f1 -> y3 f1 -> y4 f1 -> y5 x1 -> f1 x2 -> f1 x3 -> f1
## 1.035 0.948 1.021 1.137 0.948 0.000 0.001 0.000
## x4 -> f1 x5 -> f1 x6 -> f1 x7 -> f1 x8 -> f1
## 0.000 0.000 0.057 0.255 0.493
In this final model, we set the regression paths for x2, x3, x4, and x5 to zero. We make a mistake for x1, but we also correctly identify x6, x7, and x8 as true paths. Maximum likelihood estimation with lavaan had p-values > 0.05 for the parameters simulated as zero, but it also failed to identify the true path of 0.2 as significant (p < 0.05).