This vignette explains how to perform scale linking with the PROsetta package. By way of illustration, we replicate the linking of the Center for Epidemiologic Studies Depression Scale (CES-D) to the PROMIS Depression metric as described in Choi, Schalet, Cook, and Cella (2014).
The first step is to load the input data, which comprise three tables, with loadData(). The PROMIS Depression – CES-D linking data are included in the PROsetta package directory, in the folder labeled data-raw.
fp <- system.file("data-raw", package = "PROsetta")
d <- loadData(
response = "dat_DeCESD_v2.csv",
itemmap = "imap_DeCESD.csv",
anchor = "anchor_DeCESD.csv",
input_dir = fp)
- response: Contains item response data from both instruments. You can supply a .csv filename or a data frame. In this example, we supply the .csv filename dat_DeCESD_v2.csv.
- itemmap: Specifies which items belong to which instrument. Can be a .csv filename or a data frame.
- anchor: Contains item parameters for the anchor items (e.g., PROMIS Depression). Can be a .csv filename or a data frame.
- input_dir: (Optional) The path of the directory in which to look for the input .csv files.

The response data contains individual item responses on both instruments (i.e., 28 PROMIS Depression items followed by 20 CES-D items). The data table should include the following columns.
- prosettaid: The person ID of the respondents (N = 747). This column does not have to be named prosettaid, but the name should not conflict with those in the other data tables (item map and anchor).
- Item response columns: The column names should match the item_id column in both the item map and anchor files.

Run the following code, for example, to open the response data in edit mode.
file.edit(system.file("data-raw", "dat_DeCESD_v2.csv", package = "PROsetta"))
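To make the expected layout concrete, a response file following the column requirements above might begin like the sketch below. The item IDs are taken from the calibration output later in this vignette; the person IDs and response values shown here are hypothetical, and the ellipsis stands in for the remaining item columns.

```csv
prosettaid,EDDEP04,EDDEP05,...,CESD1,CESD2
100001,1,2,...,1,1
100002,3,1,...,2,1
```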
The item map data requires the following columns.

- item_id: The unique ID of each item. The name of this column does not have to be item_id but should be consistent with the item ID column in the anchor table. The IDs in this column should match the column names in the response data.
- instrument: Numerals (1 or 2) indicating to which of the two instruments each item belongs (e.g., 1 = PROMIS Depression; 2 = CES-D).
- item_order: The sequential position of each item in the combined table (e.g., 1, 2, 3, …, 28, …, 48).
- item_name: Secondary labels for the items.
- ncat: The number of response categories for each item.

Run the following code to open the item map data in edit mode.
file.edit(system.file("data-raw", "imap_DeCESD.csv", package = "PROsetta"))
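For illustration, an item map with the columns described above might begin as follows. The column order and the item_name values are hypothetical; the ncat values (5 for the PROMIS Depression items, 4 for the CES-D items) match the category counts visible in the calibration output later in this vignette.

```csv
item_id,instrument,item_order,item_name,ncat
EDDEP04,1,1,EDDEP04,5
EDDEP05,1,2,EDDEP05,5
CESD1,2,29,CESD1,4
```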
The anchor data contains the item parameters for the anchor scale (e.g., PROMIS Depression) and requires the following columns.

- item_order: The sequential position of each item in the anchor scale (e.g., 1, 2, 3, …, 28).
- item_id: The unique ID of the anchor items. The name of this column does not have to be item_id but should be consistent with the item ID column in the item map table. The IDs in this column should refer to the corresponding column names in the response data.
- a: The slope parameter value for each anchor item.
- cb1, cb2, …: The category boundary parameter values for each anchor item.

Run the following code to open the anchor data in edit mode.
file.edit(system.file("data-raw", "anchor_DeCESD.csv", package = "PROsetta"))
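An anchor file with these columns might look like the sketch below. The parameter values are illustrative, patterned after the coef() output shown later in this vignette rather than copied from the actual anchor file; with five response categories there are four category boundaries, cb1 through cb4.

```csv
item_order,item_id,a,cb1,cb2,cb3,cb4
1,EDDEP04,4.26,0.40,0.98,1.70,2.44
2,EDDEP05,3.93,0.31,0.91,1.59,2.41
```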
The frequency distribution of each item in the response data is
obtained by runFrequency()
.
freq_table <- runFrequency(d)
head(freq_table)
## 1 2 3 4 5
## EDDEP04 526 112 66 29 14
## EDDEP05 488 118 91 37 12
## EDDEP06 502 119 85 30 10
## EDDEP07 420 155 107 49 16
## EDDEP09 492 132 89 25 9
## EDDEP14 445 150 101 37 14
The frequency distribution of the summed scores for the combined
scale can be plotted as a histogram with plot()
. The
required argument is a PROsetta_data
object created with
loadData()
. The optional scale
argument
specifies for which scale the summed score should be created. Setting
scale = 'combined'
plots the summed score distribution for
the combined scale.
plot(d, scale = "combined", title = "Combined scale")
The user can also generate the summed score distribution for the
first or second scale by specifying scale = 1
or
scale = 2
.
plot(d, scale = 1, title = "Scale 1") # not run
plot(d, scale = 2, title = "Scale 2") # not run
Basic descriptive statistics are obtained for each item by
runDescriptive()
.
desc_table <- runDescriptive(d)
head(desc_table)
## n mean sd median trimmed mad min max range skew kurtosis se
## EDDEP04 747 1.52 0.94 1 1.30 0 1 5 4 1.91 3.01 0.03
## EDDEP05 746 1.62 0.99 1 1.42 0 1 5 4 1.54 1.50 0.04
## EDDEP06 746 1.56 0.94 1 1.37 0 1 5 4 1.66 2.01 0.03
## EDDEP07 747 1.78 1.05 1 1.59 0 1 5 4 1.23 0.59 0.04
## EDDEP09 747 1.56 0.91 1 1.38 0 1 5 4 1.62 1.98 0.03
## EDDEP14 747 1.69 1.00 1 1.51 0 1 5 4 1.38 1.12 0.04
Classical reliability statistics can be obtained by
runClassical()
. By default, the analysis is performed for
the combined scale.
classical_table <- runClassical(d)
summary(classical_table$alpha$combined)
##
## Reliability analysis
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.98 0.98 0.99 0.53 54 9e-04 1.7 0.69 0.54
The user can set scalewise = TRUE
to request an analysis
for each scale separately in addition to the combined scale.
classical_table <- runClassical(d, scalewise = TRUE)
classical_table$alpha$combined # alpha values for combined scale
classical_table$alpha$`1` # alpha values for each scale, created when scalewise = TRUE
classical_table$alpha$`2` # alpha values for each scale, created when scalewise = TRUE
Specifying omega = TRUE additionally returns McDonald's \(\omega\) coefficients.
classical_table <- runClassical(d, scalewise = TRUE, omega = TRUE)
classical_table$omega$combined # omega values for combined scale
classical_table$omega$`1` # omega values for each scale, created when scalewise = TRUE
classical_table$omega$`2` # omega values for each scale, created when scalewise = TRUE
Additional arguments supplied to runClassical() are passed on to psych::omega().
classical_table <- runClassical(d, scalewise = TRUE, omega = TRUE, nfactors = 5) # not run
Dimensionality analysis is performed with CFA by
runCFA()
. Setting scalewise = TRUE
performs
the dimensionality analysis for each scale separately in addition to the
combined scale.
out_cfa <- runCFA(d, scalewise = TRUE)
runCFA() calls lavaan::cfa() internally and can pass additional arguments on to it.
out_cfa <- runCFA(d, scalewise = TRUE, std.lv = TRUE) # not run
The CFA result for the combined scale is stored in the
combined
slot, and if scalewise = TRUE
, the
analysis for each scale is also stored in each numbered slot.
out_cfa$combined
## lavaan 0.6-12 ended normally after 23 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of model parameters 220
##
## Used Total
## Number of observations 731 747
##
## Model Test User Model:
## Standard Robust
## Test Statistic 4227.611 4700.781
## Degrees of freedom 1080 1080
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.046
## Shift parameter 657.455
## simple second-order correction
out_cfa$`1`
## lavaan 0.6-12 ended normally after 16 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of model parameters 140
##
## Used Total
## Number of observations 738 747
##
## Model Test User Model:
## Standard Robust
## Test Statistic 863.527 1434.277
## Degrees of freedom 350 350
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 0.678
## Shift parameter 160.257
## simple second-order correction
out_cfa$`2`
## lavaan 0.6-12 ended normally after 20 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of model parameters 80
##
## Used Total
## Number of observations 740 747
##
## Model Test User Model:
## Standard Robust
## Test Statistic 1106.148 1431.797
## Degrees of freedom 170 170
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 0.812
## Shift parameter 69.205
## simple second-order correction
CFA fit indices can be obtained by using summary()
from
the lavaan package. For the combined scale:
lavaan::summary(out_cfa$combined, fit.measures = TRUE, standardized = TRUE, estimates = FALSE)
## lavaan 0.6-12 ended normally after 23 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of model parameters 220
##
## Used Total
## Number of observations 731 747
##
## Model Test User Model:
## Standard Robust
## Test Statistic 4227.611 4700.781
## Degrees of freedom 1080 1080
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.046
## Shift parameter 657.455
## simple second-order correction
##
## Model Test Baseline Model:
##
## Test statistic 793198.138 91564.633
## Degrees of freedom 1128 1128
## P-value 0.000 0.000
## Scaling correction factor 8.758
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.996 0.960
## Tucker-Lewis Index (TLI) 0.996 0.958
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.063 0.068
## 90 Percent confidence interval - lower 0.061 0.066
## 90 Percent confidence interval - upper 0.065 0.070
## P-value RMSEA <= 0.05 0.000 0.000
##
## Robust RMSEA NA
## 90 Percent confidence interval - lower NA
## 90 Percent confidence interval - upper NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.051 0.051
and also for each scale separately:
lavaan::summary(out_cfa$`1`, fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run
lavaan::summary(out_cfa$`2`, fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run
runCalibration() performs IRT calibration without anchoring. It calls mirt::mirt() internally, and additional arguments can be passed on to mirt, e.g., to increase the number of EM cycles to 1000, as follows:
out_calib <- runCalibration(d, technical = list(NCYCLES = 1000))
In case of nonconvergence, runCalibration()
explicitly
raises an error and does not return its results.
out_calib <- runCalibration(d, technical = list(NCYCLES = 10))
## Error in runCalibration(d, technical = list(NCYCLES = 10)): calibration did not converge: increase iteration limit by adjusting the 'technical' argument, e.g., technical = list(NCYCLES = 510)
Specify fixedpar = TRUE to perform fixed-parameter calibration using the anchor data.
out_calib <- runCalibration(d, fixedpar = TRUE)
The output object from runCalibration()
can be used to
generate additional output with functions from the mirt
package.
Use coef()
to extract item parameters:
mirt::coef(out_calib, IRTpars = TRUE, simplify = TRUE)
## $items
## a b1 b2 b3 b4
## EDDEP04 4.261 0.401 0.976 1.696 2.444
## EDDEP05 3.932 0.305 0.913 1.593 2.412
## EDDEP06 4.145 0.350 0.915 1.678 2.471
## EDDEP07 2.802 0.148 0.772 1.603 2.538
## EDDEP09 3.657 0.312 0.982 1.782 2.571
## EDDEP14 2.333 0.186 0.947 1.729 2.633
## EDDEP17 3.274 -0.498 0.406 1.413 2.375
## EDDEP19 3.241 0.460 1.034 1.834 2.515
## EDDEP21 2.736 0.072 0.810 1.803 2.673
## EDDEP22 3.970 0.204 0.795 1.649 2.295
## EDDEP23 2.564 -0.038 0.693 1.653 2.584
## EDDEP26 3.093 -0.358 0.412 1.404 2.224
## EDDEP27 2.920 0.204 0.891 1.655 2.528
## EDDEP28 2.588 -0.079 0.633 1.477 2.328
## EDDEP29 4.343 -0.117 0.598 1.428 2.272
## EDDEP30 2.613 -0.023 0.868 1.864 2.826
## EDDEP31 3.183 -0.261 0.397 1.305 2.134
## EDDEP35 3.106 0.044 0.722 1.639 2.471
## EDDEP36 3.483 -0.536 0.348 1.347 2.355
## EDDEP39 3.131 0.918 1.481 2.164 2.856
## EDDEP41 4.454 0.558 1.074 1.779 2.530
## EDDEP42 2.364 0.210 0.987 1.906 2.934
## EDDEP44 2.549 0.194 1.012 2.013 3.126
## EDDEP45 2.834 0.141 0.907 1.846 2.875
## EDDEP46 2.381 -0.458 0.478 1.546 2.632
## EDDEP48 3.185 0.198 0.782 1.526 2.324
## EDDEP50 2.018 -0.050 0.926 2.000 2.966
## EDDEP54 2.685 -0.299 0.423 1.358 2.308
## CESD1 2.074 0.876 1.921 3.064 NA
## CESD2 1.262 1.387 2.670 3.721 NA
## CESD3 3.512 0.833 1.316 1.949 NA
## CESD4 1.118 0.649 1.379 2.081 NA
## CESD5 1.605 0.429 1.526 2.724 NA
## CESD6 3.635 0.493 1.176 1.729 NA
## CESD7 1.828 0.287 1.368 2.134 NA
## CESD8 1.342 -0.067 0.823 1.620 NA
## CESD9 3.003 0.748 1.374 1.855 NA
## CESD10 2.060 1.172 2.043 3.268 NA
## CESD11 1.077 -0.463 0.947 2.160 NA
## CESD12 2.229 0.169 0.945 1.737 NA
## CESD13 1.288 0.342 1.696 2.915 NA
## CESD14 2.176 0.491 1.291 1.864 NA
## CESD15 1.397 0.965 2.321 3.604 NA
## CESD16 2.133 0.272 0.922 1.808 NA
## CESD17 1.719 1.607 2.317 3.470 NA
## CESD18 2.812 0.261 1.248 1.984 NA
## CESD19 1.834 0.784 1.875 2.639 NA
## CESD20 1.491 -0.140 1.256 2.297 NA
##
## $means
## F1
## -0.06
##
## $cov
## F1
## F1 0.95
and also other commonly used functions:
mirt::itemfit(out_calib, empirical.plot = 1)
mirt::itemplot(out_calib, item = 1, type = "info")
mirt::itemfit(out_calib, "S_X2", na.rm = TRUE)
Scale information functions can be plotted with plotInfo(). The two required arguments are an output object from runCalibration() and a PROsetta_data object from loadData(). The additional arguments specify the labels, colors, and line types for each scale and for the combined scale; the last value in each of scale_label, color, and lty corresponds to the combined scale.
plotInfo(
out_calib, d,
scale_label = c("PROMIS Depression", "CES-D", "Combined"),
color = c("blue", "red", "black"),
lty = c(1, 2, 3))