Outcome-dependent sampling (ODS) schemes are cost-effective ways to enhance study efficiency. In an ODS design, the exposure and covariates are observed with a probability that depends on the outcome variable. Popular ODS designs include the case-control design for a binary outcome, the case-cohort design for a time-to-event outcome, and the continuous-outcome ODS design (Zhou et al., 2002). Because ODS data are a biased sample of the population, standard analyses such as ordinary linear regression lead to biased estimates of the population parameters. This package implements four statistical methods for ODS designs:

1. An empirical likelihood method for analyzing the primary continuous outcome with respect to exposure variables in a continuous ODS design (Zhou et al., 2002).
2. A partial linear model for analyzing the primary outcome in a continuous ODS design (Zhou, Qin and Longnecker, 2011).
3. An analysis of a secondary outcome in a continuous ODS design (Pan et al., 2018).
4. An estimated likelihood method for analyzing a secondary outcome in case-cohort data (Pan et al., 2017).

References:

Zhou H, Weaver M, Qin J, Longnecker M, Wang M. (2002). A
semiparametric empirical likelihood method for data from an
outcome‐dependent sampling scheme with a continuous outcome.
*Biometrics*, 58(2):413-421.

Zhou H, Qin G, Longnecker M. (2011). A partial linear model in the
outcome‐dependent sampling setting to evaluate the effect of prenatal
PCB exposure on cognitive function in children. *Biometrics*,
67(3):876-885.

Pan Y, Cai J, Kim S, Zhou H. (2017). Regression analysis for
secondary response variable in a case‐cohort study.
*Biometrics*.

Pan Y, Cai J, Longnecker M, Zhou H. (2018). Secondary outcome
analysis for data from an outcome‐dependent sampling design.
*Statistics in Medicine*, 37(15):2321-2337.

We assume that in the population, the primary outcome variable \(Y\) follows the linear model \[ Y = \beta_{0} + \beta_{1}X + \epsilon, \] where \(X\) denotes the covariates and \(\epsilon\sim N(0, \sigma^2)\). In a continuous ODS design, a simple random sample (SRS) is first taken from the full cohort, and then two supplemental samples are taken from the tails of the \(Y\) distribution, i.e. from \((-\infty, \mu_{Y} - a\sigma_{Y})\) and \((\mu_{Y} + a\sigma_{Y}, +\infty)\), where \(\mu_{Y}\) and \(\sigma_{Y}\) are the mean and standard deviation of \(Y\) and \(a > 0\) is a prespecified constant. Because ODS data are not a simple random sample of the overall population, naive regression analysis yields invalid estimates of the population parameters. Zhou et al. (2002) developed a maximum semiparametric empirical likelihood estimator (MSELE) for inference on the parameters of the linear model.
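The sampling scheme, and why naive least squares fails under it, can be illustrated with a small base-R simulation. This is a minimal sketch under assumed settings (the cohort size, the cut-off `a = 1`, and 300 subjects per stratum are arbitrary choices): it draws an SRS, adds the two tail supplements from the remaining cohort members, and compares ordinary least squares on the SRS part with ordinary least squares on the pooled ODS sample. It does not implement the MSELE or call any function of this package.

```r
# Minimal sketch (not the MSELE): draw a continuous ODS sample from a
# simulated cohort and compare naive OLS with OLS on the SRS part only.
set.seed(2002)

N     <- 10000                 # full cohort size (hypothetical)
beta0 <- 1; beta1 <- 0.5       # true linear-model coefficients (hypothetical)
sigma <- 1                     # error SD
a     <- 1                     # tail cut-off in SDs of Y (hypothetical)
n0 <- 300; n1 <- 300; n2 <- 300  # SRS, lower-tail and upper-tail sample sizes

X <- rnorm(N)
Y <- beta0 + beta1 * X + rnorm(N, sd = sigma)

# Stage 1: simple random sample from the full cohort
srs_id <- sample(N, n0)

# Stage 2: supplemental samples from the two tails of Y,
# drawn among cohort members not already in the SRS
lo      <- mean(Y) - a * sd(Y)
hi      <- mean(Y) + a * sd(Y)
pool    <- setdiff(seq_len(N), srs_id)
low_id  <- sample(pool[Y[pool] < lo], n1)
high_id <- sample(pool[Y[pool] > hi], n2)

ids <- c(srs_id, low_id, high_id)
ods <- data.frame(
  X = X[ids],
  Y = Y[ids],
  stratum = rep(c("SRS", "lower", "upper"), c(n0, n1, n2))
)

srs_slope   <- coef(lm(Y ~ X, data = subset(ods, stratum == "SRS")))[["X"]]
naive_slope <- coef(lm(Y ~ X, data = ods))[["X"]]

cat(sprintf("true slope %.2f | SRS-only OLS %.3f | naive OLS on all ODS data %.3f\n",
            beta1, srs_slope, naive_slope))
```

Because both tails of \(Y\) are oversampled, the pooled sample over-represents extreme outcomes, and in this setup the naive slope tends to overstate \(\beta_1\). The SRS component alone is unbiased but uses only a third of the sample, which motivates methods such as the MSELE that make use of the full ODS sample.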

Function **odsmle** provides the MSELE parameter estimates,
and function **se.spmle** calculates their standard errors.

We assume that in the population, the primary outcome variable \(Y\) follows the partial linear model \[ E(Y|X,Z)=g(X)+Z^{T}\gamma, \] where \(X\) is the expensive exposure, \(Z\) denotes other covariates, and \(g(\cdot)\) is an unknown smooth function. Zhou, Qin and Longnecker (2011) considered a penalized spline method to estimate the nonparametric function \(g(\cdot)\) and the regression coefficients \(\gamma\) under the ODS sampling scheme.
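As a rough illustration of the penalized spline idea only, the sketch below fits the partial linear model to a simulated simple random sample, representing \(g(\cdot)\) with a truncated-linear spline basis and shrinking the knot coefficients with a ridge penalty. The basis, the knot placement, the value \(\lambda = 0.1\), and the true \(g(x)=\sin(2\pi x)\) are assumptions; the sketch ignores the ODS sampling, so it is not the estimator computed by **Estimate_PLMODS**.

```r
# Generic penalized-spline fit of E(Y | X, Z) = g(X) + Z * gamma on a simple
# random sample (no ODS weighting); all settings are hypothetical.
set.seed(2011)

n     <- 500
gamma <- 0.8                              # true coefficient of Z
g     <- function(x) sin(2 * pi * x)      # true smooth function g(.)
X <- runif(n)
Z <- rbinom(n, 1, 0.5)
Y <- g(X) + gamma * Z + rnorm(n, sd = 0.3)

# Truncated-linear basis (x - kappa_k)_+ with K knots at interior quantiles of X
K     <- 15
knots <- quantile(X, probs = seq(0, 1, length.out = K + 2)[-c(1, K + 2)])
spl   <- outer(X, knots, function(x, k) pmax(x - k, 0))

# Design matrix: intercept, linear term in X, Z, then the K spline columns
C <- cbind(1, X, Z, spl)
# Penalty matrix: only the K spline coefficients are shrunk
D <- diag(c(0, 0, 0, rep(1, K)))

# Penalized least squares: minimize ||Y - C theta||^2 + lambda * theta' D theta
fit_pspline <- function(lambda) {
  drop(solve(crossprod(C) + lambda * D, crossprod(C, Y)))
}

theta     <- fit_pspline(lambda = 0.1)
gamma_hat <- theta[[3]]
# Fitted g at the observed X (the intercept is absorbed into g)
g_cols <- c(1, 2, 4:(K + 3))
g_hat  <- drop(C[, g_cols] %*% theta[g_cols])

cat(sprintf("gamma: true %.2f, estimated %.3f\n", gamma, gamma_hat))
```

Only the spline coefficients are penalized, so \(\lambda\) controls the smoothness of \(\hat g\) while \(\gamma\) is left unshrunk.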

Function **Estimate_PLMODS** computes the parameter
estimates and standard errors in the partial linear model. Function
**gcv_ODS** calculates the generalized cross-validation
(GCV) criterion used to select the smoothing parameter. Details are given in
Zhou, Qin and Longnecker (2011).
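In its standard form, GCV chooses \(\lambda\) by minimizing \(\mathrm{GCV}(\lambda) = n\,\mathrm{RSS}(\lambda)/(n - \mathrm{tr}\,S_\lambda)^2\), where \(S_\lambda\) is the smoother ("hat") matrix and \(\mathrm{tr}\,S_\lambda\) its effective degrees of freedom. The sketch below evaluates this standard criterion over a grid for a generic penalized spline on a simulated simple random sample (truncated-linear basis, ridge penalty on knot coefficients). The criterion computed by **gcv_ODS** under the ODS design may differ, so treat this as an illustration of the idea under assumed settings only.

```r
# Generic GCV for a penalized spline (no ODS weighting); settings hypothetical.
set.seed(2011)

n <- 500
X <- runif(n)
Z <- rbinom(n, 1, 0.5)
Y <- sin(2 * pi * X) + 0.8 * Z + rnorm(n, sd = 0.3)

K     <- 15
knots <- quantile(X, probs = seq(0, 1, length.out = K + 2)[-c(1, K + 2)])
C     <- cbind(1, X, Z, outer(X, knots, function(x, k) pmax(x - k, 0)))
D     <- diag(c(0, 0, 0, rep(1, K)))   # penalize spline coefficients only

gcv <- function(lambda) {
  A     <- crossprod(C) + lambda * D
  theta <- solve(A, crossprod(C, Y))
  rss   <- sum((Y - C %*% theta)^2)
  # tr(S) with S = C A^{-1} C' equals tr(A^{-1} C'C)
  df    <- sum(diag(solve(A, crossprod(C))))
  c(lambda = lambda, df = df, gcv = n * rss / (n - df)^2)
}

lambdas <- 10^seq(-3, 2, by = 0.5)
res     <- t(sapply(lambdas, gcv))        # one row per candidate lambda
best    <- res[which.min(res[, "gcv"]), ]

print(round(res, 4))
cat(sprintf("selected lambda = %g (effective df = %.2f)\n",
            best[["lambda"]], best[["df"]]))
```

The effective degrees of freedom fall from about \(K + 3\) toward 3 (the unpenalized intercept, linear and \(Z\) terms) as \(\lambda\) grows; GCV trades this complexity against the residual sum of squares.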

We assume that in the population, the primary outcome \(Y_1\) and the secondary outcome \(Y_2\) satisfy the following conditional
mean models: \[
E(Y_1|X,Z)=\beta_0+\beta_1X+\beta_2Z
\] \[
E(Y_2|X,Z)=\gamma_0+\gamma_1X+\gamma_2Z
\] Pan et al. (2018) proposed an augmented inverse probability
weighted estimating equation to analyze the secondary outcome
(parameters \(\gamma_0, \gamma_1,
\gamma_2\)) for data obtained from a continuous ODS design in which
sampling depends on the primary outcome \(Y_1\).
Function **secondary_ODS** computes the parameter estimates
and standard errors for \((\beta,
\gamma)\).
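The weighting idea behind such estimators can be sketched with plain inverse probability weighting (IPW), without the augmentation term used by Pan et al. (2018). The simulation below, with assumed coefficients and a correlation of 0.8 between the two outcome errors, draws a continuous ODS sample on \(Y_1\), computes approximate design-implied inclusion probabilities \(\pi_i\), and compares naive least squares for \(Y_2\) with weighted least squares using weights \(1/\pi_i\). Names such as `pi_i` and `p_sup` are hypothetical and not part of the package.

```r
# Plain IPW sketch for a secondary outcome under a continuous ODS design on Y1.
# Not the AIPW estimator of Pan et al. (2018); all settings are hypothetical.
set.seed(2018)

N  <- 20000; n0 <- 1000; n1 <- 1000; n2 <- 1000; a <- 1
gamma1 <- 0.5                                # true X-coefficient for Y2
rho    <- 0.8                                # correlation of the two errors

X  <- rnorm(N)
Z  <- rbinom(N, 1, 0.5)
e1 <- rnorm(N)
e2 <- rho * e1 + sqrt(1 - rho^2) * rnorm(N)
Y1 <- 1 + 0.5 * X + 0.3 * Z + e1             # primary outcome
Y2 <- -1 + gamma1 * X + 0.4 * Z + e2         # secondary outcome

# Continuous ODS on Y1: SRS, then supplements from both tails of Y1
lo <- mean(Y1) - a * sd(Y1)
hi <- mean(Y1) + a * sd(Y1)
tail_of <- ifelse(Y1 < lo, "lower", ifelse(Y1 > hi, "upper", "middle"))

in_srs    <- seq_len(N) %in% sample(N, n0)
low_pool  <- which(!in_srs & tail_of == "lower")
high_pool <- which(!in_srs & tail_of == "upper")
in_sample <- in_srs
in_sample[sample(low_pool, n1)]  <- TRUE
in_sample[sample(high_pool, n2)] <- TRUE

# Approximate inclusion probabilities implied by the design:
# SRS probability, plus the supplement probability for tail members
p_srs <- n0 / N
p_sup <- ifelse(tail_of == "lower", n1 / length(low_pool),
         ifelse(tail_of == "upper", n2 / length(high_pool), 0))
pi_i  <- p_srs + (1 - p_srs) * p_sup

dat <- data.frame(X, Z, Y2, w = 1 / pi_i)[in_sample, ]

naive <- coef(lm(Y2 ~ X + Z, data = dat))[["X"]]
# Weighted least squares with weights 1 / pi_i; lm()'s reported SEs are not
# valid design-based SEs here, so only the point estimate is used.
ipw   <- coef(lm(Y2 ~ X + Z, data = dat, weights = w))[["X"]]

cat(sprintf("gamma1: true %.2f | naive OLS %.3f | IPW %.3f\n", gamma1, naive, ipw))
```

In this simulation \(Y_2\) is correlated with \(Y_1\) given \(X\) and \(Z\), so selecting on \(Y_1\) distorts the naive fit for \(Y_2\). The IPW estimate is consistent but can be inefficient because subjects from the middle of the \(Y_1\) distribution carry large weights; augmentation is one way to recover efficiency.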

When the primary outcome is a survival time, the case-cohort design
(a random subcohort drawn from the full cohort, supplemented by all cases)
is commonly used to enhance study efficiency. We assume that the primary
outcome (survival time) follows the Cox model: \[
\lambda(t|X,Y_2,Z)=\lambda_0(t)\exp(\gamma_1X+\gamma_2Y_2+\gamma_3Z)
\] where \(Y_2\) is a secondary
outcome that satisfies the linear model \[
Y_2 = \beta_{0} + \beta_{1}X + \beta_2Z + \epsilon
\] with \(\epsilon\sim N(0,
\sigma^2)\). Pan et al. (2017) proposed a nonparametric estimated
likelihood approach for analyzing the secondary outcome \(Y_2\) when the data are obtained from a
case-cohort study. Function **secondary_casecohort**
computes the parameter estimates and standard errors for \((\beta, \gamma)\).
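The sampling structure, and the reason a secondary-outcome analysis needs care, can be sketched by simulation. Under assumed settings (a constant baseline hazard, administrative censoring at \(t = 1\), and the coefficient values in the comments), the code generates a cohort from the two models above, draws a case-cohort sample, and fits ordinary least squares for \(Y_2\) on the subcohort only and on the full case-cohort sample. It does not implement the estimated likelihood method.

```r
# Case-cohort sampling sketch: subcohort SRS plus all cases, then naive
# regression of Y2. Not the estimated likelihood method; settings hypothetical.
set.seed(2017)

N       <- 5000                 # full cohort size
m       <- 500                  # subcohort size
beta    <- c(0, 0.5, 0.5)       # Y2 = beta0 + beta1 X + beta2 Z + eps
gam     <- c(0.3, 1, 0.3)       # log-hazard coefficients of X, Y2, Z
lambda0 <- 0.03                 # constant baseline hazard
tau     <- 1                    # end of follow-up (administrative censoring)

X  <- rnorm(N)
Z  <- rbinom(N, 1, 0.5)
Y2 <- beta[1] + beta[2] * X + beta[3] * Z + rnorm(N)

# Cox model with constant baseline hazard => exponential event times
eta     <- gam[1] * X + gam[2] * Y2 + gam[3] * Z
T_event <- -log(runif(N)) / (lambda0 * exp(eta))
case    <- T_event <= tau       # event observed before end of follow-up

# Case-cohort sample: random subcohort plus every case in the cohort
subcohort <- seq_len(N) %in% sample(N, m)
in_cc     <- subcohort | case
cohort    <- data.frame(X, Z, Y2)

sub   <- coef(lm(Y2 ~ X + Z, data = cohort[subcohort, ]))
naive <- coef(lm(Y2 ~ X + Z, data = cohort[in_cc, ]))

print(round(rbind(truth = beta, subcohort_only = sub, naive_case_cohort = naive), 3))
cat(sprintf("cases: %d, case-cohort sample size: %d\n", sum(case), sum(in_cc)))
```

With \(\gamma_2 > 0\), cases tend to have larger \(Y_2\) than non-cases with the same \(X\) and \(Z\); because every case is sampled, the naive fit on the pooled sample is pulled upward, for example in the intercept. The subcohort alone is a simple random sample and is unbiased, but it discards the extra cases.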

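The development version can be installed from GitHub:

```
# install devtools first if it is not already available
install.packages("devtools")
devtools::install_github("Yinghao-Pan/ODS")
library(ODS)
```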