# Predicting Choices from Estimated Models

Once a model has been estimated, it can be used to predict choices for a set of alternatives. This vignette demonstrates how to do so using the predictChoices() function along with the results of an estimated model.

# The data

To predict choices, you first need to define a set of alternatives for which you want to make predictions. Each row should be an alternative, and each column should be an attribute. I will predict choices on the full yogurt data set, which was used to estimate each of the models used in this example.
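
As a minimal sketch of that layout, the snippet below builds a small set of alternatives by hand. The name `new_alts` and all of the values are made up for illustration; only the column names mirror those in the yogurt data.

```r
# Hypothetical alternatives: two choice sets (obsID) with four brands each.
# All prices and feature flags below are invented for illustration.
new_alts <- data.frame(
  obsID = rep(1:2, each = 4),
  alt   = rep(1:4, times = 2),
  price = c(8.1, 6.1, 7.9, 10.8, 9.8, 6.4, 7.5, 10.8),
  feat  = c(0, 0, 0, 0, 0, 1, 0, 0),
  brand = rep(c("dannon", "hiland", "weight", "yoplait"), times = 2)
)

# One row per alternative, one column per attribute
str(new_alts)
```

A data frame shaped like this could in principle be passed as the alts argument in place of the full yogurt data, provided its columns match the parameters of the estimated model.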

This example uses the yogurt data set from Jain et al. (1994). The data set contains 2,412 choice observations from a series of yogurt purchases by a panel of 100 households in Springfield, Missouri, over a roughly two-year period. The data were collected by optical scanners and contain information about the price, brand, and a “feature” variable, which identifies whether a newspaper advertisement was shown to the customer. There are four brands of yogurt: Yoplait, Dannon, Weight Watchers, and Hiland, with market shares of 34%, 40%, 23% and 3%, respectively.

head(yogurt)
#> # A tibble: 6 × 15
#>      id obsID   alt choice price  feat brand   dannon hiland weight yoplait
#>   <dbl> <int> <int>  <dbl> <dbl> <dbl> <chr>    <dbl>  <dbl>  <dbl>   <dbl>
#> 1     1     1     1      0  8.1      0 dannon       1      0      0       0
#> 2     1     1     2      0  6.10     0 hiland       0      1      0       0
#> 3     1     1     3      1  7.90     0 weight       0      0      1       0
#> 4     1     1     4      0 10.8      0 yoplait      0      0      0       1
#> 5     1     2     1      1  9.80     0 dannon       1      0      0       0
#> 6     1     2     2      0  6.40     0 hiland       0      1      0       0
#> # … with 4 more variables: brand_dannon <int>, brand_hiland <int>,
#> #   brand_weight <int>, brand_yoplait <int>

# Predicting with multinomial logit models

## Preference space parameterization

In the example below, I estimate a preference space MNL model called mnl_pref. I can then use the predictChoices() function with the mnl_pref model to predict the choices for each set of alternatives in the yogurt data set:

# Estimate the model
mnl_pref <- logitr(
  data   = yogurt,
  choice = 'choice',
  obsID  = 'obsID',
  pars   = c('price', 'feat', 'brand')
)

# Predict choices
choices_mnl_pref <- predictChoices(
  model = mnl_pref,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
# Preview actual and predicted choices
head(choices_mnl_pref[c('obsID', 'choice', 'choice_predict')])
#>     obsID choice choice_predict
#> 1.1     1      0              0
#> 1.2     1      0              0
#> 1.3     1      1              1
#> 1.4     1      0              0
#> 2.5     2      1              1
#> 2.6     2      0              0

The resulting choices_mnl_pref data frame contains the same alts data frame with an additional column, choice_predict, which contains the predicted choices. You can quickly compute the accuracy by dividing the number of correctly predicted choices by the total number of choices:

chosen <- subset(choices_mnl_pref, choice == 1)
chosen$correct <- chosen$choice == chosen$choice_predict
sum(chosen$correct) / nrow(chosen)
#> [1] 0.3897181
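
For intuition about what a preference space MNL model implies for a single choice set, here is a rough sketch of the logit probability calculation. The coefficient values are hypothetical (they are not the estimates from mnl_pref), and the sketch makes no assumption about how predictChoices() turns probabilities into a single predicted choice.

```r
# Hypothetical preference space coefficients (NOT estimates from mnl_pref)
beta <- c(price = -0.4, feat = 0.5)
brand_effect <- c(dannon = 0, hiland = -3.0, weight = -0.6, yoplait = 0.7)

# One choice set, using the prices from the first four rows of head(yogurt)
alts <- data.frame(
  brand = c("dannon", "hiland", "weight", "yoplait"),
  price = c(8.1, 6.1, 7.9, 10.8),
  feat  = c(0, 0, 0, 0)
)

# Observed utility of each alternative
v <- beta[["price"]] * alts$price +
  beta[["feat"]] * alts$feat +
  brand_effect[alts$brand]

# Logit probabilities: exponentiated utility divided by the sum
# of exponentiated utilities within the choice set
p <- exp(v) / sum(exp(v))
round(p, 3)
```

The probabilities sum to one within the choice set, so each value can be read as the model's predicted share for that alternative.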

## WTP space parameterization

You can also use WTP space models to predict choices. For example, here are the predictions from an equivalent model estimated in the WTP space:

# Estimate the model
mnl_wtp <- logitr(
  data       = yogurt,
  choice     = 'choice',
  obsID      = 'obsID',
  pars       = c('feat', 'brand'),
  price      = 'price',
  modelSpace = 'wtp',
  numMultiStarts = 10
)

# Make predictions
choices_mnl_wtp <- predictChoices(
  model = mnl_wtp,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
#> NOTE: Using results from run 8 of 10 multistart runs
#> (the run with the largest log-likelihood value)
# Preview actual and predicted choices
head(choices_mnl_wtp[c('obsID', 'choice', 'choice_predict')])
#>     obsID choice choice_predict
#> 1.1     1      0              0
#> 1.2     1      0              0
#> 1.3     1      1              0
#> 1.4     1      0              1
#> 2.5     2      1              1
#> 2.6     2      0              0
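
To see why the two parameterizations can be called equivalent, the sketch below uses the standard WTP space utility, lambda * (omega'x - price), and shows that it yields the same logit probabilities as a preference space utility with beta = lambda * omega and a price coefficient of -lambda. All parameter values are hypothetical, and the `softmax` helper is defined here only for illustration.

```r
# Hypothetical WTP space parameters (NOT estimates from mnl_wtp)
lambda <- 0.4  # scale parameter
omega  <- c(feat = 1.2, hiland = -7.5, weight = -1.5, yoplait = 1.8)

# Non-price attributes for one choice set; dannon is the reference brand
X <- cbind(
  feat    = c(0, 0, 0, 0),
  hiland  = c(0, 1, 0, 0),
  weight  = c(0, 0, 1, 0),
  yoplait = c(0, 0, 0, 1)
)
price <- c(8.1, 6.1, 7.9, 10.8)

# Illustrative helper: logit probabilities from a vector of utilities
softmax <- function(v) exp(v) / sum(exp(v))

# WTP space utility: lambda * (omega'x - price)
p_wtp <- softmax(lambda * (X %*% omega - price))

# Equivalent preference space utility: beta'x + alpha * price
beta   <- lambda * omega
alpha  <- -lambda
p_pref <- softmax(X %*% beta + alpha * price)

# The two sets of probabilities agree up to floating point error
all.equal(p_wtp, p_pref)
```

Under this mapping the probabilities are identical, so differences in predicted choices between the two model spaces would have to come from the estimates themselves (for example, different multistart solutions) or from any randomness in how choices are generated.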

# Predicting with mixed logit models

## Preference space parameterization

You can also use mixed logit models to predict choices. Heterogeneity is modeled by simulating draws from the estimated population distributions of the random parameters. Here is an example using a preference space mixed logit model:

# Estimate the model
mxl_pref <- logitr(
  data     = yogurt,
  choice   = 'choice',
  obsID    = 'obsID',
  pars     = c('price', 'feat', 'brand'),
  randPars = c(feat = 'n', brand = 'n'),
  numMultiStarts = 5
)

# Make predictions
choices_mxl_pref <- predictChoices(
  model = mxl_pref,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
# Preview actual and predicted choices
head(choices_mxl_pref[c('obsID', 'choice', 'choice_predict')])
#>     obsID choice choice_predict
#> 1.1     1      0              1
#> 1.2     1      0              0
#> 1.3     1      1              0
#> 1.4     1      0              0
#> 2.5     2      1              0
#> 2.6     2      0              0
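
The idea of simulating draws can be sketched in a few lines. The example below is a simplified illustration, not the package's implementation: it treats only the feat coefficient as normally distributed, uses hypothetical parameter values rather than the estimates from mxl_pref, and averages the logit probabilities over the draws.

```r
set.seed(123)

# Hypothetical population parameters (NOT estimates from mxl_pref)
beta_price <- -0.4  # fixed coefficient
feat_mean  <- 0.5   # mean of the normally distributed feat coefficient
feat_sd    <- 0.8   # standard deviation of the feat coefficient

# One choice set with four alternatives (illustrative values)
price <- c(8.1, 6.1, 7.9, 10.8)
feat  <- c(0, 1, 0, 0)

# Draw the random coefficient from its population distribution
n_draws    <- 1000
feat_draws <- rnorm(n_draws, mean = feat_mean, sd = feat_sd)

# Logit probabilities for each draw (rows = alternatives, columns = draws)
p_draws <- sapply(feat_draws, function(b_feat) {
  v <- beta_price * price + b_feat * feat
  exp(v) / sum(exp(v))
})

# Simulated mixed logit probabilities: average over the draws
p_mxl <- rowMeans(p_draws)
round(p_mxl, 3)
```

Because each draw produces a valid set of probabilities, their average also sums to one within the choice set.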

## WTP space parameterization

Likewise, mixed logit WTP space models can also be used to predict choices:

# Estimate the model
mxl_wtp <- logitr(
  data       = yogurt,
  choice     = 'choice',
  obsID      = 'obsID',
  pars       = c('feat', 'brand'),
  price      = 'price',
  randPars   = c(feat = 'n', brand = 'n'),
  modelSpace = 'wtp',
  numMultiStarts = 5
)

# Make predictions
choices_mxl_wtp <- predictChoices(
  model = mxl_wtp,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
# Preview actual and predicted choices
head(choices_mxl_wtp[c('obsID', 'choice', 'choice_predict')])
#>     obsID choice choice_predict
#> 1.1     1      0              0
#> 1.2     1      0              0
#> 1.3     1      1              1
#> 1.4     1      0              0
#> 2.5     2      1              0
#> 2.6     2      0              0

# Compare prediction results

library(dplyr)

# Combine models into one data frame
choices <- rbind(
  choices_mnl_pref, choices_mnl_wtp, choices_mxl_pref, choices_mxl_wtp)
choices$model <- c(
  rep("mnl_pref", nrow(choices_mnl_pref)),
  rep("mnl_wtp",  nrow(choices_mnl_wtp)),
  rep("mxl_pref", nrow(choices_mxl_pref)),
  rep("mxl_wtp",  nrow(choices_mxl_wtp)))

# Compute prediction accuracy by model
choices %>%
  filter(choice == 1) %>%
  mutate(predict_correct = (choice_predict == choice)) %>%
  group_by(model) %>%
  summarise(p_correct = sum(predict_correct) / n())
#> # A tibble: 4 × 2
#>   model    p_correct
#>   <chr>        <dbl>
#> 1 mnl_pref     0.390
#> 2 mnl_wtp      0.362
#> 3 mxl_pref     0.390
#> 4 mxl_wtp      0.379

The models all perform about the same, with roughly 36% to 39% correct predictions. This is noticeably better than uniform random guessing among the four alternatives, which would be correct about 25% of the time.
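
The 25% baseline can be checked with a quick simulation. This is only a sketch: the "actual" choices below are randomly generated stand-ins rather than the yogurt data, and the guesses are drawn uniformly from the four alternatives.

```r
set.seed(42)

n_obs  <- 2412  # number of choice observations reported for the yogurt data
n_alts <- 4     # alternatives per choice set

# Hypothetical observed choices and uniformly random guesses
actual <- sample(n_alts, n_obs, replace = TRUE)
guess  <- sample(n_alts, n_obs, replace = TRUE)

# Share of correct guesses in the simulation
sim_accuracy <- mean(guess == actual)
sim_accuracy

# Expected share of correct guesses under uniform guessing
1 / n_alts
```

The simulated share should land close to 1 / n_alts, which is the benchmark the model accuracies above are compared against.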

# References

Jain, Dipak C, Naufel J Vilcassim, and Pradeep K Chintagunta. 1994. “A Random-Coefficients Logit Brand-Choice Model Applied to Panel Data.” Journal of Business & Economic Statistics 12 (3): 317–28.