The data set `HC`

from `mlogit`

contains data in `R`

format on the choice of heating and central cooling system for 250 single-family, newly built houses in California.

The alternatives are:

- Gas central heat with cooling
`gcc`

, - Electric central resistence heat with cooling
`ecc`

, - Electric room resistence heat with cooling
`erc`

, - Electric heat pump, which provides cooling also
`hpc`

, - Gas central heat without cooling
`gc`

, - Electric central resistence heat without cooling
`ec`

, - Electric room resistence heat without cooling
`er`

.

Heat pumps necessarily provide both heating and cooling such that heat pump without cooling is not an alternative.

The variables are:

`depvar`

gives the name of the chosen alternative,`ich.alt`

are the installation cost for the heating portion of the system,`icca`

is the installation cost for cooling`och.alt`

are the operating cost for the heating portion of the system`occa`

is the operating cost for cooling`income`

is the annual income of the household

Note that the full installation cost of alternative `gcc`

is `ich.gcc+icca`

, and similarly for the operating cost and for the other alternatives with cooling.

- Run a nested logit model on the data for two nests and one log-sum coefficient that applies to both nests. Note that the model is specified to have the cooling alternatives (
`gcc},`

ecc},`erc},`

hpc}) in one nest and the non-cooling alternatives (`gc},`

ec}, `er}) in another nest.

```
library("mlogit")
data("HC", package = "mlogit")
HC <- dfidx(HC, varying = c(2:8, 10:16), choice = "depvar")
cooling.modes <- idx(HC, 2) %in% c('gcc', 'ecc', 'erc', 'hpc')
room.modes <- idx(HC, 2) %in% c('erc', 'er')
# installation / operating costs for cooling are constants,
# only relevant for mixed systems
HC$icca[! cooling.modes] <- 0
HC$occa[! cooling.modes] <- 0
# create income variables for two sets cooling and rooms
HC$inc.cooling <- HC$inc.room <- 0
HC$inc.cooling[cooling.modes] <- HC$income[cooling.modes]
HC$inc.room[room.modes] <- HC$income[room.modes]
# create an intercet for cooling modes
HC$int.cooling <- as.numeric(cooling.modes)
# estimate the model with only one nest elasticity
nl <- mlogit(depvar ~ ich + och +icca + occa + inc.room + inc.cooling + int.cooling | 0, HC,
nests = list(cooling = c('gcc','ecc','erc','hpc'),
other = c('gc', 'ec', 'er')), un.nest.el = TRUE)
summary(nl)
```

```
##
## Call:
## mlogit(formula = depvar ~ ich + och + icca + occa + inc.room +
## inc.cooling + int.cooling | 0, data = HC, nests = list(cooling = c("gcc",
## "ecc", "erc", "hpc"), other = c("gc", "ec", "er")), un.nest.el = TRUE)
##
## Frequencies of alternatives:choice
## ec ecc er erc gc gcc hpc
## 0.004 0.016 0.032 0.004 0.096 0.744 0.104
##
## bfgs method
## 11 iterations, 0h:0m:0s
## g'(-H)^-1g = 7.26E-06
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## ich -0.554878 0.144205 -3.8478 0.0001192 ***
## och -0.857886 0.255313 -3.3601 0.0007791 ***
## icca -0.225079 0.144423 -1.5585 0.1191212
## occa -1.089458 1.219821 -0.8931 0.3717882
## inc.room -0.378971 0.099631 -3.8038 0.0001425 ***
## inc.cooling 0.249575 0.059213 4.2149 2.499e-05 ***
## int.cooling -6.000415 5.562423 -1.0787 0.2807030
## iv 0.585922 0.179708 3.2604 0.0011125 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -178.12
```

- The estimated log-sum coefficient is \(0.59\). What does this estimate tell you about the degree of correlation in unobserved factors over alternatives within each nest?

The correlation is approximately \(1-0.59=0.41\). It's a moderate correlation.

- Test the hypothesis that the log-sum coefficient is 1.0 (the value that it takes for a standard logit model.) Can the hypothesis that the true model is standard logit be rejected?

We can use a t-test of the hypothesis that the log-sum coefficient equal to 1. The t-statistic is :

` (coef(nl)['iv'] - 1) / sqrt(vcov(nl)['iv', 'iv'])`

```
## iv
## -2.304171
```

The critical value of t for 95% confidence is 1.96. So we can reject the hypothesis at 95% confidence.

We can also use a likelihood ratio test because the multinomial logit is a special case of the nested model.

```
# First estimate the multinomial logit model
ml <- update(nl, nests = NULL)
lrtest(nl, ml)
```

```
## Likelihood ratio test
##
## Model 1: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## Model 2: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 8 -178.12
## 2 7 -180.29 -1 4.3234 0.03759 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Note that the hypothesis is rejected at 95% confidence, but not at 99% confidence.

- Re-estimate the model with the room alternatives in one nest and the central alternatives in another nest. (Note that a heat pump is a central system.)

```
nl2 <- update(nl, nests = list(central = c('ec', 'ecc', 'gc', 'gcc', 'hpc'),
room = c('er', 'erc')))
summary(nl2)
```

```
##
## Call:
## mlogit(formula = depvar ~ ich + och + icca + occa + inc.room +
## inc.cooling + int.cooling | 0, data = HC, nests = list(central = c("ec",
## "ecc", "gc", "gcc", "hpc"), room = c("er", "erc")), un.nest.el = TRUE)
##
## Frequencies of alternatives:choice
## ec ecc er erc gc gcc hpc
## 0.004 0.016 0.032 0.004 0.096 0.744 0.104
##
## bfgs method
## 10 iterations, 0h:0m:0s
## g'(-H)^-1g = 5.87E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## ich -1.13818 0.54216 -2.0993 0.03579 *
## och -1.82532 0.93228 -1.9579 0.05024 .
## icca -0.33746 0.26934 -1.2529 0.21024
## occa -2.06328 1.89726 -1.0875 0.27681
## inc.room -0.75722 0.34292 -2.2081 0.02723 *
## inc.cooling 0.41689 0.20742 2.0099 0.04444 *
## int.cooling -13.82487 7.94031 -1.7411 0.08167 .
## iv 1.36201 0.65393 2.0828 0.03727 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -180.02
```

- What does the estimate imply about the substitution patterns across alternatives? Do you think the estimate is plausible?

The log-sum coefficient is over 1. This implies that there is more substitution across nests than within nests. I don't think this is very reasonable, but people can differ on their concepts of what's reasonable.

- Is the log-sum coefficient significantly different from 1?

\begin{answer}[5] The t-statistic is :

` (coef(nl2)['iv'] - 1) / sqrt(vcov(nl2)['iv', 'iv'])`

```
## iv
## 0.5535849
```

`lrtest(nl2, ml)`

```
## Likelihood ratio test
##
## Model 1: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## Model 2: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 8 -180.02
## 2 7 -180.29 -1 0.5268 0.468
```

We cannot reject the hypothesis at standard confidence levels.

- How does the value of the log-likelihood function compare for this model relative to the model in exercise 1, where the cooling alternatives are in one nest and the heating alternatives in the other nest.

`logLik(nl)`

`## 'log Lik.' -178.1247 (df=8)`

`logLik(nl2)`

`## 'log Lik.' -180.0231 (df=8)`

The \(\ln L\) is worse (more negative.) All in all, this seems like a less appropriate nesting structure.

- Rerun the model that has the cooling alternatives in one nest and the non-cooling alternatives in the other nest (like for exercise 1), with a separate log-sum coefficient for each nest.

`nl3 <- update(nl, un.nest.el = FALSE)`

- Which nest is estimated to have the higher correlation in unobserved factors? Can you think of a real-world reason for this nest to have a higher correlation?

The correlation in the cooling nest is around 1-0.60 = 0.4 and that for the non-cooling nest is around 1-0.45 = 0.55. So the correlation is higher in the non-cooling nest. Perhaps more variation in comfort when there is no cooling. This variation in comfort is the same for all the non-cooling alternatives.

- Are the two log-sum coefficients significantly different from each other? That is, can you reject the hypothesis that the model in exercise 1 is the true model?

We can use a likelihood ratio tests with models

`nl`

and`nl3`

.

`lrtest(nl, nl3)`

```
## Likelihood ratio test
##
## Model 1: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## Model 2: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 8 -178.12
## 2 9 -178.04 1 0.1758 0.675
```

The restricted model is the one from exercise 1 that has one log-sum coefficient. The unrestricted model is the one we just estimated. The test statistics is 0.6299. The critical value of chi-squared with 1 degree of freedom is 3.8 at the 95% confidence level. We therefore cannot reject the hypothesis that the two nests have the same log-sum coefficient.

- Rewrite the code to allow three nests. For simplicity, estimate only one log-sum coefficient which is applied to all three nests. Estimate a model with alternatives
`gcc`

,`ecc`

and`erc`

in a nest,`hpc`

in a nest alone, and alternatives`gc`

,`ec`

and`er`

in a nest. Does this model seem better or worse than the model in exercise 1, which puts alternative`hpc`

in the same nest as alternatives`gcc`

,`ecc`

and`erc`

?

```
nl4 <- update(nl, nests=list(n1 = c('gcc', 'ecc', 'erc'), n2 = c('hpc'),
n3 = c('gc', 'ec', 'er')))
summary(nl4)
```

```
##
## Call:
## mlogit(formula = depvar ~ ich + och + icca + occa + inc.room +
## inc.cooling + int.cooling | 0, data = HC, nests = list(n1 = c("gcc",
## "ecc", "erc"), n2 = c("hpc"), n3 = c("gc", "ec", "er")),
## un.nest.el = TRUE)
##
## Frequencies of alternatives:choice
## ec ecc er erc gc gcc hpc
## 0.004 0.016 0.032 0.004 0.096 0.744 0.104
##
## bfgs method
## 8 iterations, 0h:0m:0s
## g'(-H)^-1g = 3.71E-08
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## ich -0.838394 0.100546 -8.3384 < 2.2e-16 ***
## och -1.331598 0.252069 -5.2827 1.273e-07 ***
## icca -0.256131 0.145564 -1.7596 0.07848 .
## occa -1.405656 1.207281 -1.1643 0.24430
## inc.room -0.571352 0.077950 -7.3297 2.307e-13 ***
## inc.cooling 0.311355 0.056357 5.5247 3.301e-08 ***
## int.cooling -10.413384 5.612445 -1.8554 0.06354 .
## iv 0.956544 0.180722 5.2929 1.204e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -180.26
```

The \(\ln L\) for this model is \(-180.26\), which is lower (more negative) than for the model with two nests, which got \(-178.12\).