---
title:  "STAT340: Discussion 10: Interactions and nonlinear terms"
author: "Names"
date:   "`r format(Sys.time(), '%d %B, %Y')`" # autogenerate date as date of last knit
documentclass: article
classoption: letterpaper
output:
  html_document:
    highlight: tango
    fig_caption: false
---

```{r setup, include=FALSE}
# if sourced, set working directory to file location
# added tryCatch in case knitting runs into error
tryCatch({
  if(Sys.getenv('RSTUDIO')=='1'){
    setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
  }}, error = function(e){}
)

# install necessary packages
if(!require(pacman)) install.packages("pacman")
pacman::p_load(knitr,tidyverse)

knitr::opts_chunk$set(tidy=FALSE,strip.white=FALSE,fig.align="center",comment=" #")
```

---

<style>
a:link {
  text-decoration: underline;
}
</style>

[Link to source file](ds10.Rmd)


## Just for fun (1 min)

<center><a href="https://xkcd.com/539/"><img id="comic" src="https://imgs.xkcd.com/comics/boyfriend.png" title="... okay, but because you said that, we're breaking up." style="width:555px;"></a></center>

<br/>


## Discuss together (10-15 min)

As an entire section, discuss these together:

1. How do you interpret an interaction term? (For example, if TV and radio advertising spending were found to have a significant interaction term with estimate 0.0011 when predicting sales, what does the 0.0011 physically mean?) Can you give 2 different interpretations?

2. Remember adding _**any**_ term to your `lm( )` formula will _**always**_ decrease the RSS, even if it's useless (like Karl said, you can test this by adding a randomly-generated column to any regression). How then do you test if adding an effect is significant?

3. How do you include a higher order term in a regression, and what does it represent? (For example, if TV advertising was found to have a significant quadratic term with estimate -0.002, what does it physically mean?)

4. What is the [hierarchy principle](http://pages.stat.wisc.edu/~karlrohe/ht/03-linear_regression.pdf#page=56), what does it mean, and why should we follow it?

<br/>


## Brief reading (optional, ~ 5 min)

Remember interaction can exist between any combination (2 or more) of categorical and quantitative variables. If you feel like you have a good understanding of how to interpret each of these cases, feel free to move on the next section; if you don't, briefly reading [this page](https://biologyforfun.wordpress.com/2014/04/08/interpreting-interaction-coefficient-in-r-part1-lm/) is highly recommended (pay close attention to the difference in wording for each interpretation and how that difference is reflected in the plots).

<br/>


## Exercise

For the exercise, I have _**modified mtcars**_ by reducing it to the first four columns (`mpg` is still the dependent variable; your predictor variables are now `cyl` `disp` and `hp`), and adding in some combination of significant interactions and/or quadratic terms.

Run the code below to import the new modified data frame, then fit a complete model with **ALL interaction and quadratic** terms (it is not uncommon to add all interaction terms, but you usually wouldn't add all quadratic terms like this unless you had a good reason; we're just doing it here for the sake of the exercise).

```{r}
mtcars2 = as_tibble(read.csv(     # first use base R read.csv, then convert to tibblex
  row.names = 1,                  # row.names=1 means treat first column as row names
  text = ",mpg,cyl,disp,hp        # text='....' means use this string of text as the data
  Mazda RX4,14.8,6,160,110
  Mazda RX4 Wag,14.8,6,160,110
  Datsun 710,14.5,4,108,93
  Hornet 4 Drive,21.4,6,258,110
  Hornet Sportabout,28.1,8,360,175
  Valiant,15.7,6,225,105
  Duster 360,23.7,8,360,245
  Merc 240D,17.6,4,147,62
  Merc 230,15.8,4,141,95
  Merc 280,13.4,6,168,123
  Merc 280C,12,6,168,123
  Merc 450SE,17.8,8,276,180
  Merc 450SL,18.7,8,276,180
  Merc 450SLC,16.6,8,276,180
  Cadillac Fleetwood,33.8,8,472,205
  Lincoln Continental,32.1,8,460,215
  Chrysler Imperial,33.7,8,440,230
  Fiat 128,23.3,4,78.7,66
  Honda Civic,21.3,4,75.7,52
  Toyota Corolla,24.7,4,71.1,65
  Toyota Corona,13.7,4,120,97
  Dodge Challenger,20.7,8,318,150
  AMC Javelin,19.1,8,304,150
  Camaro Z28,21.7,8,350,245
  Pontiac Firebird,33.2,8,400,175
  Fiat X1-9,18.2,4,79,66
  Porsche 914-2,18.2,4,120,91
  Lotus Europa,21.8,4,95.1,113
  Ford Pantera L,24.3,8,351,264
  Ferrari Dino,12.9,6,145,175
  Maserati Bora,18.6,8,301,335
  Volvo 142E,13.6,4,121,109"
))
```

<br/>


After importing the new mtcars data, repeat **all interpretation and prediction steps** of [discussion 9](https://karlrohe.github.io/340-Spring21/discussions/09/ds09.html). Specifically, answer the following questions:


1. What are the coefficient estimates and standard errors? Give an interpretation of **one of the new terms** added in this discussion.

   > _**REPLACE TEXT WITH RESPONSE**_

2. What are the R² and adjusted R² for this model? Give an interpretation of both.

   > _**REPLACE TEXT WITH RESPONSE**_

3. Give $95\%$ confidence intervals for all coefficients. Give an interpretation of **one of the new intervals**.

   > _**REPLACE TEXT WITH RESPONSE**_

4. According to the model, what mileage would I expect on average with a car that has 6 cylinders, 200 displacement, and 120 horsepower? Be careful in your calculation with the interactions and quadratic terms.

   > _**REPLACE TEXT WITH RESPONSE**_


<br/>


## Submission

As usual, make sure the names of everyone who worked on this with you is included in the header of this document. Then, knit this document and submit both this file and the HTML output on Canvas under Assignments ⇒ Discussion 10.