---
title: "STAT340: Discussion 10: Interactions and nonlinear terms"
author: "Names"
date: "`r format(Sys.time(), '%d %B, %Y')`" # autogenerate date as date of last knit
documentclass: article
classoption: letterpaper
output:
html_document:
highlight: tango
fig_caption: false
---
```{r setup, include=FALSE}
# if sourced, set working directory to file location
# added tryCatch in case knitting runs into error
tryCatch({
if(Sys.getenv('RSTUDIO')=='1'){
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
}}, error = function(e){}
)
# install necessary packages
if(!require(pacman)) install.packages("pacman")
pacman::p_load(knitr,tidyverse)
knitr::opts_chunk$set(tidy=FALSE,strip.white=FALSE,fig.align="center",comment=" #")
```
---
[Link to source file](ds10.Rmd)
## Just for fun (1 min)
## Discuss together (10-15 min)
As an entire section, discuss these together:
1. How do you interpret an interaction term? (For example, if TV and radio advertising spending were found to have a significant interaction term with estimate 0.0011 when predicting sales, what does the 0.0011 physically mean?) Can you give 2 different interpretations?
2. Remember adding _**any**_ term to your `lm( )` formula will _**always**_ decrease the RSS, even if it's useless (like Karl said, you can test this by adding a randomly-generated column to any regression). How then do you test if adding an effect is significant?
3. How do you include a higher order term in a regression, and what does it represent? (For example, if TV advertising was found to have a significant quadratic term with estimate -0.002, what does it physically mean?)
4. What is the [hierarchy principle](http://pages.stat.wisc.edu/~karlrohe/ht/03-linear_regression.pdf#page=56), what does it mean, and why should we follow it?
## Brief reading (optional, ~ 5 min)
Remember interaction can exist between any combination (2 or more) of categorical and quantitative variables. If you feel like you have a good understanding of how to interpret each of these cases, feel free to move on the next section; if you don't, briefly reading [this page](https://biologyforfun.wordpress.com/2014/04/08/interpreting-interaction-coefficient-in-r-part1-lm/) is highly recommended (pay close attention to the difference in wording for each interpretation and how that difference is reflected in the plots).
## Exercise
For the exercise, I have _**modified mtcars**_ by reducing it to the first four columns (`mpg` is still the dependent variable; your predictor variables are now `cyl` `disp` and `hp`), and adding in some combination of significant interactions and/or quadratic terms.
Run the code below to import the new modified data frame, then fit a complete model with **ALL interaction and quadratic** terms (it is not uncommon to add all interaction terms, but you usually wouldn't add all quadratic terms like this unless you had a good reason; we're just doing it here for the sake of the exercise).
```{r}
mtcars2 = as_tibble(read.csv( # first use base R read.csv, then convert to tibblex
row.names = 1, # row.names=1 means treat first column as row names
text = ",mpg,cyl,disp,hp # text='....' means use this string of text as the data
Mazda RX4,14.8,6,160,110
Mazda RX4 Wag,14.8,6,160,110
Datsun 710,14.5,4,108,93
Hornet 4 Drive,21.4,6,258,110
Hornet Sportabout,28.1,8,360,175
Valiant,15.7,6,225,105
Duster 360,23.7,8,360,245
Merc 240D,17.6,4,147,62
Merc 230,15.8,4,141,95
Merc 280,13.4,6,168,123
Merc 280C,12,6,168,123
Merc 450SE,17.8,8,276,180
Merc 450SL,18.7,8,276,180
Merc 450SLC,16.6,8,276,180
Cadillac Fleetwood,33.8,8,472,205
Lincoln Continental,32.1,8,460,215
Chrysler Imperial,33.7,8,440,230
Fiat 128,23.3,4,78.7,66
Honda Civic,21.3,4,75.7,52
Toyota Corolla,24.7,4,71.1,65
Toyota Corona,13.7,4,120,97
Dodge Challenger,20.7,8,318,150
AMC Javelin,19.1,8,304,150
Camaro Z28,21.7,8,350,245
Pontiac Firebird,33.2,8,400,175
Fiat X1-9,18.2,4,79,66
Porsche 914-2,18.2,4,120,91
Lotus Europa,21.8,4,95.1,113
Ford Pantera L,24.3,8,351,264
Ferrari Dino,12.9,6,145,175
Maserati Bora,18.6,8,301,335
Volvo 142E,13.6,4,121,109"
))
```
After importing the new mtcars data, repeat **all interpretation and prediction steps** of [discussion 9](https://karlrohe.github.io/340-Spring21/discussions/09/ds09.html). Specifically, answer the following questions:
1. What are the coefficient estimates and standard errors? Give an interpretation of **one of the new terms** added in this discussion.
> _**REPLACE TEXT WITH RESPONSE**_
2. What are the R² and adjusted R² for this model? Give an interpretation of both.
> _**REPLACE TEXT WITH RESPONSE**_
3. Give $95\%$ confidence intervals for all coefficients. Give an interpretation of **one of the new intervals**.
> _**REPLACE TEXT WITH RESPONSE**_
4. According to the model, what mileage would I expect on average with a car that has 6 cylinders, 200 displacement, and 120 horsepower? Be careful in your calculation with the interactions and quadratic terms.
> _**REPLACE TEXT WITH RESPONSE**_
## Submission
As usual, make sure the names of everyone who worked on this with you is included in the header of this document. Then, knit this document and submit both this file and the HTML output on Canvas under Assignments ⇒ Discussion 10.