---
title: "STAT340: Discussion 10: Interactions and nonlinear terms"
author: "Names"
date: "`r format(Sys.time(), '%d %B, %Y')`" # autogenerate date as date of last knit
documentclass: article
classoption: letterpaper
output:
  html_document:
    highlight: tango
    fig_caption: false
---

```{r setup, include=FALSE}
# if sourced, set working directory to file location
# added tryCatch in case knitting runs into error
tryCatch({
  if(Sys.getenv('RSTUDIO')=='1'){
    setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
  }}, error = function(e){}
)

# install necessary packages
if(!require(pacman)) install.packages("pacman")
pacman::p_load(knitr,tidyverse)

knitr::opts_chunk$set(tidy=FALSE,strip.white=FALSE,fig.align="center",comment=" #")
```

---

[Link to source file](ds10.Rmd)

## Just for fun (1 min)

## Discuss together (10-15 min)

As an entire section, discuss these together:

1. How do you interpret an interaction term? (For example, if TV and radio advertising spending were found to have a significant interaction term with estimate 0.0011 when predicting sales, what does the 0.0011 physically mean?) Can you give 2 different interpretations?
2. Remember that adding _**any**_ term to your `lm( )` formula will _**never**_ increase the RSS, and will almost always decrease it, even if the term is useless (like Karl said, you can test this by adding a randomly-generated column to any regression). How then do you test whether adding an effect is significant?
3. How do you include a higher-order term in a regression, and what does it represent? (For example, if TV advertising was found to have a significant quadratic term with estimate -0.002, what does it physically mean?)
4. What is the [hierarchy principle](http://pages.stat.wisc.edu/~karlrohe/ht/03-linear_regression.pdf#page=56), what does it mean, and why should we follow it?
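
Questions 2 and 3 can be explored with a small experiment. The chunk below is only a minimal sketch, assuming the built-in `mtcars` data (not the modified data used in the exercise) and a made-up pure-noise column named `junk`. It illustrates that a useless predictor still does not increase the RSS, and that comparing nested models with `anova()` is one way to ask whether the drop in RSS is larger than chance alone would give.

```{r}
set.seed(340)                                   # arbitrary seed, for reproducibility only
dat_noise <- mtcars[, c("mpg", "disp", "hp")]   # built-in mtcars, not mtcars2
dat_noise$junk <- rnorm(nrow(dat_noise))        # hypothetical pure-noise predictor

fit_base <- lm(mpg ~ disp + hp, data = dat_noise)
fit_junk <- lm(mpg ~ disp + hp + junk, data = dat_noise)

# the RSS of the larger model is never larger than that of the nested model
rss_base <- sum(resid(fit_base)^2)
rss_junk <- sum(resid(fit_junk)^2)
c(base = rss_base, with_junk = rss_junk)

# F-test for nested models: is the decrease in RSS more than noise would explain?
anova(fit_base, fit_junk)

# the same comparison for an interaction plus a quadratic term;
# disp * hp expands to disp + hp + disp:hp, and I() protects ^ inside a formula
fit_full <- lm(mpg ~ disp * hp + I(disp^2), data = dat_noise)
anova(fit_base, fit_full)
```

The t-test reported by `summary()` for a single added coefficient answers the same question for one term at a time; the `anova()` comparison is useful when several terms are added at once.
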
## Brief reading (optional, ~ 5 min)

Remember that interactions can exist between any combination (2 or more) of categorical and quantitative variables. If you feel you have a good understanding of how to interpret each of these cases, feel free to move on to the next section; if you don't, briefly reading [this page](https://biologyforfun.wordpress.com/2014/04/08/interpreting-interaction-coefficient-in-r-part1-lm/) is highly recommended (pay close attention to the difference in wording for each interpretation and how that difference is reflected in the plots).
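
As one concrete case, an interaction between a quantitative and a two-level categorical variable can be read as a difference in slopes between the two groups. The chunk below is a small sketch of that reading, assuming the built-in `mtcars` data with `wt` and `am` (columns that are not part of the exercise data); the factor `trans` and its labels are introduced here purely for illustration.

```{r}
dat_trans <- mtcars[, c("mpg", "wt", "am")]
# in mtcars, am is coded 0 = automatic, 1 = manual
dat_trans$trans <- factor(dat_trans$am, labels = c("auto", "manual"))

fit_int <- lm(mpg ~ wt * trans, data = dat_trans)
b_int <- coef(fit_int)

# slope of wt within each group, as implied by the interaction model
slope_auto   <- b_int[["wt"]]
slope_manual <- b_int[["wt"]] + b_int[["wt:transmanual"]]

# cross-check against separate simple regressions within each group
sep_auto   <- coef(lm(mpg ~ wt, data = subset(dat_trans, trans == "auto")))[["wt"]]
sep_manual <- coef(lm(mpg ~ wt, data = subset(dat_trans, trans == "manual")))[["wt"]]

c(slope_auto = slope_auto, sep_auto = sep_auto,
  slope_manual = slope_manual, sep_manual = sep_manual)
```

The slopes agree because a model with a full interaction against a two-level factor fits a separate line to each group; the interaction coefficient is the gap between the two slopes.
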
## Exercise

For the exercise, I have _**modified mtcars**_ by reducing it to the first four columns (`mpg` is still the dependent variable; your predictor variables are now `cyl`, `disp`, and `hp`) and adding in some combination of significant interactions and/or quadratic terms. Run the code below to import the new modified data frame, then fit a complete model with **ALL interaction and quadratic** terms (it is not uncommon to add all interaction terms, but you usually wouldn't add all quadratic terms like this unless you had a good reason; we're just doing it here for the sake of the exercise).

```{r}
# first use base R read.csv, then convert to tibble
# row.names = 1 means treat the first column as row names
# text = "...." means use this string of text as the data
# (comments are kept outside the string so they are not parsed as part of the header)
mtcars2 = as_tibble(read.csv(
  row.names = 1,
  text = ",mpg,cyl,disp,hp
Mazda RX4,14.8,6,160,110
Mazda RX4 Wag,14.8,6,160,110
Datsun 710,14.5,4,108,93
Hornet 4 Drive,21.4,6,258,110
Hornet Sportabout,28.1,8,360,175
Valiant,15.7,6,225,105
Duster 360,23.7,8,360,245
Merc 240D,17.6,4,147,62
Merc 230,15.8,4,141,95
Merc 280,13.4,6,168,123
Merc 280C,12,6,168,123
Merc 450SE,17.8,8,276,180
Merc 450SL,18.7,8,276,180
Merc 450SLC,16.6,8,276,180
Cadillac Fleetwood,33.8,8,472,205
Lincoln Continental,32.1,8,460,215
Chrysler Imperial,33.7,8,440,230
Fiat 128,23.3,4,78.7,66
Honda Civic,21.3,4,75.7,52
Toyota Corolla,24.7,4,71.1,65
Toyota Corona,13.7,4,120,97
Dodge Challenger,20.7,8,318,150
AMC Javelin,19.1,8,304,150
Camaro Z28,21.7,8,350,245
Pontiac Firebird,33.2,8,400,175
Fiat X1-9,18.2,4,79,66
Porsche 914-2,18.2,4,120,91
Lotus Europa,21.8,4,95.1,113
Ford Pantera L,24.3,8,351,264
Ferrari Dino,12.9,6,145,175
Maserati Bora,18.6,8,301,335
Volvo 142E,13.6,4,121,109"
))
```

After importing the new mtcars data, repeat **all interpretation and prediction steps** of [discussion 9](https://karlrohe.github.io/340-Spring21/discussions/09/ds09.html). Specifically, answer the following questions:

1. What are the coefficient estimates and standard errors? Give an interpretation of **one of the new terms** added in this discussion.

   > _**REPLACE TEXT WITH RESPONSE**_

2. What are the R² and adjusted R² for this model? Give an interpretation of both.

   > _**REPLACE TEXT WITH RESPONSE**_

3. Give $95\%$ confidence intervals for all coefficients. Give an interpretation of **one of the new intervals**.

   > _**REPLACE TEXT WITH RESPONSE**_

4. According to the model, what mileage would I expect on average with a car that has 6 cylinders, 200 displacement, and 120 horsepower? Be careful in your calculation with the interactions and quadratic terms.

   > _**REPLACE TEXT WITH RESPONSE**_
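
The mechanics behind these four questions can be sketched as follows. This is only an illustration under stated assumptions: it uses the built-in `mtcars` and a deliberately smaller model (one interaction and one quadratic term), not the full model the exercise asks for, and the names `fit_demo`, `new_car`, and `manual` are made up for the example. The cross-check at the end shows where the interaction and quadratic terms enter a hand calculation.

```{r}
# illustrative model only: one interaction (disp:hp) and one quadratic term (hp^2)
fit_demo <- lm(mpg ~ disp * hp + I(hp^2), data = mtcars)

summary(fit_demo)$coefficients     # estimates, standard errors, t values, p-values
summary(fit_demo)$r.squared        # R-squared
summary(fit_demo)$adj.r.squared    # adjusted R-squared
confint(fit_demo, level = 0.95)    # 95% confidence intervals for all coefficients

# predict() rebuilds disp:hp and hp^2 from the raw columns automatically
new_car <- data.frame(disp = 200, hp = 120)
pred <- predict(fit_demo, newdata = new_car)

# the same prediction by hand: every term uses the raw predictor values
b_demo <- coef(fit_demo)
manual <- b_demo[["(Intercept)"]] +
  b_demo[["disp"]]    * 200 +
  b_demo[["hp"]]      * 120 +
  b_demo[["I(hp^2)"]] * 120^2 +
  b_demo[["disp:hp"]] * 200 * 120

c(predict = unname(pred), by_hand = manual)
```

Writing the quadratic term as `I(hp^2)` inside the formula, rather than creating a squared column beforehand, is what lets `predict()` work from a data frame that contains only the raw predictors.
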
## Submission

As usual, make sure the names of everyone who worked on this with you are included in the header of this document. Then, knit this document and submit both this file and the HTML output on Canvas under Assignments ⇒ Discussion 10.