email: first name and last name at stat dott wisc dott edu
office: 1239 MSC
Syllabus, discussions, text, R labs, Project description, and Project timeline.

Office Hours:

Chart of weekly office hours. For the visually impaired, a text version of this chart is available in the Canvas announcement: https://canvas.wisc.edu/courses/244428/discussion_topics/922086

Week 14

  1. Clustering, PCA.
  2. Multiple testing slides
  3. Multiple testing simulation
  4. Workshop projects

Week 13

  1. Random forest Slides 1-7, 29-40.
  2. Clustering, PCA.
  3. Thesis statements
  4. Workshop projects
  5. Multiple testing slides
  6. Multiple testing simulation

Homework

ISLR Chapter 4, questions 4 and 6 (p168). Due 4/26. EDIT: ONLY QUESTION 6. DON’T DO QUESTION 4.

Week 12

  1. Logistic regression.
  2. Review missingness suggestions
  3. Random forest Slides 1-7, 29-40.
  4. Clustering, PCA and factor rotations.
  5. Thesis statements
  6. Workshop projects

Week 11

  1. Logistic regression.
  2. Workshop projects

Week 10

  1. Linear regression in ISLR.
  2. Have we covered enough content by Tuesday to turn in HW on Wednesday?
  3. What is your data? Why is it interesting?

Homework: Chapter 3 in ISLR. Questions 1, 3, 4, and 15 (p120). Due March 31 by 11:59pm.

Week 9

  1. Start linear regression in ISLR.

Week 8

Week 7

Estimation in World, Data, Models.

Learning objective: Be able to construct confidence intervals, given (1) a way to fit a model and (2) a way to simulate from the model.
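
The learning objective above can be sketched in a few lines. This is a minimal sketch of one common recipe (a percentile interval from a parametric bootstrap), not necessarily the exact construction used in World, Data, Models. The exponential model and the function names `fit`, `simulate`, and `confidence_interval` are assumptions chosen for illustration; the course labs use R, and Python is used here only to keep the example self-contained.

```python
import random
import statistics


def fit(data):
    """Fit an Exponential(rate) model; the estimate is 1 / sample mean."""
    return 1.0 / statistics.mean(data)


def simulate(rate, n, rng):
    """Simulate n observations from the Exponential(rate) model."""
    return [rng.expovariate(rate) for _ in range(n)]


def percentile(sorted_values, q):
    """Return the value at (approximately) the q-th quantile of a sorted list."""
    index = round(q * (len(sorted_values) - 1))
    return sorted_values[index]


def confidence_interval(data, level=0.90, reps=2000, seed=1):
    """Percentile interval: simulate from the fitted model, refit, take quantiles."""
    rng = random.Random(seed)
    estimate = fit(data)
    refits = sorted(
        fit(simulate(estimate, len(data), rng)) for _ in range(reps)
    )
    alpha = 1.0 - level
    return percentile(refits, alpha / 2), percentile(refits, 1 - alpha / 2)


# Illustrative data: 200 draws from an exponential with rate 2 (made up for this sketch).
data = simulate(2.0, 200, random.Random(7))
estimate = fit(data)
low, high = confidence_interval(data)
print(f"estimate = {estimate:.3f}, 90% interval = ({low:.3f}, {high:.3f})")
```

Only `fit` and `simulate` depend on the model, so swapping in a different model means rewriting those two functions and leaving the interval code alone.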

Introduce the project description.

Project timeline

Week 6

Finish Hypothesis testing in World, Data, Models.

Start estimation in World, Data, Models.

Week 5

Continue Hypothesis testing in World, Data, Models.

After covering the logic of statistical testing via Monte Carlo simulation, you should know how to test a hypothesis with Monte Carlo. This involves three steps.

  1. Convert the null hypothesis into a statistical model from which we can simulate (recall chapter 1).
  2. Develop a test statistic \(S\) and a “surprising set” for \(S\) based upon our understanding of the setting.
  3. Compute \(P(S \in \text{SurprisingSet})\) with Monte Carlo to get a p-value (recall chapter 2).
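
The three steps above can be sketched on a toy example. The setting is an assumption made up for illustration: we observe 62 heads in 100 coin flips and the null hypothesis is that the coin is fair. The function names are hypothetical, and the course labs use R; Python is used here only to keep the example self-contained.

```python
import random


def simulate_null(n_flips, rng):
    """Step 1: the null model. Flip a fair coin n_flips times, count heads."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


def is_surprising(s, observed, n_flips):
    """Step 2: the surprising set. S is at least as far from n/2 as the observed count."""
    return abs(s - n_flips / 2) >= abs(observed - n_flips / 2)


def monte_carlo_p_value(observed, n_flips, reps=10_000, seed=0):
    """Step 3: estimate P(S in SurprisingSet) as a frequency over simulations."""
    rng = random.Random(seed)
    hits = sum(
        is_surprising(simulate_null(n_flips, rng), observed, n_flips)
        for _ in range(reps)
    )
    return hits / reps


p_value = monte_carlo_p_value(observed=62, n_flips=100)
print(f"Monte Carlo p-value: {p_value:.4f}")
```

The surprising set here is two-sided because, before seeing the data, too many heads and too many tails would both count against a fair coin. A different understanding of the setting would lead to a different set.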

Homework for testing, due Friday, March 5 (.Rmd)

Homework for testing, due Friday, March 5 (.html)

Week 4

Recap

Monte Carlo in World, Data, Models. Learning objectives:

  1. probabilities \(P(X \in A)\) as frequencies
  2. expectations \(\mathbb{E}(X)\) as averages and
  3. “distributions” as histograms
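
A minimal sketch of these three ideas, assuming a made-up random variable \(X\) (the sum of two fair dice) and the event \(A = \{X \ge 10\}\). The course labs use R; Python is used here only to keep the example self-contained.

```python
import random
from collections import Counter


def roll_two_dice(rng):
    """One draw of X: the sum of two fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


rng = random.Random(2021)
draws = [roll_two_dice(rng) for _ in range(100_000)]

# 1. Probability as a frequency: the fraction of draws that land in A.
prob_estimate = sum(x >= 10 for x in draws) / len(draws)

# 2. Expectation as an average of the draws.
mean_estimate = sum(draws) / len(draws)

# 3. Distribution as a histogram: count how often each value appears.
histogram = Counter(draws)

print(f"P(X >= 10) is roughly {prob_estimate:.3f}")
print(f"E(X) is roughly {mean_estimate:.3f}")
for value in sorted(histogram):
    bar = "#" * (histogram[value] * 200 // len(draws))
    print(f"{value:2d} {bar}")
```

For this toy example the exact answers are known (\(P(X \ge 10) = 6/36\) and \(\mathbb{E}(X) = 7\)), so the simulation can be checked against them. The same code pattern applies when no formula is available.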

Topics

  1. Reductio ad unlikely
  2. Hypothesis testing in World, Data, Models.

Homework

To be posted on Feb 18.

Week 3

Recap

Finish random variables in World, Data, Models.

Learning objectives:

  • Identify random variables for basic things that we want to model. Ideally, you can model the actual mechanism. Alternatively, you are “emulating its shape”. Things to think about: is it continuous or discrete? Does it have a “heavy tail”? Usually, you have multiple random variables.
  • Start to build richer models with basic random variables.
  • Critique why a certain distribution is a poor model for some real world phenomenon. One of the most important assumptions is independence.

Topics

Monte Carlo in World, Data, Models

Homework

Homework 2 due Feb 17 in Canvas.

Week 2

  1. Why do we need statistical models? What is the point of modeling? World, Data, Models (wdm)
  2. Chapter 1 in wdm; random variables.

Homework

Homework 1 due February 10 in Canvas. Note that you can switch between the Rmd and html versions of the homework by changing the end of the web address from .Rmd to .html.

Week 1

Topics

  1. This is a fun and important course because this is not an intro course. It serves as a gateway to the “advanced” courses. Moreover, we get to learn fundamental methodologies and play with data.
  2. Go over syllabus
  3. Let’s begin with an example from my lab’s research:

Murmuration

  4. What is data science?
  5. I want you to begin to see data science as its own culture. It has a set of cultural practices to see the world with data.
    • these practices create data and software to study data.
    • moreover, these practices are emergent from the rich set of dependencies in the web of software and data.
  6. Why do we need statistical models? What is the point of modeling? World, Data, Models

Lecture discussion questions

What type of thing are you interested in studying? Topics? Do you think you could find data that you would find interesting?

Homework

  • Form groups of three(ish). Within your group, you should have similar interests (i.e. project topics). Talk to your friends or look for group members in discussion section. If you cannot find a group, that is totally fine! We will have a free agent session in class.

Texts:

An Introduction to Statistical Learning with Applications in R
by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

For reference: R for Data Science by Garrett Grolemund and Hadley Wickham