Course materials
Introductory module
Objectives: set expectations; explore data science raison d’être; introduce systems and design thinking; introduce software tools and collaborative coding; conduct exploratory/descriptive analysis of class background and interests.
Week 0
Thursday meeting: Course orientation [slides]
Assignments due by next class meeting:
install course software and create a GitHub account;
fill out intake form;
read Peng and Parker (2022);
prepare a reading response
Week 1
Tuesday meeting: On projects in(volving) data science [slides]
Section meeting: software and technology overview [activity]
Assignments due by next class meeting:
read MDSR 9.1 – 9.2
prepare a reading response
Week 2
Tuesday meeting: Introducing class intake survey data [slides]
Section meeting: tidyverse basics [activity]
Thursday meeting: planning group work for analysis of survey data [slides]
Assignments:
- first group assignment due Sunday, October 19, 11:59pm PST [accept via GH classroom]
Module 1: biomarker identification
Objectives: introduce variable selection, classification, and multiple testing problems; discuss classification accuracy metrics and data partitioning; fit logistic regression and random forest classifiers in R; learn to implement multiple testing corrections for FDR control (Benjamini-Hochberg and Benjamini-Yekutieli); discuss selection via penalized estimation. Data from Hewitson et al. (2021).
Week 3
Tuesday meeting: introducing biomarker data; multiple testing [slides]
Section meeting: iteration strategies [activity]
Thursday meeting: correlation analysis; random forests [slides] [activity]
Assignments due by next class meeting:
read MDSR 10.1 – 10.2
read Hewitson et al. (2021)
prepare a reading response
Week 4
Tuesday meeting: random forests cont’d; logistic regression [slides]
Section meeting: logistic regression and classification metrics [activity]
Thursday meeting: LASSO regularization [slides]
Assignments:
- second group assignment due Friday, October 31, 11:59pm PST [accept via GH classroom]
Module 2: fraud claims
Objectives: introduce NLP techniques for converting text to data and web scraping tools in R; discuss dimension reduction techniques; introduce multiclass classification; learn to process text, fit multinomial logistic regression models, and train neural networks in R.
Week 5
Week 6
Tuesday meeting: feedforward neural networks [slides]
Section meeting: fitting neural nets with keras [activity]
Thursday meeting: NO CLASS
Assignments:
midquarter assessments [form]
request winter add code [form]
read Emmert-Streib et al. (2020) (§1–5, §9) and prepare a reading response
third group assignment due Monday, November 14, 11:59pm PST [accept via GH classroom] [group assignments]
Optional further reading: