Course materials

Introductory module

Objectives: set expectations; explore the raison d’être of data science; introduce systems and design thinking; introduce software tools and collaborative coding; conduct exploratory/descriptive analysis of class background and interests.
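
For a flavor of the kind of descriptive analysis involved, here is a minimal base-R sketch on a hypothetical intake survey. The data frame and its columns (`major`, `coded_before`) are invented for illustration; the actual class survey data are introduced in Week 2.

```r
# Hypothetical intake-survey responses (invented for illustration)
survey <- data.frame(
  major = c("Statistics", "CS", "Statistics", "Math", "CS", "CS"),
  coded_before = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Descriptive summaries: class composition and prior coding experience
table(survey$major)                                          # counts by major
prop.table(table(survey$coded_before))                       # overall share with coding experience
aggregate(coded_before ~ major, data = survey, FUN = mean)   # experience rate by major
```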

Week 0

  • Thursday meeting: Course orientation [slides]

  • Assignments due by next class meeting:

Week 1

Week 2

  • Tuesday meeting: Introducing class intake survey data [slides]

  • Section meeting: tidyverse basics [activity]

  • Thursday meeting: planning group work for analysis of survey data [slides]

  • Assignments:

Module 1: biomarker identification

Objectives: introduce variable selection, classification, and multiple testing problems; discuss classification accuracy metrics and data partitioning; fit logistic regression and random forest classifiers in R; learn to implement multiple testing corrections for FDR control (Benjamini-Hochberg and Benjamini-Yekutieli); discuss selection via penalized estimation. Data from Hewitson et al. (2021).
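
Both FDR corrections named above are built into base R via `stats::p.adjust`. A minimal sketch with invented p-values (not from the Hewitson et al. data):

```r
# Illustrative p-values, sorted for readability (invented; not from the study data)
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205)

# Benjamini-Hochberg and Benjamini-Yekutieli adjusted p-values
p_bh <- p.adjust(p, method = "BH")
p_by <- p.adjust(p, method = "BY")

# "Discoveries" at a target FDR of 0.05: compare adjusted p-values to 0.05
which(p_bh <= 0.05)

# BY guards against arbitrary dependence and is more conservative:
# its adjusted p-values are never smaller than BH's
all(p_by >= p_bh)  # TRUE
```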

Week 3

Week 4

  • Tuesday meeting: random forests cont’d; logistic regression [slides]

  • Section meeting: logistic regression and classification metrics [activity]

  • Thursday meeting: LASSO regularization [slides]

  • Assignments:

Module 2: fraud claims

Objectives: introduce NLP techniques for converting text to data, along with web scraping tools in R; discuss dimension reduction techniques; introduce multiclass classification; learn to process text, fit multinomial logistic regression models, and train neural networks in R.
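
As a small illustration of the text-to-data step, the base-R sketch below tokenizes a few invented documents and tabulates them into a document-term count matrix; the example strings are made up, not drawn from the fraud-claims data, and a real pipeline would add steps such as stop-word removal.

```r
# Invented example documents (not from the course's fraud-claims data)
docs <- c("suspicious claim filed twice",
          "claim approved after review",
          "review flagged suspicious activity")

# Tokenize: lowercase, strip punctuation, split on whitespace
tokens <- strsplit(gsub("[[:punct:]]", "", tolower(docs)), "\\s+")

# Build a document-term count matrix over the shared vocabulary
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))
dim(dtm)  # one row per document, one column per vocabulary term
```

Each row of `dtm` is now a numeric representation of a document, ready for dimension reduction or use as classifier input.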

Week 5

  • Tuesday meeting: data introduction and basic NLP techniques [slides]

  • Section meeting: string manipulation and text processing in R [activity]

  • Thursday meeting: dimension reduction; multinomial logistic regression [slides] [activity]

  • Optional further reading:

Week 6

  • Tuesday meeting: feedforward neural networks [slides]

  • Section meeting: fitting neural nets with keras [activity]

  • Thursday meeting: NO CLASS

  • Assignments:

  • Optional further reading:

    • Alzubaidi et al. (2021)

    • Goodfellow, Bengio, and Courville (2016) Ch. 6 (advanced)

References

Alzubaidi, Laith, Jinglan Zhang, Amjad J Humaidi, Ayad Al-Dujaili, Ye Duan, Omran Al-Shamma, José Santamaría, Mohammed A Fadhel, Muthana Al-Amidie, and Laith Farhan. 2021. “Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions.” Journal of Big Data 8 (1): 1–74.
Cambria, Erik, and Bebo White. 2014. “Jumping NLP Curves: A Review of Natural Language Processing Research.” IEEE Computational Intelligence Magazine 9 (2): 48–57.
Emmert-Streib, Frank, Zhen Yang, Han Feng, Shailesh Tripathi, and Matthias Dehmer. 2020. “An Introductory Review of Deep Learning for Prediction Models with Big Data.” Frontiers in Artificial Intelligence 3: 4.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
Hewitson, Laura, Jeremy A Mathews, Morgan Devlin, Claire Schutte, Jeon Lee, and Dwight C German. 2021. “Blood Biomarker Discovery for Autism Spectrum Disorder: A Proteomic Analysis.” PLoS One 16 (2): e0246581.
Khan, Aurangzeb, Baharum Baharudin, Lam Hong Lee, and Khairullah Khan. 2010. “A Review of Machine Learning Algorithms for Text-Documents Classification.” Journal of Advances in Information Technology 1 (1): 4–20.
Peng, Roger D, and Hilary S Parker. 2022. “Perspective on Data Science.” Annual Review of Statistics and Its Application 9: 1–20.