Materials

Class forms

Resources

Textbooks:

Documentation:

Introductory module

Objectives: set expectations; explore the raison d’être of data science; introduce systems and design thinking; introduce software tools and collaborative coding; conduct exploratory/descriptive analysis of class background and interests.

Week 0

  • Thursday meeting: Course orientation [slides]

  • Assignments due by next class meeting:

Week 1

Week 2

  • Tuesday meeting: Introducing class intake survey data [slides]

  • Section meeting: tidyverse basics [activity]

  • Thursday meeting: planning group work for analysis of survey data [slides]

  • Assignments:

Module 1: biomarker identification

Objectives: introduce variable selection, classification, and multiple testing problems; discuss classification accuracy metrics and data partitioning; fit logistic regression and random forest classifiers in R; learn to implement multiple testing corrections for FDR control (Benjamini-Hochberg and Benjamini-Yekutieli); discuss selection via penalized estimation. Data from Hewitson et al. (2021).

Week 3

Week 4

  • Tuesday meeting: random forests cont’d; logistic regression [slides]
  • Section meeting: logistic regression and classification metrics [activity]
  • Thursday meeting: LASSO regularization [slides]
  • Assignments:

Module 2: fraud claims

Objectives: introduce NLP techniques for converting text to data and web scraping tools in R; discuss dimension reduction techniques; introduce multiclass classification; learn to process text, fit multinomial logistic regression models, and train neural networks in R.

Week 5

  • Tuesday meeting: data introduction and basic NLP techniques [slides]

  • Section meeting: string manipulation and text processing in R [activity]

  • Thursday meeting: dimension reduction; multinomial logistic regression [slides] [activity]

  • Optional further reading:

Week 6

  • Tuesday meeting: feedforward neural networks [slides]

  • Section meeting: fitting neural nets with keras [activity]

  • Thursday meeting: assignment review and planning [slides]

  • Assignments:

  • Optional further reading:

    • Alzubaidi et al. (2021)

    • Goodfellow, Bengio, and Courville (2016) Ch. 6 (advanced)

Module 3: soil temperatures

Objectives: build a forecasting model; introduce concepts of spatial and temporal correlation; discuss function approximation and curve fitting with regression techniques; fit elementary time series models and regression models with autoregressive (AR) errors; perform spatial interpolation.

Week 7

  • Tuesday meeting: data introduction; function approximation using basis expansions [slides]

  • Section meeting: curve fitting [activity]

  • Thursday meeting: temporal correlation; a forecasting model [slides]

  • Optional further reading (available through UCSB library)

    • Sections 1.1, 1.2, and 2.3 in Shumway and Stoffer (2017)
    • Perperoglou et al. (2019)

Week 8

  • Tuesday meeting: spatial prediction [slides]

  • Section meeting: forecasting [activity]

  • Thursday meeting: NO CLASS

  • Optional further reading:

    • Sections 8.1–8.3 in Bivand et al. (2008)

    • Ch. 12 in Dorman (2022)

Module 4: vignettes

Objectives: learn independently about a method of choice and prepare a teaching vignette illustrating its use; create shared reference material potentially useful for project work.

Week 9

  • Tuesday meeting: discussion of results from the claims module; vignette workshopping [slides]

  • Section meeting: NO SECTION MEETING (Thanksgiving)

  • Thursday meeting: NO CLASS (Thanksgiving)

  • Assignments: vignettes [guidelines]

    • drafts due in class Thursday, 12/1 2pm PST

    • final version due Thursday, 12/8 11:59pm PST

Week 10

  • Tuesday meeting: capstone project overviews [slides]

  • Section meeting: office hours for vignette help

  • Thursday meeting: vignette presentation/exchange/feedback [feedback form]

  • Assignments due by Friday, 12/2:

References

Alzubaidi, Laith, Jinglan Zhang, Amjad J Humaidi, Ayad Al-Dujaili, Ye Duan, Omran Al-Shamma, José Santamaría, Mohammed A Fadhel, Muthana Al-Amidie, and Laith Farhan. 2021. “Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions.” Journal of Big Data 8 (1): 1–74.
Bivand, Roger S, Edzer J Pebesma, and Virgilio Gómez-Rubio. 2008. Applied Spatial Data Analysis with R. Springer.
Cambria, Erik, and Bebo White. 2014. “Jumping NLP Curves: A Review of Natural Language Processing Research.” IEEE Computational Intelligence Magazine 9 (2): 48–57.
Dorman, Michael. 2022. “Introduction to Spatial Data Programming with R.” https://geobgu.xyz/r/.
Emmert-Streib, Frank, Zhen Yang, Han Feng, Shailesh Tripathi, and Matthias Dehmer. 2020. “An Introductory Review of Deep Learning for Prediction Models with Big Data.” Frontiers in Artificial Intelligence 3: 4.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
Hewitson, Laura, Jeremy A Mathews, Morgan Devlin, Claire Schutte, Jeon Lee, and Dwight C German. 2021. “Blood Biomarker Discovery for Autism Spectrum Disorder: A Proteomic Analysis.” PLoS One 16 (2): e0246581.
Khan, Aurangzeb, Baharum Baharudin, Lam Hong Lee, and Khairullah Khan. 2010. “A Review of Machine Learning Algorithms for Text-Documents Classification.” Journal of Advances in Information Technology 1 (1): 4–20.
Peng, Roger D, and Hilary S Parker. 2022. “Perspective on Data Science.” Annual Review of Statistics and Its Application 9: 1–20.
Perperoglou, Aris, Willi Sauerbrei, Michal Abrahamowicz, and Matthias Schmid. 2019. “A Review of Spline Function Procedures in R.” BMC Medical Research Methodology 19 (1): 1–16.
Shumway, Robert H, and David S Stoffer. 2017. Time Series Analysis and Its Applications: With R Examples. Springer.