Course orientation

PSTAT197A/CMPSC190DD Fall 2022

Trevor Ruiz


Before we begin…

  1. Confer with your table and choose a word of the day. Agree on spelling.

  2. Please sign in using the attendance reporting form found here:

General orientation


PSTAT197A/CMPSC190DD is the first course in UCSB’s year-long data science capstone sequence.

  • Audience: undergraduate students of any discipline with a basic background in data science and an interest in research

  • Aim: prepare for an independent research or project experience

Capstone projects

Most students are preparing for capstone projects in winter and spring. Course foci were chosen with this in mind.

  • Projects are varied ➜ emphasize problem patterns over methodology
  • Projects are collaborative ➜ emphasize teamwork and discussion
  • Projects are specialized ➜ practice independent learning based on use cases

Read about past projects at

Continuing in capstones

Continuation in PSTAT197B-C/CMPSC190DE-DF during winter and spring:

  • students admitted to this course in spring have a seat in the continuing courses;

  • students admitted from the waitlist remain on the waitlist for the continuing courses.


I hope to support all of you in:

  • using modern software with version control for collaboration;
  • recognizing problem patterns based on data semantics and research questions;
  • identifying and accessing resources for independent learning given a problem of interest;
  • communicating data analysis and/or research findings.

Classroom environment

We are in an interactive classroom for a reason: to interact!

Let’s acknowledge:

  • Preparation and areas of expertise vary widely across the class

  • It’s okay not to know things

  • If you have a question, someone else probably does too


All course content is hosted on our website

Course content and structure


The course is organized into modules, each defined by a dataset and a set of questions (much like a project).

A module typically comprises:

  • One session on data introduction (lecture/discussion)

  • Two sessions on problem patterns and related methodology (lecture)

  • Two labs with related examples (section meeting)

  • One session on sharing data analysis results (discussion)

Module content

The module datasets are currently as follows:

  • Class intake survey data (exploratory/descriptive analysis)

  • Biomarkers of autism (predictive modeling and variable selection)

  • Web fraud (text processing and deep learning)

  • Soil temperatures (correlated data)

Group assignments

For each module, you will be assigned to a working group.

Your group’s objective is to produce an analysis of the dataset:

  • Reproduce the analysis presented or discussed in class meetings

  • Extend the analysis by

    • applying an alternative method that addresses the same question(s)

    • or addressing a corollary question


At the end of the course, in place of a fifth module, you will create a vignette (a short demonstration) on a topic of interest. A vignette should:

  • present a use case

  • explain methodology

  • demonstrate implementation with example code

Expectations and assessments

Students are expected to:

  • prepare for class meetings as directed;

  • attend and actively participate in class and section meetings;

  • contribute meaningfully to group activities and assignments.

Students are assessed on:

  • attendance, preparation, and participation;

  • quality of submitted work;

  • individual contributions to group assignments;

  • oral interview/presentation.

Looking ahead

Next time

We’ll discuss:

  • data science as a discipline;

  • the research landscape;

  • systems and design thinking for data science.


Complete all of the following before our next meeting.

  1. Review all content in the about section of the course webpage.
  2. Install course software and create a GitHub account.
  3. Fill out the capstone project intake form.
  4. Read Peng, R. D., & Parker, H. S. (2022). Perspective on data science. Annual Review of Statistics and Its Application, 9, 1-20. (access online via UCSB library).
  5. Prepare a reading response.