Course syllabus

Data science capstone preparation

Course listing

PSTAT197A/CMPSC190DD

Updated

September 2022

Concurrent course listing: PSTAT197A and CMPSC190DD are held concurrently; enrollment is by instructor consent and admitted students may enroll under either listing. The course content, expectations, assessments, and course policies are identical for students enrolled in either course.

Catalog description: Introduction to research skills. Discussion of current research trends, writing literature reviews, etc. Students will be required to present materials reflecting their interests, which will be critically appraised for both content and presentation. Emphasis will be placed on helping students acquire a high level of professionalism. Prerequisite: PSTAT126.

Meetings

Class meetings are held 2pm – 3:15pm Tuesdays and Thursdays in Ellison 2617.

Section meetings are held on Wednesdays:

  • 2pm – 2:50pm in Girvetz 2115 with Josh;

  • 3pm – 3:50pm in Girvetz 2116 with Erika;

  • 4pm – 4:50pm in North Hall 1109 with Meghan.

Staff

Instructor:

  • Trevor Ruiz. Visiting assistant professor and co-instructor for 2021-2022 capstone projects.

Teaching assistants:

  • Erika McPhillips. MS/PhD student and capstone project mentor in 2021-2022.

  • Joshua Bang. MS/PhD student and capstone project mentor in 2021-2022.

  • Meghan Elcheikhali. PhD student and capstone project mentor in 2021-2022.

Undergraduate learning assistant:

  • Yan Lashchev. BS student and Data Science Fellow at UCSB in 2021-2022.

Tentative schedule

This schedule is tentative and may be adjusted at the discretion of the instructor. Check back for updates.

Week | Theme | Tuesday meeting | Thursday meeting | Section meeting
0 | Module 0: Introductions | NO CLASS | Course orientation | NO LAB
1 | Module 0: Introductions | 0.1 Lecture: on research projects in(volving) data science | 0.2 Activity: collaboration using GitHub | Software and technology overview
2 | Module 0: Introductions | 0.3 Lecture/discussion: introducing class survey data | 0.4 Activity: exploratory and descriptive analysis | tidyverse
3 | Module 1: biomarkers | 1.1 Discussion/lecture: sharing results of survey data analysis; introducing biomarker data | 1.2 Lecture: on prediction | tidymodels
4 | Module 1: biomarkers | 1.3 Lecture: on classification | 1.4 Lecture/discussion: on variable selection; review published analysis of biomarker data | classification
5 | Module 2: web fraud | 2.1 Lecture/discussion: sharing analysis of soil temperature data; introducing web fraud data | 2.2 Lecture: on text as data | text processing
6 | Module 2: web fraud | 2.3 Lecture: on multiclass classification | 2.4 Activity: measuring classification accuracy | keras
7 | Module 3: soil temperature | 3.1 Discussion/lecture: sharing results of biomarker analysis; introducing soil temperature data | 3.2 Lecture: on time | time series analysis
8 | Module 3: soil temperature | 3.3 Lecture: on space | 3.4 Discussion: results | spatial analysis
9 | Module 4: vignettes | 4.1 Activity: workshopping vignettes | NO CLASS | NO LAB
10 | Module 4: vignettes | 4.2 Activity: teaching exchange | 4.3 Activity/discussion: teaching exchange; closing | NO LAB

Learning outcomes

This course emphasizes collaborative, interactive, and hands-on learning. Instruction in PSTAT197A will support all students in:

  • using modern technology and version control to collaborate efficiently on programming for data science projects;

  • recognizing and articulating problem patterns based on data semantics and one or more research questions;

  • identifying and accessing resources to aid in learning independently about methodology and/or application domains pertinent to a problem of interest;

  • communicating data analysis and/or research findings in a project team setting and to a small audience of peers.

Course staff are committed to creating an inclusive learning environment. Data science involves a combination of computing, statistics and probability, and domain expertise, as well as the use of technology, narrative communication, and storytelling, and no one person should expect to be an expert in all of these areas. Course staff recognize that core competencies vary considerably, acknowledge that each student has particular strengths, weaknesses, and interests, and make their best effort to avoid promoting one skill set over others in the practice of data science.

Expectations and assessments

Much of the course is designed around group activity and discussion. Students are therefore expected to:

  • prepare for class meetings in advance by completing any assigned reading or activity;

  • attend and actively participate in class meetings and section meetings;

  • provide meaningful, timely, and concrete contributions to group activities.

Students having any difficulty in meeting these expectations should raise the issue(s) promptly with the instructor.

Qualitative feedback is emphasized over numerical scores. Students are assessed on:

  • attendance, preparation, and participation;

  • quality of submitted work;

  • individual contributions to group assignments;

  • oral interview.

Software

Computing in PSTAT197A will be demonstrated in R, and code and other materials will be shared via GitHub. The following software will be required to access course materials:

Installations and basic functionality will be covered in the first section meeting.

PSTAT197A is not language-agnostic, and some instruction in R is provided, but the course does not especially emphasize programming technique in R. Students are free to use or experiment with other software at their discretion, provided it does not interfere with their participation in the class, but are expected to submit work and collaborate using RStudio-supported files.

GitHub

Students will learn and practice basic functionality of Git and GitHub for version control and collaboration by accessing course materials via GitHub repositories and submitting work via repository contributions.

The data science capstone has a GitHub Classroom. Materials will be deployed via direct links. Students will be asked to submit work by contributing to team repositories; any such contributions will remain visible to course staff and team contributors, and so are not strictly private.

To access GitHub Classroom materials, students will need to create a GitHub account if they do not already have one. Here is some advice on choosing a username.

Policies

Attendance. Regular attendance is expected. Each student can miss two sessions without notice; further absences may impact course grades. Students are responsible for material discussed in their absence and should review posted session notes and consult a classmate.

Deadlines. Students are expected to meet assignment deadlines in a timely manner. All deadlines have a 24-hour grace period. Late or amended work may not be accepted.

Email. Course staff will make their best effort to reply to email within 48 weekday hours. However, due to high volume, staff cannot guarantee that all messages will receive replies.

Illness. Students who are ill are required to stay home. Students ill with COVID-19 must comply with university policy regarding reporting and quarantine. Accommodations will be made to ensure that students absent due to illness do not fall behind.

Accommodations. Reasonable accommodations will be made for any student with a qualifying disability. Such requests should be made through the Disabled Students Program (DSP). More information, instructions on how to access accommodations, and information on related resources can be found on the DSP website. Note: in this class there are no timed assessments.

Letter grades. Letter grades are assigned based only on the assessments identified above and according to university guidelines, with the relative weighting of assessments determined at the discretion of the instructor. While grade calculations will not be disclosed, students are entitled to an explanation of the criteria used to determine their grades if desired. Grades will not be changed except in the case of clerical errors. If students feel their grade has been unfairly assigned, they are entitled to contest it following UCSB procedure for contesting grades.

Conduct. All course participants are expected to maintain respectful and honorable conduct consistent with UCSB ethical standards. Students uncomfortable with the behavior of another course participant for any reason should notify the instructor, course staff, or, if the complaint relates to course staff conduct, an administrative or departmental officer. Evidence of academic dishonesty will be reported to the Office of Student Conduct (OSC); evidence of problematic behavior will be addressed on a case-by-case basis in accord with university policies.