PSTAT197A/CMPSC190DD Fall 2022
UCSB
come to class next time prepared to show a draft of your vignette
after class today look for abstracts on the course site
office hours in place of section meetings this Wednesday
Amgen is an international biotechnology company.
Project: evaluation of natural language processing algorithms used for knowledge graph generation
NLP algorithms are used to recognize and link entities and identify relations based on text data
outputs can be used to extract networks (‘knowledge graphs’) from text corpora
Goals: generate a benchmarking dataset from PubMed database and evaluate performance of NLP-based methods for knowledge graph construction
See this example of related work.
Appfolio is a local property management software company. Among other things, clients use their software for accounting purposes.
Project: anomaly detection from property management transaction histories
Goals: determine applicable anomaly/outlier detection methods, evaluate performance, and develop dashboard based on best method(s)
CalCOFI stands for California Cooperative Oceanic Fisheries Investigations. They run a long-term monitoring program of the California current ecosystem.
Project: an eDNA window into larval fish habitat, ecosystem structure, and function
CalCOFI collects physical data and environmental DNA (eDNA) across depth in the water column at multiple monitoring sites longitudinally
a major interest is on impacts of environmental conditions on fisheries
Goals: develop dashboard for exploration of eDNA data and develop a model for prediction of fish larvae based on eDNA and physical data (or derived variables)
See last year’s project.
Carpe Data is a local company focusing on data-driven solutions for insurance carriers.
Project: business characteristics classification models
Goals: classify businesses according to characteristics and/or risk levels based on basic business information (name, description, hours, images, etc.) and develop software pipelines for preprocessing and prediction
The Caves lab studies visual acuity and its evolutionary and ecological drivers in animals. Bees are great model organisms for studying the relationship between ecology and acuity due to wide variation in lifestyles and ecologies.
Project: measuring visual acuity in bees from high-resolution images
acuity quantification is derived from physical measurements that could potentially be inferred from photographs rather than measured directly
specifically, radius of curvature of the eye and width of ommatidia (compound eye facets)
Goals: utilize computer vision techniques to infer acuity measurements from photographs and merge with ecological data to explore correlates
The Cheadle Center for Biodiversity and Ecological Restoration (CCBER) contributes to the Big Bee Project, aimed at creating >1M 2D and 3D high-resolution images of bees for the study of anatomical variation.
Project: constructing three-dimensional bee models from high-resolution images
several software tools for constructing 3D models from 2D images are available, but performance with bees specifically is not well understood
several image sets of ~100 photographs each are available for construction of models
Goals: after receiving training on 3D modeling tools, students will experiment with parameter tuning for optimal rendering of bee models; students will then derive physical measurements from the models.
The Climate Hazards Center housed in the geography department is a multidisciplinary research center focusing on climate risk analysis and response.
Project 1: identifying the drivers of food insecurity in the developing world
precipitation and potential evapotranspiration together can help identify deficits in available water and risk of food shortages
explore the relationship between precipitation and potential evapotranspiration globally over time and identify drivers after accounting for relationship
Project 2: evaluating and validating station- and satellite-based daily precipitation datasets
CHC supports precipitation datasets based on interpolating measurements from satellites and ground stations
new data releases go through an evaluation/validation process with benchmarking data prior to release
students will carry out evaluation/validation of a new release based on prior strategies
Patrick Green is a research scientist in EEMB studying animal behavior and competition, and is developing a pilot project studying contests among mantis shrimp.
Project: how do mantis shrimp fight in a community of competitors?
feasible to carry out continuous video monitoring of small populations in tanks to capture interactions and other behavior
time-consuming to review footage
not obvious how to define/code interactions
Goal: develop heuristic methodology for detecting time intervals in which interactions occur; generate tracking data summarizing movements of each individual.
Evidation is a California-based company focusing on health data analytics for individuals and for researchers.
Project: Impact of case definition on early detection systems for COVID-19
model-based early detection systems for infectious disease use inconsistent criteria for case onset
the definition of when a case starts may impact the efficacy and other features of early detection systems
Goals: assess the impact of case onset definition on existing early detection models and find optimal case onset points.
Inogen is a medical device company that builds portable oxygen concentrators.
Project: analysis of portable oxygen concentrator patient use data
Goals: identify patterns of device use from patient data with particular focus on adherence.
The MOVE lab at UCSB focuses on movement data science in general and in particular human mobility in response to disruptions.
Project: detecting changes in human mobility and movement patterns associated with wildfires in California
movement and mobility are often markers of behavior and changes in movement patterns may capture information about behavioral responses to events
natural disaster in general and wildfire in particular are likely to produce shifts in movement patterns
Goals: assess suitability of several candidate datasets for studying behavioral responses to wildfires; identify movement patterns and explore the hypothesis that change points occur in connection with wildfire events; potentially explore demographic covariates.
P3 is a local company focusing on applied sports science and technology for biomechanical analysis of athletic performance.
Project: understanding links between biomechanical data and on-court NBA production
P3 collects biomechanical data – force plate and motion capture – on professional athletes and maintains a large proprietary database on NBA athletes
historically, have focused on injury risk, but interested in finding biomechanical correlates of real performance
Goals: scrape publicly available on-court data, merge with biomechanical data, and identify correlates of on-court production; develop visualization tools.
Stanford Synchrotron Ratiation Lightsource (SSRL) is a DOE facility at Stanford supporting a wide range of fundamental research involving bright X-rays.
Project: diffraction image selector
X-ray diffraction data provides insight into the atomic and molecular structure of crystals; serial crystallography involves merging diffraction patterns from several crystals in order to analyze the material structure
experimenters tend to select diffraction images manually from serial experiments; this selection has an impact on experimental outputs
Goals: build a regression model to label images during data collection in real time; develop model from simulated data and validate on real data.
Please read abstracts first and then fill out the preference form by Friday 12/2.
3 top lab choices
3 top industry choices
We’ll try to accommodate preferences, but we can’t guarantee you’ll get your top choices.
Target date for assignments: Friday 12/9.