Curve fitting using basis approximations

PSTAT197A/CMPSC190DD Fall 2022

Trevor Ruiz

UCSB

Word clouds

From self-assessments: identify a helpful feature that improved groupwork.

Word clouds

From self-assessments: identify an area of improvement.

Soil temperature data

ABoVE

The Arctic-Boreal Vulnerability Experiment (ABoVE) is a NASA Terrestrial Ecology Program field campaign in Alaska and western Canada from 2016 to 2021.

Research for ABoVE will link field-based, process-level studies with geospatial data products derived from airborne and satellite sensors, providing a foundation for improving the analysis and modeling capabilities needed to understand and predict ecosystem responses and societal implications.

ABoVE Soil Temperatures

We’ll work with soil temperatures.

Nicolsky, D.J., V.E. Romanovsky, A.L. Kholodov, K. Dolgikh, and N. Hasson. 2022. ABoVE: Soil Temperature Profiles, USArray Seismic Stations, 2016-2021. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/1680

  • Observations of soil temperatures (degrees Celsius)

  • Measured at 63 locations in Alaska

  • Recorded four times daily at multiple depths

Site locations

We’ll use 57 of the 63 sites for now.

Locations of selected monitoring stations.

Example rows

# A tibble: 6 × 7
  site   latitude longitude elevation date_time           depth  temp
  <chr>     <dbl>     <dbl>     <dbl> <dttm>              <dbl> <dbl>
1 B21K-1     69.6     -155.        96 2017-08-14 12:00:00   0    3.46
2 B21K-1     69.6     -155.        96 2017-08-14 12:00:00   0.2  3.48
3 B21K-1     69.6     -155.        96 2017-08-14 12:00:00   1    3.38
4 B21K-1     69.6     -155.        96 2017-08-14 12:00:00   1.5  3.31
5 B21K-1     69.6     -155.        96 2017-08-14 18:00:00   0    3.92
6 B21K-1     69.6     -155.        96 2017-08-14 18:00:00   0.2  4.16

Temperature profiles

Observations for a single site at four depths (one path per depth).

  • What is happening over time?

  • What is happening across depth?

  • Any other observations?

Comparing sites

Profiles at two sites.

  • How do the sites differ?

  • How are they similar?

Comparing sites

Here are the locations of the sites just compared.

  • What factors might account for some of the differences in temperature profiles between the sites?

  • Are any of them recorded in our data?

Goals

Our overall goal this week is to build a forecasting model.

Strategy:

  1. To start, approximate the seasonal trend.
  2. De-trend the data and model the correlation structure of deviations from seasonal trend.
  3. Forecast as: \(\text{trend} + \mathbb{E}(\text{future}| \text{present})\)

Function approximation

Annual cycles

The seasonality is annual – let’s examine the annual cycle instead of the usual time course plot.

Annual cycle for site H17K-1 at 0.2m depth.

  • Can you see the start and stop dates in the plot?

  • Any other observations?

Pooling sites

How would you estimate the annual cycle based on data at each site?

Daily average temperatures at 0.2m depth for 37 sites, 2017-2019.

As an estimation problem

Modeling the trend can be formulated as estimating the model:

\[ Y_{i, t} = f(t) + \epsilon_{i, t} \]

Where:

  • \(Y_{i, t}\) is the temperature at site \(i\) and time \(t\)

  • \(f(t)\) is the mean temperature at time \(t\)

  • \(\epsilon_{i, t}\) is a random error

But how do you estimate an arbitrary function?

Basis functions

A basis function is an element of a basis for a function space.

If \(\{f_j\}\) form a basis for a function space \(C\) then

\[ f \in C \quad\Longleftrightarrow f = \sum_j c_j f_j \]

A finite subset of basis functions can be used to approximate functions in the space:

\[ f \approx \sum_{j = 1}^J c_j f_j \]

Basis approximation

A nifty trick is to estimate \(f\) using a suitable basis approximation:

\[ Y_{i, t} = \beta_0 + \color{maroon}{\underbrace{\sum_{j = 1}^J \beta_j f_j(t)}_{\tilde{f}(t) \approx f(t)}} + \epsilon_{i, t} \]

This model can be fit using standard linear regression. (Think of the \(f_j(t)\)’s as \(J\) ‘new’ predictors.)
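
To make the 'new predictors' idea concrete, here is a minimal sketch in Python/NumPy. Everything in it is an assumption for illustration: the data are synthetic, and the basis functions are plain monomials \(f_j(t) = t^j\) rather than a basis you would pick for the soil temperature data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): noisy observations of a smooth function of t.
t = np.linspace(0.0, 1.0, 200)
f_true = np.sin(2 * np.pi * t)
y = f_true + rng.normal(scale=0.1, size=t.size)


def design_matrix(t, J):
    """Intercept column followed by J basis functions f_j(t) = t**j.

    The monomial basis is an illustrative choice only.
    """
    return np.column_stack([t**j for j in range(J + 1)])


# Each basis function evaluated at t becomes one column of predictors.
X = design_matrix(t, J=5)

# Ordinary least squares estimates of (beta_0, beta_1, ..., beta_J).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values: intercept plus the basis approximation of f(t).
f_tilde = X @ beta_hat
```

Swapping in a different basis only changes `design_matrix`; the regression step stays the same.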

Spline basis

The spline basis is a basis for piecewise polynomials of a specified order.

Bases for piecewise polynomials of order 1 through 4 joined at evenly-spaced knot points.

The basis functions are generated recursively from a set of ‘knots’, the locations where the polynomial pieces join.
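
One common recursive construction is the Cox-de Boor recursion for B-splines. The sketch below assumes that is the spline basis meant here (the slides do not say which construction was used); the function name `bspline_basis` and the knot values are hypothetical.

```python
import numpy as np


def bspline_basis(x, knots, order):
    """Evaluate B-spline basis functions of a given order (degree + 1) at x.

    Uses the Cox-de Boor recursion. Returns an array of shape
    (len(x), len(knots) - order), one column per basis function.
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)

    # Order 1: indicator functions of the half-open knot intervals.
    B = np.array(
        [(knots[i] <= x) & (x < knots[i + 1]) for i in range(len(knots) - 1)],
        dtype=float,
    ).T

    # Each higher order blends neighbouring functions of the previous order.
    for k in range(2, order + 1):
        n_new = len(knots) - k
        B_new = np.zeros((len(x), n_new))
        for i in range(n_new):
            left_den = knots[i + k - 1] - knots[i]
            right_den = knots[i + k] - knots[i + 1]
            if left_den > 0:
                B_new[:, i] += (x - knots[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (knots[i + k] - x) / right_den * B[:, i + 1]
        B = B_new
    return B


# Evenly spaced knots (illustrative values) and an order-4 (cubic) basis.
knots = np.arange(0.0, 11.0)
x = np.linspace(3.0, 7.0, 50, endpoint=False)
B = bspline_basis(x, knots, order=4)
```

With these 11 knots the order-4 basis has 7 functions, and on the interior interval [3, 7) the basis functions are nonnegative and sum to one at every point.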

Knot spacing

Knot spacing will affect how densely basis functions are concentrated around particular regions of data.

Here are bases generated on some unevenly-spaced knots:

Check your understanding: where would this spline basis have the most flexible approximation capability?

Knot placement

Appropriate placement of knots is essential for quality function approximation.

  • default: place at data quantiles

  • better: concentrated in regions with irregular trend

Where would you put them for our data?
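
As a rough illustration of the default rule, the sketch below places interior knots at data quantiles and compares them with evenly spaced knots. The observation times are simulated (an assumption, not the ABoVE data), with more observations in mid-year so the difference is visible.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observation times (day of year), denser between days 150 and 250.
day = np.concatenate([rng.uniform(1, 365, 300), rng.uniform(150, 250, 700)])

n_knots = 5  # number of interior knots (illustrative)

# Quantile placement: equally spaced probabilities, excluding 0 and 1.
probs = np.linspace(0, 1, n_knots + 2)[1:-1]
knots_quantile = np.quantile(day, probs)

# Even placement over the observed range, for comparison.
knots_even = np.linspace(day.min(), day.max(), n_knots + 2)[1:-1]


def count_inside(knots, lo=150, hi=250):
    """Number of knots that fall in the densely observed window."""
    return int(np.sum((knots >= lo) & (knots <= hi)))
```

Quantile placement follows where the data are, not where the trend is irregular, which is why the second bullet above can do better.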

A first attempt: spline basis

Model: \(Y_{i,t} = \beta_0 + \beta_1\cdot\text{elev}_i + \sum_{j = 1}^7 \gamma_j \cdot f_j(t) + \epsilon_{i, t}\)

Knots placed at vertical lines.

Estimated mean with 95% prediction interval at median site elevation.

A problem

Here the spline approximation produces a discontinuity where the end of the annual cycle wraps around to the start.

The choice of basis must match problem context.

  • here, need boundaries to meet

  • in other words, need a periodic function

Fourier basis

The Fourier basis consists of sine-cosine pairs and is a basis for square-integrable functions on a closed interval.

4 Fourier basis functions on the interval [1, 365].
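
A minimal sketch of how such a basis can be built, assuming two sine-cosine pairs and a period of 365 days (so that day 366 coincides with day 1). The helper `fourier_basis` is hypothetical, and the constant function is left to the model intercept.

```python
import numpy as np


def fourier_basis(t, n_pairs, period=365.0):
    """Columns sin(2*pi*k*t/period), cos(2*pi*k*t/period) for k = 1..n_pairs."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, n_pairs + 1):
        angle = 2 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


day = np.arange(1, 366)
F = fourier_basis(day, n_pairs=2)  # 4 basis functions, one column each

# Every column repeats after one period, so any linear combination does too.
start = fourier_basis([1.0], n_pairs=2)
wrapped = fourier_basis([366.0], n_pairs=2)
```

Because each column is periodic, a fit built from these columns meets itself at the year boundary by construction.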

Second try

Seasonal mean approximation using 4 Fourier basis functions.


Forecasting

Does this forecast make sense? Why or why not?

Next time

  1. Fit a time series model to the residuals

    \[ e_{i, t} = Y_{i, t} - \underbrace{\left(\hat{\beta}_0 + \hat{\beta}_1 \text{elev}_i + \hat{f}(t)\right)}_{\text{mean function } \hat{\mu}(i, t)} \]

  2. Forecast \(\hat{e}_{i, t} = \mathbb{E}\left(e_{i, t}|e_{i, t - 1}\right)\) using the residual model

  3. “Feed forward” residual forecasts to obtain temperature forecasts

    \[ \hat{Y}_{i, t} = \hat{\mu}(i, t) + \hat{e}_{i, t} \]
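
The three steps can be sketched on synthetic data. This assumes the residual model is a first-order autoregression, AR(1), which is one simple choice that matches conditioning on \(e_{i, t - 1}\); the seasonal mean, the noise level, and the true coefficient below are all made up for illustration, and a single site is used.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins (assumptions): a seasonal mean and AR(1) deviations.
day = np.arange(1, 366)
mu_hat = -5.0 + 10.0 * np.sin(2 * np.pi * (day - 110) / 365.0)
phi_true = 0.8
e = np.zeros(day.size)
for t in range(1, day.size):
    e[t] = phi_true * e[t - 1] + rng.normal(scale=0.5)
y = mu_hat + e

# Step 1: residuals from the mean function, then a least-squares AR(1) fit
# (regression of e_t on e_{t-1} without an intercept).
resid = y - mu_hat
phi_hat = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)

# Step 2: one-step-ahead residual forecast, E(e_t | e_{t-1}) = phi * e_{t-1}.
e_forecast = phi_hat * resid[-1]

# Step 3: feed forward. The day after day 365 reuses the day-1 seasonal mean.
y_forecast = mu_hat[0] + e_forecast
```

Forecasting further ahead repeats step 2, multiplying by `phi_hat` once per step, so the residual forecast decays toward zero and the temperature forecast returns to the seasonal mean.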