Annual cycles
The seasonality is annual, so let’s examine the annual cycle instead of the usual time course plot.
- Can you see the start and stop dates in the plot?
- Any other observations?
Pooling sites
How would you estimate the annual cycle based on data at each site?
Daily average temperatures at 0.2m depth for 37 sites, 2017-2019.
As an estimation problem
Modeling the trend can be formulated as estimating the model:
\[
Y_{i, t} = f(t) + \epsilon_{i, t}
\]
Where:
\(Y_{i, t}\) is the temperature at site \(i\) and time \(t\)
\(f(t)\) is the mean temperature at time \(t\)
\(\epsilon_{i, t}\) is a random error
But how do you estimate an arbitrary function?
Basis functions
A basis function is an element of a basis for a function space.
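If the functions \(\{f_j\}\) form a basis for a function space \(C\), then membership in \(C\) is equivalent to being a linear combination of the \(f_j\) with coefficients \(c_j\):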
\[
f \in C \quad\Longleftrightarrow f = \sum_j c_j f_j
\]
A finite subset of basis functions can be used to approximate functions in the space:
\[
f \approx \sum_{j = 1}^J c_j f_j
\]
Basis approximation
A nifty trick is to estimate \(f\) using a suitable basis approximation:
\[
Y_{i, t} = \beta_0 + \color{maroon}{\underbrace{\sum_{j = 1}^J \beta_j f_j(t)}_{\tilde{f}(t) \approx f(t)}} + \epsilon_{i, t}
\]
This model can be fit using standard linear regression. (Think of the \(f_j(t)\)’s as \(J\) ‘new’ predictors.)
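To make the "new predictors" idea concrete, here is a minimal sketch in plain Python. It builds a design matrix whose columns are an intercept and the basis functions evaluated at each time, then solves the least-squares normal equations. The polynomial basis, the synthetic data, and the helper names (`design_row`, `solve`, `fit_basis_regression`) are assumptions made for illustration; in practice a regression routine from a statistics library would do this step.

```python
def design_row(t, basis):
    """One row of the design matrix: intercept, then f_1(t), ..., f_J(t)."""
    return [1.0] + [f(t) for f in basis]


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_basis_regression(times, values, basis):
    """Least-squares estimates of (beta_0, ..., beta_J) via the normal equations."""
    X = [design_row(t, basis) for t in times]
    p = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(p)] for a in range(p)]
    Xty = [sum(row[a] * y for row, y in zip(X, values)) for a in range(p)]
    return solve(XtX, Xty)


# Hypothetical basis (J = 2) and noiseless synthetic data, for illustration only.
basis = [lambda t: t, lambda t: t ** 2]
times = [k / 10 for k in range(11)]
values = [2.0 + 3.0 * t - 1.0 * t ** 2 for t in times]

beta_hat = fit_basis_regression(times, values, basis)
```

Because the synthetic data are generated without noise from the same basis, `beta_hat` recovers the generating coefficients up to rounding. With real data the same call returns the least-squares fit.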
Spline basis
The spline basis is a basis for piecewise polynomials of a specified order.
Bases for piecewise polynomials of order 1 through 4 joined at evenly-spaced knot points.
The basis functions are generated recursively from a set of ‘knots’, the locations where the polynomial pieces join.
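The recursion can be sketched with the standard Cox-de Boor formula for B-splines. Order 1 gives piecewise-constant indicator functions, and each higher order is a weighted combination of two functions of the previous order. The function names and the evenly spaced knot vector below are assumptions for illustration. The slides do not say which software produced the figures.

```python
def bspline(j, order, knots, t):
    """Value at t of the j-th B-spline basis function of the given order.

    Order 1 is piecewise constant; order k combines two functions of order k - 1.
    """
    if order == 1:
        return 1.0 if knots[j] <= t < knots[j + 1] else 0.0
    left_width = knots[j + order - 1] - knots[j]
    right_width = knots[j + order] - knots[j + 1]
    left = 0.0
    if left_width > 0:  # skip terms with zero-width support (repeated knots)
        left = (t - knots[j]) / left_width * bspline(j, order - 1, knots, t)
    right = 0.0
    if right_width > 0:
        right = (knots[j + order] - t) / right_width * bspline(j + 1, order - 1, knots, t)
    return left + right


def spline_basis(order, knots, t):
    """All basis functions of the given order evaluated at t."""
    n_basis = len(knots) - order
    return [bspline(j, order, knots, t) for j in range(n_basis)]


# Hypothetical evenly spaced knots at 0, 1, ..., 8; order 4 gives piecewise cubics.
knots = [float(k) for k in range(9)]
row = spline_basis(4, knots, 4.5)
```

Evaluating `spline_basis` over a grid of times gives the columns of a design matrix. Away from the ends of the knot vector the basis functions at any point are nonnegative and sum to one.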
Knot spacing
Knot spacing affects how densely the basis functions are concentrated in particular regions of the data.
Here are bases generated on some unevenly-spaced knots:
Check your understanding: where would this spline basis have the most flexible approximation capability?
Knot placement
Appropriate placement of knots is essential for quality function approximation.
Where would you put them for our data?
A first attempt: spline basis
Model: \(Y_{i,t} = \beta_0 + \beta_1\cdot\text{elev}_i + \sum_{j = 1}^7 \gamma_j \cdot f_j(t) + \epsilon_{i, t}\)
A problem
The choice of basis must match the problem context.
Here, the fitted curve needs to take the same value at both boundaries of the annual cycle.
In other words, we need a periodic function.
Fourier basis
The Fourier basis, which consists of sine-cosine pairs, is a basis for square-integrable functions on a closed interval.
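A small sketch of why sine-cosine pairs suit an annual cycle: every basis function repeats with the chosen period, so any linear combination takes the same value at the start and end of the cycle. The period of 365 days, the number of pairs, and the coefficient values below are hypothetical choices for illustration, not estimates from the data.

```python
import math


def fourier_basis(t, period, n_pairs):
    """Sine-cosine pairs at t: [sin(w t), cos(w t), sin(2 w t), cos(2 w t), ...]."""
    row = []
    for k in range(1, n_pairs + 1):
        angle = 2 * math.pi * k * t / period
        row.extend([math.sin(angle), math.cos(angle)])
    return row


def fitted_curve(t, intercept, coefs, period):
    """Intercept plus a linear combination of the Fourier basis functions at t."""
    n_pairs = len(coefs) // 2
    terms = zip(coefs, fourier_basis(t, period, n_pairs))
    return intercept + sum(c * f for c, f in terms)


period = 365.0                  # assumed cycle length in days
coefs = [-3.0, -8.0, 0.5, 1.0]  # hypothetical coefficients for two pairs

start = fitted_curve(0.0, 12.0, coefs, period)
end = fitted_curve(period, 12.0, coefs, period)
```

Here `start` and `end` agree up to rounding, which is the boundary-matching property a spline basis on an interval does not guarantee.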
Forecasting
Does this forecast make sense? Why or why not?
Next time
Fit a time series model to the residuals
\[
e_{i, t} = Y_{i, t} - \underbrace{\left(\hat{\beta}_0 + \hat{\beta}_1\text{elev}_i + \hat{f}(t)\right)}_{\text{mean function } \hat{\mu}(i, t)}
\]
Forecast \(\hat{e}_{i, t} = \mathbb{E}\left(e_{i, t}|e_{i, t - 1}\right)\) using the residual model
“Feed forward” residual forecasts to obtain temperature forecasts
\[
\hat{Y}_{i, t} = \hat{\mu}(i, t) + \hat{e}_{i, t}
\]
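As a preview, the feed-forward step can be sketched under one specific assumption: that the residuals follow a first-order autoregression, so that \(\mathbb{E}(e_{i, t} \mid e_{i, t - 1}) = \phi\, e_{i, t - 1}\). The residual model is not fixed here, so the AR(1) form, the function names, and the numbers below are all hypothetical.

```python
def fit_ar1(residuals):
    """Least-squares slope from regressing e_t on e_{t-1} with no intercept."""
    pairs = list(zip(residuals[:-1], residuals[1:]))
    num = sum(prev * cur for prev, cur in pairs)
    den = sum(prev ** 2 for prev, _ in pairs)
    return num / den


def forecast_temperature(mean_next, last_residual, phi):
    """Y-hat = mu-hat + e-hat, where e-hat = phi * e_{t-1} under the AR(1) assumption."""
    return mean_next + phi * last_residual


# Hypothetical residual series for one site and a hypothetical next-day mean.
residuals = [1.0, 0.5, 0.25, 0.125]
phi_hat = fit_ar1(residuals)
y_hat = forecast_temperature(mean_next=10.0, last_residual=residuals[-1], phi=phi_hat)
```

The forecast adds the predicted residual to the mean function, so a day that ran warmer than the seasonal mean pulls the next day's forecast above the mean as well, by a factor of `phi_hat`.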