Don't follow trends, start trends.
Frank Capra
I have to tell you about the Kalman filter, because what it does is pretty damn amazing.
Tim Babb
In a previous post, I looked at an application of a Kalman filter for simple projectile motion. In this post, I'll explore a different use of the Kalman filter, analyzing the trend in time series data. A trend is a long-term increase, decrease, or stationary period in the data. It may be linear or nonlinear.
Consider this data from the NASA Global Climate Change website.
The points represent the change in global surface temperature relative to the 1951-1980 average. Seventeen of the 18 warmest years in the 136-year record have occurred since 2001; the exception is 1998. As you can see, there is a definite upward trend in the data that accelerates somewhere around 1980.
The Local Trend Model
The local trend Kalman filter comes from Ruey S. Tsay's book Analysis of Financial Time Series. It is also available online as lecture notes for Tsay's Business 4191 class at the University of Chicago's business school.
We will model the time series data $\{ {y_t}\}$ with a simple model
\[{y_t} = {\mu _t} + {\varepsilon _t}\]
\[{\mu _{t+1}} = {\mu _t} + {\eta _t}\]
${\varepsilon _t} \sim N(0,\sigma _\varepsilon ^2)$ and ${\eta _t} \sim N(0,\sigma _\eta ^2)$ are Gaussian white noise. ${\mu _t}$ is the local mean or trend of the data. We can observe the data $\{{y_t}\}$, but not ${\mu _t}$. We're going to assume the noise variances $\sigma _\varepsilon ^2$ and $\sigma _\eta ^2$ are constant, but they don't have to be.
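To make the model concrete, here is a short simulation of it. The variances are arbitrary illustrative choices, not estimates from any data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
obs_var, state_var = 0.04, 0.0025   # sigma_eps^2 and sigma_eta^2 (made-up values)

# random-walk trend: mu_{t+1} = mu_t + eta_t
mu = np.cumsum(rng.normal(0.0, np.sqrt(state_var), n))
# observations: y_t = mu_t + eps_t
y = mu + rng.normal(0.0, np.sqrt(obs_var), n)
```

Plotting `y` against `mu` shows exactly the situation the filter is built for: a smooth wandering trend hidden inside noisy observations.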
This model says that the data is described by random variations around a trend. One way to think about the data shown above is that there is some continuous process, global climate change, represented by ${\mu _t}$, that emits noisy observations. There are good models for climate change. We could use a climate model ${\mu _t} = f(t,\theta )$, where ${\theta}$ represents a whole bunch of parameters and forcing functions, and fit the trend to the data. Unfortunately, climate models are complicated. Instead, we'll keep things simple and use statistical inference and a Kalman filter to model the trend.
I won't go through the full derivation of the local trend Kalman filter. It can be found in Tsay's book and lecture notes and Durbin and Koopman's Time Series Analysis by State Space Methods. However, let's define a few terms.
${\mu _{t|t - 1}} = E({\mu _t}|{y_1}...{y_{t - 1}})$ is the expected value of ${\mu_t}$ given the data from the beginning of the series up to the preceding data value. Similarly, ${y_{t|t - 1}} = E({y_t}|{y_1}...{y_{t - 1}})$ is the expected value of $y$ given the preceding data. What value do we expect for ${y_t}$? We expect it to be ${\mu_t}$. So, ${y_{t|t - 1}} = {\mu _{t|t - 1}}$.
We will define some more quantities. The one-step ahead prediction error is
\[{v_t} = {y_t} - {\mu _{t|t - 1}}\]
and the variance of the state prediction is
\[{\Sigma _{t|t - 1}} = \operatorname{var} ({\mu _t} - {\mu _{t|t - 1}}|{y_1}...{y_{t - 1}})\]
The variance of the prediction error is then
\[{V_t} = \operatorname{var} ({v_t}|{y_1}...{y_{t - 1}}) = \operatorname{var} ({\mu _t} + {\varepsilon _t} - {\mu _{t|t - 1}}|{y_1}...{y_{t - 1}}) = {\Sigma _{t|t - 1}} + \sigma _\varepsilon ^2\]
We can now write the joint distribution of the state and the prediction error.
\[\left[ {\begin{array}{*{20}{c}}
{{\mu _t}}\\
{{v_t}}
\end{array}} \right] \sim N\left( {\left[ {\begin{array}{*{20}{c}}
{{\mu _{t|t - 1}}}\\
0
\end{array}} \right],\left[ {\begin{array}{*{20}{c}}
{{\Sigma _{t|t - 1}}}&{{\Sigma _{t|t - 1}}}\\
{{\Sigma _{t|t - 1}}}&{{V_t}}
\end{array}} \right]} \right)\]
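Conditioning ${\mu _t}$ on ${v_t}$ in this joint normal distribution, using the standard formulas for a conditional bivariate normal, gives the update step directly:

\[E({\mu _t}|{y_1}...{y_t}) = {\mu _{t|t - 1}} + \frac{{{\Sigma _{t|t - 1}}}}{{{V_t}}}{v_t}\]
\[\operatorname{var} ({\mu _t}|{y_1}...{y_t}) = {\Sigma _{t|t - 1}} - \frac{{\Sigma _{t|t - 1}^2}}{{{V_t}}} = {\Sigma _{t|t - 1}}\left( {1 - \frac{{{\Sigma _{t|t - 1}}}}{{{V_t}}}} \right)\]

With ${K_t} = {\Sigma _{t|t - 1}}/{V_t}$, these are the state and variance updates of the filter; predicting ${\mu _{t + 1}}$ from the random-walk equation adds the extra $\sigma _\eta ^2$ term to the variance.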
The Local Trend Kalman Filter
Using the definitions above, it is possible to derive a simple Kalman filter.
\[\begin{array}{l} {v_t} = {y_t} - {\mu _{t|t - 1}}\\
{V_t} = {\Sigma _{t|t - 1}} + \sigma _\varepsilon ^2\\
{K_t} = \frac{{{\Sigma _{t|t - 1}}}}{{{V_t}}}\\
{\mu _{t + 1|t}} = {\mu _{t|t - 1}} + {K_t}{v_t}\\
{\Sigma _{t + 1|t}} = {\Sigma _{t|t - 1}}(1 - {K_t}) + \sigma _\eta ^2
\end{array}\]
${v_t}$ is the one-step ahead error estimate, also called the residual. ${V_t}$ is the estimate of the variance of ${v_t}$. ${K_t}$ is the Kalman gain.
That looks pretty straightforward. Here is the code:
class LocalTrendKalmanFilter(object):
    def __init__(self, y_data, obs_var=0.1,
                 state_var=0.1, debug=False):
        """
        Initialization
        arguments:
        y_data - numpy array of time series data
        obs_var - variance of observed data
        state_var - variance of mu, the state
        debug - if true, prints values at each step
        """
        self._ydata = y_data          # observations
        self._obs_var = obs_var       # data variance
        self._state_var = state_var   # model variance
        self._step = 1                # what time step are we on?
        self._debug = debug
        # state
        self._mu = y_data[0]
        self._v = 0
        self._cov = self._obs_var
        self._V = 0
        self._K = 0

    # some getters for state and covariance
    @property
    def mu(self):
        return self._mu

    @property
    def v(self):
        return self._v

    @property
    def cov(self):
        return self._cov

    def predict(self):
        """
        predict - predict new state
        """
        # are we at the end of the data?
        if self._step == len(self._ydata):
            self._mu = None
            self._v = None
            return
        # update error prediction and variance
        self._v = self._ydata[self._step] - self._mu
        self._V = self._cov + self._obs_var

    def update(self):
        """
        update - Kalman update
        """
        self._K = self._cov / self._V            # Kalman gain
        self._mu = self._mu + self._K * self._v  # new state
        self._cov = self._cov * (1 - self._K) + self._state_var  # new variance
        if self._debug:
            print('y =', self._ydata[self._step], 'mu =', self._mu,
                  'v =', self._v, 'V =', self._V, 'K =', self._K,
                  'cov =', self._cov, '\n')
        self._step += 1
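Before turning to real data, it's worth sanity checking the recursions on synthetic data. This standalone sketch follows the same equations and initialization as the class above; the noisy ramp is made-up test data, not the NASA series:

```python
import numpy as np

def local_trend_filter(y, obs_var=1.0, state_var=0.1):
    """Run the local trend Kalman filter over y; return the filtered trend."""
    mu, cov = y[0], obs_var               # same initialization as the class
    trend = [mu]
    for obs in y[1:]:
        v = obs - mu                      # one-step ahead residual
        V = cov + obs_var                 # residual variance
        K = cov / V                       # Kalman gain
        mu = mu + K * v                   # new state estimate
        cov = cov * (1 - K) + state_var   # new state variance
        trend.append(mu)
    return np.array(trend)

# synthetic check: the filtered trend should track a noisy ramp
rng = np.random.default_rng(1)
y = np.linspace(0.0, 1.0, 200) + rng.normal(0, 0.1, 200)
trend = local_trend_filter(y)
```

The filtered trend lags the ramp slightly, as any one-sided filter must, but smooths out most of the observation noise.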
Here is the result of applying the filter to the temperature anomaly data with ${\sigma _\varepsilon } = 1.0$ and ${\sigma _\eta } = 0.1$.
The filter is somewhat sensitive to the estimates of the variance. Watch what happens with ${\sigma _\varepsilon } = 0.1$ and ${\sigma _\eta } = 0.5$.
The fit appears a bit jumpier, although now the error mean stays closer to 0.
In an extreme case, ${\sigma _\varepsilon } = 0.000001$ and ${\sigma _\eta } = 1$, the filter simply overfits.
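One way to quantify this sensitivity is the steady-state Kalman gain, the fixed point of the variance recursion above: a gain near 1 means the filter trusts each new observation almost completely, which is what produces the overfit. Here is a quick numerical sketch, iterating the recursion for the three variance pairs used above:

```python
def steady_state_gain(obs_var, state_var, iters=1000):
    """Iterate the covariance recursion to its fixed point and return K."""
    cov = obs_var
    for _ in range(iters):
        K = cov / (cov + obs_var)            # Kalman gain
        cov = cov * (1 - K) + state_var      # variance recursion
    return K

for obs_var, state_var in [(1.0, 0.1), (0.1, 0.5), (1e-6, 1.0)]:
    print(obs_var, state_var, round(steady_state_gain(obs_var, state_var), 3))
```

The gain grows with the ratio of state noise to observation noise, and in the extreme case it is essentially 1, so the "trend" just reproduces the data.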
The code and data can be downloaded from here.