Kalman Filter and Change Points, Part 3

The first step toward change is awareness. The second step is acceptance. 
~ Nathaniel Branden

There is nothing permanent except change.
~ Heraclitus

I previously looked at the problem of segmenting a time series. Segmenting involves discovering points in the series where there is change. One question arises immediately; when what changes? 

In a previous post, we tried to segment the series based on changes to the parameters of a probability distribution or on discovering outliers in the trend discovered by a Kalman filter.

Bayesian Change Point Again

Here's the data from the previous post. This is temperature anomaly data from the  NASA Global Climate Change site.


We could think of this data as a series of linear segments. Each segment has a mean. The data in each segment is scattered around the linear line segments. Within each segment, we assume the mean is constant. The mean is described by an intercept and slope. The data points are scattered around this mean in a normal distribution. Within a segment, the mean of the normal distribution is zero. The algorithm assumes the variance across the segments is constant. Mean of the segement and the variances of the random points around the segement mean differ among segments.

In other words, the model is

\[\begin{array}{l}{y_s} \sim {\alpha _s} + {\beta _s}{x_{\{ s\} }} + {\varepsilon _s}\\{\varepsilon _s} \sim N(0,\sigma _s^2)\end{array}\]

s is the segment number. If we assume that ${\alpha }$ and ${\beta }$ are also normally distributed, we could use an MCMC technique such as Gibbs sampling to get estimates of the posterior mean and variance at each point. We could also get the probability of a change point at each point. 

Luckily, we don't have to do much work to accomplish this. There are a number of software packages that will do the heavy work of sampling. I chose R's bcp 4.0 package to estimate the mean and variance at each point and the probability of change points. The bcp help file lists a number of references describing the underlying techniques.

Here's same code to fit the data. Most of the lines are just for plotting.

There is nothing permanent except change. Share this Quote Heraclitus


Read more at https://www.brainyquote.com/topics/change-quotes
##' plot_bcp
#'    Bayesian change point analysis of temperature anomaly data. 
#'
#' @param df - a data frame containing columns Year and Temperature_Anomaly
#' @param burnin - an int, the number of burnin iterations
#' @param mcmc - an int, the number of iterations after burnin.
#' @param p0 - a numeric between 0 and 1. prior on change point probabilities, U(0, p0)
#' @param d - a positive number, see bcp docs.
#'
#' @return - a list
#' df - a data frame with columns: Year, Temperature_Anomaly, Posterior_Probability, Posterior_Mean, Posterior_Variance
#' bcp - the output from the bcp function
#' params - a list, the input parameters
#' 
plot_bcp <- function(df,
                     burnin = 1000,
                     mcmc = 10000,
                     p0 = 0.2,
                     d = 10) {
  require(bcp)
  require(tidyverse)
  require(ggpubr)
  
  # run bcp to fit the model
  bcp_out <- bcp(df$Temperature_Anomaly, df$Year, 
                 d = d,
                 burnin = burnin,
                 mcmc = mcmc,
                 p0 = p0)
  
  # the rest is juts plotting
  df2 <- data.frame(Year = df$Year,
                    Temperature_Anomaly = df$Temperature_Anomaly,
                    Posterior_Probability = bcp_out$posterior.prob,
                    Posterior_Mean = bcp_out$posterior.mean[, 1],
                    Posterior_Variance = bcp_out$posterior.var[, 1])
  
  p1 <- ggplot(df2) +
    geom_point(aes(x = Year, y = Temperature_Anomaly, color = 'Temperature Anomaly')) +
    geom_line(aes(x = Year, y = Posterior_Mean, color = 'BCP Posterior Mean')) +
    geom_ribbon(aes(x = Year, 
                    y = Posterior_Mean,
                    ymin = Posterior_Mean - sqrt(Posterior_Variance), 
                    ymax = Posterior_Mean + sqrt(Posterior_Variance),
                    color = '+/- SD'),
                fill = 'grey',
                alpha = 0.4) +
    scale_colour_manual(name = '',
                        values = c('grey', 'seagreen', 'brown')) +
    theme(legend.position="bottom")
  
  p2 <- ggplot(df2) +
    geom_linerange(aes(x = Year, ymin = 0, ymax = Posterior_Probability)) +
    ylab('Probability of Change Point')
  
  p <- ggarrange(plotlist = list(p1, p2),
                 ncol = 1,
                 nrow = 2)
  
  print(p)
  
  return(list(df = df2, 
              bcp = bcp_out,
              params = list(burnin = 50,
                            mcmc = 500,
                            p0 = 10,
                            d = 10)))
}

Here's what the results look like.


The segments appear linear, with some wiggle. The change points show up around where your eye expects them, but there is considerable uncertainty.

Segmentation and The Kalman Filter

We can think of the data shown above as being generated by some underlying process. The process, the atmosphere, could be modeled by a complicated climate model. However, we can have seen that the data can be fit reasonably well by a simple Kalman Filter. If we take the output from kf_gibbs_t.R, described in this post, we can use bcp to sample change points in the underlying trend.

temp_fit2 <- kf_gibbs_t(df)
df2 <- temp_fit1$df %>%
    select(Year, Posterior_Mean) %>%
    rename(Temperature_Anomaly = Posterior_Mean)
temp_fit_bcp2 <- plot_bcp(df2, title = 'kf_gibbs')

The bcp output shows smaller variance as would be expected. bcp predicts more change points but with more confidence.

No comments:

Post a Comment