I spend most of my life feeling like I've been shot out of a cannon
I spend most of my life feeling like I've been shot out of a cannon. - Molly Ivins
The Kalman Filter
I have been using this blog to post techniques that I have been trying to learn. I hope eventually to apply these to more serious data than the examples I have been using. I'm going to continue and take a look at a very simple application of a Kalman filter. The Kalman filter allows us to use noisy measurements of a dynamic process to make reasonably accurate predictions of the process's state. Kalman filters are important for guidance, navigation, and control of vehicles and robots, aircraft and spacecraft. It is also widely used in time series analysis, signal processing, and econometrics.
If you Google Kalman filter, you will find many web pages describing the method in varying levels of mathematical difficulty. My favorite source for Kalman filter details is Kalman and Bayesian Filters in Python by Roger Labbe. The book's website provides an Juypter notebook and supporting software.
I'm not going to go into the derivation of the filter. Instead I'm going to look at a simple application, the motion of a projectile launched with an initial velocity and negligible air resistance.
Kalman filters rely on measurements taken at repeated periods. They are recursive in the sense that the prediction of the future state depends on the present state and the measurement. The state represents the quantities that are important for the dynamics of the system. In a projectile example, these quantities could be things like position, velocity, and acceleration. The filter also includes control information such as rudder position. The state is represented as a vector of real numbers. At each time increment, a linear operator is applied to the state to generate the new state. The prediction of the new state includes noise, and optionally some information from the controls on the system. Then, another linear operator mixed with more noise generates the observed outputs from the true state. To build the filter requires:
- A mathematical model of the system.
- An estimate of the initial state of the system.
- An estimate of the error inherent in the model.
- Estimates of the process and measurement error.
The Kalman Filter Equations
The Kalman filter consists of two stages, a prediction step and an update step. The fitter iterates between these steps. In essence, the Kalman filter is a predictor-corrector model. The prediction step estimates a new state vector and the system noise covariance based on the current state, covariance and control input. The update step takes system measurements and the predictions and uses them to calculate a corrected state and covariance.
Prediction
The prediction step predicts the state and covariance, i.e. how much error, at each time step. The filter model assumes the true state at time k is evolved from the state at (k − 1) according to
\[{{\mathbf{x}}_k} = {\mathbf{F}}{{\mathbf{x}}_{k - 1}} + {\mathbf{B}}{{\mathbf{u}}_k}\]
- ${\mathbf{x}}$ is the state vector.
- ${\mathbf{u}}$ is called the control vector. It represents control input to the system.
- ${\mathbf{F}}$ is called the state transition matrix. It is applied to the previous state to obtain an updated state vector.
- ${\mathbf{B}}$ is the control model. It is applied to the control vector to produce the control effect on the state.
\[{{\mathbf{P}}_k} = {\mathbf{F}}{{\mathbf{P}}_{k - 1}}{{\mathbf{F}}^T} + {\mathbf{Q}}\]
- ${\mathbf{P}}$ is the covariance matrix. It is the estimate of the error for each part of the system. ${{\mathbf{F}}^T}$ is the transpose of the state transition matrix.
- ${\mathbf{Q}}$ is the estimated process error covariance. Getting a good model for this matrix can be difficult.
Update
The update step consists of a series of linear equations to produce an updated value for ${\mathbf{x}}$ and ${\mathbf{P}}$.
\[{\mathbf{\tilde y}} = {{\mathbf{z}}_k} - {\mathbf{H}}{{\mathbf{x}}_k}\]
\[{\mathbf{S}} = {\mathbf{H}}{{\mathbf{P}}_k}{{\mathbf{H}}^T} + {\mathbf{R}}\]
\[{\mathbf{K}} = {{\mathbf{P}}_k}{{\mathbf{H}}^T}{{\mathbf{S}}^{ - 1}}\]
\[{{\mathbf{x}}_k} = {{\mathbf{x}}_k} + {\mathbf{K\tilde y}}\]
\[{{\mathbf{P}}_k} = ({\mathbf{I}} - {\mathbf{KH}}){{\mathbf{P}}_k}\]
- ${\mathbf{z}}$ is the input measurement vector. It contains measurements of the state components.
- ${\mathbf{H}}$ is the observation matrix. It translates the state vector to a measurement vector.
- ${\mathbf{\tilde y}}$ is called the innovation or residual. It compares the state prediction against the measurement.
- ${\mathbf{S}}$ is the residual covariance. It compares the real error against the prediction.
- ${\mathbf{R}}$ is the measurement covariance.
- ${\mathbf{K}}$ is the Kalman gain. ${{\mathbf{S}}^{ - 1}}$ is the matrix inverse of ${\mathbf{S}}$.
- ${\mathbf{I}}$ is the identity matrix.
A Filter Class
Given all of this linear algebra, it's pretty straight forward to write to write code for a filter. The tricky part is designing the matrices. Here's a Python class for a Kalman filter. It's not especially efficient. If you want efficiency, you can always use CUDA.
import numpy as np
"""
KalmanFilter - a class. This is pretty self-explanatory given some documentation of the Kalman Filter
Based on https://drive.google.com/file/d/0By_SW19c1BfhSVFzNHc0SjduNzg/view
"""
class KalmanFilter(object):
def __init__(self, x0, P, F, B, Q, H, R):
"""
Initialization
arguments:
x0 - initial state vector
"""
self._x = x0 # state vector
self._P = P # state covariance
self._F = F # state transition matrix
self._B = B # control matrix
self._Q = Q # process covariance
self._H = H # observation covariance
self._R = R # measurement covariance
# a couple of getters for state and covariance
@property
def x(self):
return self._x
@property
def P(self):
return self._P
def predict(self, u):
"""
predict - predict new state
calculates x = Fx + Bu; P = FPF.T + Q
arguments:
u - control vector
"""
self._x = np.dot(self._F, self._x) + np.dot(self._B, u)
self._P = np.dot(np.dot(self._F, self._P), self._F.T) + self._Q
def update(self, z):
"""
update - Kalman update
arguments:
z - measurement vector
"""
y = z - np.dot(self._H, self._x) # residual
S = np.dot(np.dot(self._H, self._P), self._H.T) + self._R # innovation covariance
K = np.dot(np.dot(self._P, self._H.T), np.linalg.inv(S)) # Kalman gain
self._x = self._x + np.dot(K, y) # new state
self._P = np.dot(np.identity(self._P.shape[0]) - np.dot(K, self._H), self._P) # new state covariance
Fire The Cannon
As a simple example of the use of the Kalman filter, we'll consider an example from elementary physics, simple projectile motion. An object, such as a cannon shell or an arrow, is launched with an initial velocity and travels under the influence of gravity until it hits the ground. We will ignore air resistance, Coriolis effects, and other forces that make this a tough problem in the real world.
Here are the equations of motion for the projectile as recurrence relations. Dt is the time step and g is gravitational acceleration.
\[{x_t} = {x_{t - 1}} + {v_{t - 1}}\Delta t\]
\[{v_{x,t}} = {v_{x,t - 1}}\]
\[{y_t} = {y_{t - 1}} + {v_{y,t - 1}}\Delta t - \tfrac{1}{2}g\Delta {t^2}\]
\[{v_{y,t}} = {v_{y,t - 1}} - g\Delta t\]
The left hand side of these equations is our state vector. The right hand side represents prediction equations transforming the old state and control into the new state. The only control for the projectile is g, the acceleration of gravity.
Rewriting
\[{{\mathbf{x}}_t} = {\mathbf{F}}{{\mathbf{x}}_{t - 1}} + {\mathbf{B}}{{\mathbf{u}}_t}\]
\[{{\mathbf{x}}_t} = \left[ {\begin{array}{*{20}{c}}
1&{\Delta t}&0&0 \\
0&1&0&0 \\
0&0&1&{\Delta t} \\
0&0&0&1
\end{array}} \right]{{\mathbf{x}}_{t - 1}} + \left[ {\begin{array}{*{20}{c}}
1&0&0&0 \\
0&1&0&0 \\
0&0&1&0 \\
0&0&0&1
\end{array}} \right]\left[ {\begin{array}{*{20}{c}}
0 \\
0 \\
{ - \tfrac{1}{2}g\Delta {t^2}} \\
{ - g\Delta t}
\end{array}} \right]\]
We can see that
\[{\mathbf{x}} = \left[ {\begin{array}{*{20}{c}}
x \\
{{v_x}} \\
y \\
{{v_y}}
\end{array}} \right]\]
\[{\mathbf{F}} = \left[ {\begin{array}{*{20}{c}}
1&{\Delta t}&0&0 \\
0&1&0&0 \\
0&0&1&{\Delta t} \\
0&0&0&1
\end{array}} \right]\]
\[{\mathbf{B}} = \left[ {\begin{array}{*{20}{c}}
1&0&0&0 \\
0&1&0&0 \\
0&0&1&0 \\
0&0&0&1
\end{array}} \right]\]
The control vector is a constant.
\[u = \left[ {\begin{array}{*{20}{c}}
0 \\
0 \\
{ - \tfrac{1}{2}g\Delta {t^2}} \\
{ - g\Delta t}
\end{array}} \right]\]
We need to supply a few other matrices to the filter.
${\mathbf{Q}}$ is the estimated process error covariance. Since the process is just elementary physics, we will set ${\mathbf{Q}}$ to a 4X4 matrix of zeros. ${\mathbf{H}}$ translates the state vector to a measurement vector. We assume our measurements are on the same scale as the projectile data, so ${\mathbf{H}}$ will be an identity matrix.
${\mathbf{R}}$, the measurement covariance. We will assume that the variance is the same for each element of the vector and there is no covariance. That is, the measurement of velocity doesn't affect the position measurement.. This probably a bad assumption, but for the purposes of example, it should be OK.
\[{\mathbf{R}} = \left[ {\begin{array}{*{20}{c}}
{{\sigma ^2}}&0&0&0 \\
0&{{\sigma ^2}}&0&0 \\
0&0&{{\sigma ^2}}&0 \\
0&0&0&{{\sigma ^2}}
\end{array}} \right]\]
To get started, we need an initial state vector and covariance. The initial state is
\[{{\mathbf{x}}_0} = \left[ {\begin{array}{*{20}{c}}
{{x_0}} \\
{\left| v \right|\cos (\theta )} \\
{{y_0}} \\
{\left| v \right|\sin (\theta )}
\end{array}} \right]\]
${\theta}$ is the initial launch angle and ${\left| v \right|}$ is the launch speed.
With all of this in mind, we can set up the Kalman filter.
Let's try out the filter and see how well the fitter matches the trajectory.
Here's the Newton's law version.
Here's the body of the projectile simulation.
The results are plotted below. The fit from the Kalman filter follows the trajectory closely.
What happens if we increase the measurement error? In the the plot above, the measurement error standard deviation was ${{\sigma^2} = 20}$. ]The scatter in the measurements looks relatively small on the scale of the plot. What happens if we increase the measurement error.
With ${{\sigma^2} = 200}$, the fit from the filter is still pretty reasonable given the high measurement error and large step time step size.
If we increase the measurement step size, the fit gets better, but the plot gets so crowded with data points that it's hard to see the fit.
The code is available here.
\[{{\mathbf{x}}_t} = \left[ {\begin{array}{*{20}{c}}
1&{\Delta t}&0&0 \\
0&1&0&0 \\
0&0&1&{\Delta t} \\
0&0&0&1
\end{array}} \right]{{\mathbf{x}}_{t - 1}} + \left[ {\begin{array}{*{20}{c}}
1&0&0&0 \\
0&1&0&0 \\
0&0&1&0 \\
0&0&0&1
\end{array}} \right]\left[ {\begin{array}{*{20}{c}}
0 \\
0 \\
{ - \tfrac{1}{2}g\Delta {t^2}} \\
{ - g\Delta t}
\end{array}} \right]\]
We can see that
\[{\mathbf{x}} = \left[ {\begin{array}{*{20}{c}}
x \\
{{v_x}} \\
y \\
{{v_y}}
\end{array}} \right]\]
\[{\mathbf{F}} = \left[ {\begin{array}{*{20}{c}}
1&{\Delta t}&0&0 \\
0&1&0&0 \\
0&0&1&{\Delta t} \\
0&0&0&1
\end{array}} \right]\]
\[{\mathbf{B}} = \left[ {\begin{array}{*{20}{c}}
1&0&0&0 \\
0&1&0&0 \\
0&0&1&0 \\
0&0&0&1
\end{array}} \right]\]
The control vector is a constant.
\[u = \left[ {\begin{array}{*{20}{c}}
0 \\
0 \\
{ - \tfrac{1}{2}g\Delta {t^2}} \\
{ - g\Delta t}
\end{array}} \right]\]
We need to supply a few other matrices to the filter.
${\mathbf{Q}}$ is the estimated process error covariance. Since the process is just elementary physics, we will set ${\mathbf{Q}}$ to a 4X4 matrix of zeros. ${\mathbf{H}}$ translates the state vector to a measurement vector. We assume our measurements are on the same scale as the projectile data, so ${\mathbf{H}}$ will be an identity matrix.
${\mathbf{R}}$, the measurement covariance. We will assume that the variance is the same for each element of the vector and there is no covariance. That is, the measurement of velocity doesn't affect the position measurement.. This probably a bad assumption, but for the purposes of example, it should be OK.
\[{\mathbf{R}} = \left[ {\begin{array}{*{20}{c}}
{{\sigma ^2}}&0&0&0 \\
0&{{\sigma ^2}}&0&0 \\
0&0&{{\sigma ^2}}&0 \\
0&0&0&{{\sigma ^2}}
\end{array}} \right]\]
To get started, we need an initial state vector and covariance. The initial state is
\[{{\mathbf{x}}_0} = \left[ {\begin{array}{*{20}{c}}
{{x_0}} \\
{\left| v \right|\cos (\theta )} \\
{{y_0}} \\
{\left| v \right|\sin (\theta )}
\end{array}} \right]\]
${\theta}$ is the initial launch angle and ${\left| v \right|}$ is the launch speed.
With all of this in mind, we can set up the Kalman filter.
def init_projectile(initial_position, initial_speed, initial_angle, measurement_cov, dt):
"""
init_projectile - get the cannon ready to fire
arguments:
initial_position - tuple (initial x, initial y) positions
initial_speed - launch speed in m/sec
initial_angle - launch angle in degrees
measurment_cov - measurement covarinace. We are assuming this doesn't change during flight
dt - measurement time step
returns:
kf - a Kalman Filter object
F - state transition matrix
B - control matrix
Note: see https://drive.google.com/file/d/0By_SW19c1BfhSVFzNHc0SjduNzg/view to get an
idea of what all these matrices mean
state vector is np.array([x_pos, vx, y_pos, vy]).T
"""
# set up initial state vector
x = np.array([[initial_position[0], initial_speed * np.cos(np.radians(initial_angle)),
initial_position[1], initial_speed * np.sin(np.radians(initial_angle))]]).T
# state transition matrix
F = np.identity(4)
F[0,1]=dt
F[2,3]=dt
# control matrix
B = np.zeros((4,4))
B[2,2] = 1
B[3,3] = 1
# measurement covarinace
R = measurement_cov # measurement covarinace
H = np.identity(4) # observation matrix Hx is a measurement vector
P = np.identity(4) # initial state covariance
Q = np.zeros((4,4)) # error covariance - we're assuming it's null
kf = KalmanFilter(x, P, F, B, Q, H, R) # create the filter
return kf, F, B
Let's try out the filter and see how well the fitter matches the trajectory.
Here's the Newton's law version.
def update_flight(F, B, u, x):
"""
update_flight - update the state for a flight governed by Netwon's laws and no air
resistance.
Calculates x_new = Fx_old + Bu
arguments:
see filter.py
returns:
the new state vector. In this case a (4,1) numpy array
"""
return np.dot(F, x) + np.dot(B, u)
Here's the body of the projectile simulation.
def main():
num_obs = 200
dt = 0.1
measurement_cov = np.identity(4) * 20
real_initial_pos = (0, 0)
test_initial_pos = (0, 0)
initial_speed = 100
initial_angle = 45
# initialize
kf, F, B = init_projectile(test_initial_pos, initial_speed, initial_angle, measurement_cov, dt)
# control - the only control is gravity
g = -9.81
u = np.array([[0, 0, (1/2) * g * dt**2, g * dt]]).T
x = kf.x # inital state
# some lists for plotting
x_flight = [np.array([[real_initial_pos[0], initial_speed * np.cos(np.radians(initial_angle)),
real_initial_pos[1], initial_speed * np.sin(np.radians(initial_angle))]]).T]
z_list = [np.random.multivariate_normal(x_flight[-1].flatten(), measurement_cov)]
x_list = [kf.x]
while True:
# update flight and measurement. Assume flight is perfectly governed by Newton's laws
x_f = update_flight(F, B, u, x_flight[-1])
if x_f[2,0] <= 0: # don't go underground
break;
z = np.array([np.random.multivariate_normal(x_f.flatten(), measurement_cov)]).T
x_flight.append(x_f)
z_list.append(z)
# the real work - predict new state and then update prediction based on measurement
kf.predict(u)
kf.update(z)
x_list.append(kf.x)
# extract some saved data for plotting
flight_xs = [x_flight[i][0][0] for i in range(len(x_flight))]
flight_ys = [x_flight[i][2][0] for i in range(len(x_flight))]
z_xs = [z_list[i][0] for i in range(len(z_list))]
z_ys = [z_list[i][2] for i in range(len(z_list))]
xs = [x_list[i][0][0] for i in range(len(x_list))]
ys = [x_list[i][2][0] for i in range(len(x_list))]
# plot it
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(flight_xs, flight_ys, label='Flight')
ax.plot(z_xs, z_ys, 'ro', label='Measurements')
ax.plot(xs, ys, label='Fit')
plt.xlabel('x')
plt.ylabel('y')
ax.legend()
plt.show()
The results are plotted below. The fit from the Kalman filter follows the trajectory closely.
What happens if we increase the measurement error? In the the plot above, the measurement error standard deviation was ${{\sigma^2} = 20}$. ]The scatter in the measurements looks relatively small on the scale of the plot. What happens if we increase the measurement error.
With ${{\sigma^2} = 200}$, the fit from the filter is still pretty reasonable given the high measurement error and large step time step size.
If we increase the measurement step size, the fit gets better, but the plot gets so crowded with data points that it's hard to see the fit.
The code is available here.
No comments:
Post a Comment