Viterbi Training Part 4 - Why Doesn't This Work?

We saw previously that Viterbi training can recover the parameters of a simple hidden Markov model (HMM). The toy example we examined had two fairly distinct states: the emissions were drawn from two normal distributions with well separated means. Let's see how Viterbi training does when the states are not so different.

Here are the parameters that were used to generate the new data.

{
    "states": 2,

    "transition": [
    {
        "from": 0,
        "to":   0,
        "prob": 0.9
    },
    {
        "from": 0,
        "to":   1,
        "prob": 0.1
    },
    {
        "from": 1,
        "to":   1,
        "prob": 0.95
    },
    {
        "from": 1,
        "to":   0,
        "prob": 0.05
    }
    ],
    
    "emission": [
    {
        "state": 0,
        "mean": 0,
        "std": 1
    },
    {
        "state": 1,
        "mean": 0,
        "std": 2
    }
    ]
}
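For reference, a data set like the one analyzed here can be simulated from those parameters. This is a sketch (the post's actual generator isn't shown), using NumPy:

```python
import numpy as np

# Sketch of a generator for the two-state HMM defined by the JSON above:
# transition matrix rows are "from" states, emissions are state-dependent
# normals with means 0, 0 and standard deviations 1, 2.
rng = np.random.default_rng(0)

trans = np.array([[0.90, 0.10],    # from state 0
                  [0.05, 0.95]])   # from state 1
means = np.array([0.0, 0.0])
stds  = np.array([1.0, 2.0])

def sample_hmm(n, rng):
    states = np.empty(n, dtype=int)
    states[0] = rng.integers(2)
    for t in range(1, n):
        states[t] = rng.choice(2, p=trans[states[t - 1]])
    emissions = rng.normal(means[states], stds[states])
    return states, emissions

states, obs = sample_hmm(10_000, rng)
```

The overall standard deviation of a long sample comes out near 1.7, consistent with the summary statistics quoted later in the post.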


The emission distributions have the same mean (0) but different standard deviations (1 and 2).

Running the Viterbi training software produces these predictions for the parameters.

The transition parameters are


            State 1     State 2
State 1     0.25        0.75
State 2     0.003003    0.996997

and the emission parameters are


            Mean        Standard Deviation
State 1     7.267712    0.036293
State 2     0.05060602  1.77123468

That's not so great. Examining the predicted states shows that state 2 was predicted at every position, which means the bizarre state 1 emission and transition estimates are irrelevant: the decoder never visits state 1, so there is nothing to estimate those parameters from.
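One way to see why the decoding collapses to a single state is to compare the two emission densities directly. This is an illustrative sketch (not the post's code): the wider Gaussian N(0, 2) has the higher density whenever |x| exceeds about 1.36, and even near zero the narrow state's per-symbol advantage is small compared with the transition cost of switching states.

```python
import math

# Illustrative sketch: per-symbol log-densities of the two emission models.
def normal_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

# Setting the two log-densities equal gives the crossover point:
# exp(-3x^2/8) = 1/2  =>  |x| = sqrt((8/3) * ln 2) ≈ 1.36.
crossover = math.sqrt(8.0 / 3.0 * math.log(2.0))

# The narrow state's best-case per-symbol advantage (at x = 0) is ln 2 ≈ 0.69,
# while leaving the wide state costs log(0.95) - log(0.05) ≈ 2.94 in the
# Viterbi score, so brief excursions into the narrow state rarely pay off.
max_gain = normal_logpdf(0, 0, 1) - normal_logpdf(0, 0, 2)
switch_penalty = math.log(0.95) - math.log(0.05)
```

With such a lopsided trade-off, the all-state-2 path can dominate every alternative, which is exactly what the decoded state sequence shows.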

Incidentally, the Baum-Welch algorithm does a somewhat better job on this data.

The transition parameters are

            State 1     State 2
State 1     0.8039116   0.1960884
State 2     0.3606936   0.6393064

and the emission parameters are

            Mean         Standard Deviation
State 1     -0.08971582  1.344166
State 2     0.35073713   2.397022

What's going on? Let's take a closer look at the data; maybe that will reveal something. Here's a line plot of the data generated with the original parameters.

It looks like there are areas of higher variance, but the regions don't look distinct to my eyes.

Let's look at the data distribution to see if it reveals any distinction between states.

The histogram is the distribution of data values, displayed as a density. The solid line is a plot of a normal distribution with a mean equal to the overall mean of the data (0.06504024) and a standard deviation equal to the overall standard deviation of the data (1.799501). The fit isn't perfect. The data distribution is a bit "fat-tailed". A qq-plot shows that the data is "normal-like", but the deviations at the ends of the plot show the effect of the fat tails.
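The fat tails can be checked by a quick calculation. This is a sketch, assuming the stationary weights implied by the transition matrix (1/3 for state 1, 2/3 for state 2): an equal-mean mixture of normals with different scales always has positive excess kurtosis.

```python
# Sketch: the stationary mixture (1/3)*N(0,1) + (2/3)*N(0,2) implied by the
# transition matrix is fat-tailed. For a zero-mean scale mixture of normals,
# E[x^2] = sum_i w_i * sigma_i^2 and E[x^4] = sum_i w_i * 3 * sigma_i^4.
w   = [1 / 3, 2 / 3]     # stationary weights of the two states
sig = [1.0, 2.0]         # emission standard deviations

var = sum(wi * s ** 2 for wi, s in zip(w, sig))       # = 3.0
m4  = sum(wi * 3 * s ** 4 for wi, s in zip(w, sig))   # = 33.0
excess_kurtosis = m4 / var ** 2 - 3                   # = 2/3 > 0: fat-tailed
```

A positive excess kurtosis (2/3 here, versus 0 for a normal distribution) is exactly the "fat-tailed" deviation the qq-plot shows.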

Looked at this way, maybe this data is better treated as a single, approximately normal distribution rather than as output from two different state processes. From that perspective, the mean and standard deviation predicted by Viterbi training for state 2 are not so unreasonable.
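The single-distribution view can be checked against the numbers above. This is a sketch: compute the chain's stationary distribution from the transition matrix, then the standard deviation of the corresponding mixture.

```python
import numpy as np

# Sketch: stationary distribution of the generating chain, and the standard
# deviation of the single-distribution (mixture) view of the data.
P = np.array([[0.90, 0.10],
              [0.05, 0.95]])

# Stationary distribution: left eigenvector of P with eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()                                   # -> [1/3, 2/3]

# Both emission means are 0, so the mixture variance is the weighted average
# of the state variances: (1/3)*1 + (2/3)*4 = 3.
mix_std = np.sqrt(pi[0] * 1.0 ** 2 + pi[1] * 2.0 ** 2)   # ≈ 1.732
```

The mixture standard deviation of about 1.73 sits close to both the observed overall standard deviation (1.7995) and Viterbi training's state 2 estimate (1.7712).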

If there is a moral to this story, it's that doing some exploratory data analysis before, rather than after, applying an algorithm is a good idea. Viterbi training is often a "good-enough" algorithm when the HMM states are sufficiently distinct.
