Here are the parameters that were used to generate the new data.
{
  "states": 2,
  "transition": [
    { "from": 0, "to": 0, "prob": 0.9 },
    { "from": 0, "to": 1, "prob": 0.1 },
    { "from": 1, "to": 1, "prob": 0.95 },
    { "from": 1, "to": 0, "prob": 0.05 }
  ],
  "emission": [
    { "state": 0, "mean": 0, "std": 1 },
    { "state": 1, "mean": 0, "std": 2 }
  ]
}
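To make the setup concrete, here's a minimal sketch of a generator for data like this (an assumption on my part; it is not the actual software used for the post). The initial state and the seed are arbitrary choices, since the parameters above don't specify them:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Parameters from the JSON above.
P = np.array([[0.9, 0.1],     # transitions out of state 0
              [0.05, 0.95]])  # transitions out of state 1
means = [0.0, 0.0]
stds = [1.0, 2.0]

def simulate(n, rng):
    """Simulate n observations from the two-state Gaussian HMM."""
    states = np.empty(n, dtype=int)
    obs = np.empty(n)
    s = 0  # assumed initial state; the post doesn't specify one
    for t in range(n):
        states[t] = s
        obs[t] = rng.normal(means[s], stds[s])
        s = rng.choice(2, p=P[s])  # step the hidden chain
    return states, obs

states, obs = simulate(1000, rng)
```

Because the two emission distributions share a mean, the hidden state only shows up in the local variance of the simulated series.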
The two states' emission distributions have the same mean but different standard deviations (1 versus 2).
Running the Viterbi training software produces these predictions for the parameters.
The transition parameters are
        | State 1  | State 2
State 1 | 0.25     | 0.75
State 2 | 0.003003 | 0.996997
and the emission parameters are
        | Mean       | Standard Deviation
State 1 | 7.267712   | 0.036293
State 2 | 0.05060602 | 1.77123468
That's not so great. Examining the predicted states shows that state 2 was predicted at every position. The bizarre estimates for the state 1 emission and transition parameters are therefore irrelevant: the program never assigns any position to state 1, so those parameters are being estimated from no data at all.
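To see how that collapse can happen, here is a minimal sketch of one Viterbi-training iteration (a hypothetical implementation, not the software used for this post): a log-space Viterbi decode followed by hard-assignment re-estimation. If the decode never visits a state, that state's parameters have nothing to be re-estimated from:

```python
import numpy as np

def viterbi_path(obs, P, means, stds, init):
    """Most likely state sequence (log-space Viterbi) with Gaussian emissions."""
    obs, means, stds = map(np.asarray, (obs, means, stds))
    n, k = len(obs), len(init)
    # Per-position log emission likelihoods, shape (n, k).
    logB = (-0.5 * ((obs[:, None] - means) / stds) ** 2
            - np.log(stds) - 0.5 * np.log(2 * np.pi))
    logP = np.log(P)
    delta = np.log(init) + logB[0]
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        scores = delta[:, None] + logP     # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = np.empty(n, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n - 2, -1, -1):         # backtrace
        path[t] = back[t + 1, path[t + 1]]
    return path

def reestimate(obs, path, k):
    """Hard-assignment re-estimation step of Viterbi training."""
    obs = np.asarray(obs)
    means = np.array([obs[path == i].mean() if np.any(path == i) else np.nan
                      for i in range(k)])
    stds = np.array([obs[path == i].std() if np.any(path == i) else np.nan
                     for i in range(k)])
    # If a state is never visited (as happened here), its estimates are
    # undefined and the transition counts out of it are zero.
    return means, stds
```

The hard argmax in the decode is the weak point: once one state's Gaussian covers the data slightly better overall, every position can be assigned to it in a single step, and the other state never recovers.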
Incidentally, the Baum-Welch algorithm does a somewhat better job on this data.
The transition parameters are
        | State 1   | State 2
State 1 | 0.8039116 | 0.1960884
State 2 | 0.3606936 | 0.6393064
and the emission parameters are
        | Mean        | Standard Deviation
State 1 | -0.08971582 | 1.344166
State 2 | 0.35073713  | 2.397022
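Baum-Welch does better because it replaces the hard state assignment with expected (soft) state occupancies from the forward-backward recursions. Here is a self-contained sketch for a one-dimensional Gaussian HMM (again an assumed implementation, not the post's software), using the standard scaled recursions:

```python
import numpy as np

def baum_welch(obs, P, means, stds, init, n_iter=20):
    """EM for a 1-D Gaussian HMM using scaled forward-backward recursions."""
    obs = np.asarray(obs, float)
    P, init = np.asarray(P, float).copy(), np.asarray(init, float).copy()
    means, stds = np.asarray(means, float).copy(), np.asarray(stds, float).copy()
    n, k = len(obs), len(init)
    for _ in range(n_iter):
        # Emission likelihoods, shape (n, k).
        B = np.exp(-0.5 * ((obs[:, None] - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
        # Scaled forward pass; c[t] are the per-step normalizers.
        alpha, c = np.empty((n, k)), np.empty(n)
        alpha[0] = init * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, n):
            alpha[t] = (alpha[t - 1] @ P) * B[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        # Scaled backward pass.
        beta = np.empty((n, k)); beta[-1] = 1.0
        for t in range(n - 2, -1, -1):
            beta[t] = (P @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        # Expected state occupancies and transitions (soft counts).
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = alpha[:-1, :, None] * P[None] * (B[1:] * beta[1:])[:, None, :] / c[1:, None, None]
        # M-step: re-estimate from soft counts.
        P = xi.sum(axis=0); P /= P.sum(axis=1, keepdims=True)
        init = gamma[0]
        w = gamma.sum(axis=0)
        means = (gamma * obs[:, None]).sum(axis=0) / w
        stds = np.sqrt((gamma * (obs[:, None] - means) ** 2).sum(axis=0) / w)
    return P, means, stds
```

Because every position contributes a fractional count to every state, a state can never be starved of data the way it can under Viterbi training's hard assignments.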
What's going on? Let's take a closer look at the data; maybe that will reveal something. Here's a line plot of the data generated with the original parameters.
It looks like there are areas of higher variance, but the regions don't look distinct to my eyes.
Let's look at the data distribution to see if it reveals any distinction between states.
The histogram is the distribution of data values, displayed as a density. The solid line is a plot of a normal distribution with a mean equal to the overall mean of the data (0.06504024) and a standard deviation equal to the overall standard deviation of the data (1.799501). The fit isn't perfect: the data distribution is a bit "fat-tailed". A Q-Q plot shows that the data is "normal-like", but the deviations at the ends of the plot show the effect of the fat tails.
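In fact the overall standard deviation is predictable from the generating parameters. The stationary distribution of the transition matrix is (1/3, 2/3), so the marginal distribution of the data is the mixture (1/3)·N(0, 1) + (2/3)·N(0, 4), with variance 1/3·1 + 2/3·4 = 3 and standard deviation √3 ≈ 1.732, close to the observed 1.799501. The fat tails follow too: a scale mixture of normals has heavier tails than a single normal with the same variance. A quick check:

```python
import numpy as np

P = np.array([[0.9, 0.1], [0.05, 0.95]])  # generating transition matrix
means = np.array([0.0, 0.0])
stds = np.array([1.0, 2.0])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi /= pi.sum()

# Mean and standard deviation of the stationary mixture of emissions.
mix_mean = np.sum(pi * means)
mix_std = np.sqrt(np.sum(pi * (stds**2 + means**2)) - mix_mean**2)
print(pi, mix_mean, mix_std)  # ≈ [0.333 0.667], 0.0, 1.732
```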
Looked at this way, maybe this data is better treated as a single, approximately normal distribution rather than as data from two different state processes. Considered this way, the mean and standard deviation that Viterbi training predicted for state 2 are not so unreasonable.
If there is a moral to this story, it's that doing some exploratory data analysis before, rather than after, applying an algorithm is a good idea. Viterbi training is often a "good-enough" algorithm when the HMM states are sufficiently distinct.