Hi everyone. I’ve been trying to build an implementation of a standard Hidden Markov Model in Edward, essentially trying to replicate this example in Stan: Stan HMM Semi-Sup, and inspired by this post here: Gelman’s Blog
The idea is that you simply marginalize out all the latent states using the forward algorithm, which is a dynamic programming algo that yields the likelihood of the data, given the model parameters. However, I’m quite new to Edward and I’m struggling to understand how exactly to structure the inference part of the model.
Consider a very simple HMM with 2 Gaussian emission states with unknown means and variances, and a known & fixed transmission matrix. You have observations of data, and you want to estimate the distributions of the unknown parameters, given your priors, and the data.
I’ve written a custom Edward distribution that that takes the parameters of the emission distributions, computes the emission lattice (the matrix that contains the log-likelihood of each observation, given your emission distributions), and then performs the forward pass to yield an overall HMM log-likelihood. But I’m stuck on how to wrap it all together now!
I’ve verified that individual parts of the code do the right thing by their inclusion in other tools, but there’s still probably bugs here, and I’m a bit hazy on the Edward syntax so if you see something strange, it’s probably because I’ve just copied it from an example…
Code:
from edward.models import RandomVariable
from tensorflow.contrib.distributions import Distribution
from tensorflow.contrib.distributions import FULLY_REPARAMETERIZED
class HMM(RandomVariable, Distribution):
def __init__(self, transmat, init_p, states, name='', *args, **kwargs):
self.transmat = transmat
self.init_p = init_p
self.states = states
parameters = locals()
parameters.pop("self")
super(HMM, self).__init__(dtype=tf.float32,
reparameterization_type=FULLY_REPARAMETERIZED,
validate_args=False,
allow_nan_stats=False,
parameters=parameters,
name=name,*args, **kwargs)
def _log_prob(self, value):
#first we need to compute the (len(obs), len(states)) matrix that holds the
#likelihood of each observation coming from each state
B_list = []
for j in range(0, self.transmat.shape[1]):
state_probs = self.states[j].log_prob(value)
B_list.append(state_probs)
B = tf.squeeze(tf.stack(B_list,axis=1))
#now we compute the forward pass to yield the log-likelihood of the data given the model. This is done in
#log-space to avoid overflow/underflow
alpha_0 = self.init_p + B[0]
logprob = tf.scan(lambda alpha, b: tf.reduce_logsumexp(tf.transpose(alpha) + self.transmat, axis=0, keepdims=True) + b,
initializer=alpha_0,
elems=B[1:, :],
back_prop=True,
infer_shape=False)
return tf.reduce_logsumexp(logprob[-1])
def _sample_n(self, n, seed=None):
raise NotImplementedError("_sample_n is not implemented")
Now the part I’m very confused about:
n_emissions = 2
tf.reset_default_graph()
A = tf.Variable(np.log(A_), name="transmat",dtype=tf.float32, trainable=False)
B = tf.Variable(tf.zeros((N, A_.shape[1]), dtype=tf.float32), name="fwd_obs_mat",dtype=tf.float32, trainable=False)
X = tf.placeholder(shape=O.shape, name="data", dtype=tf.float32)
p_init = tf.Variable(np.log(initial_probabilities), name='p_init', dtype=tf.float32, trainable=False)
states = []
s1 = Normal(loc=tf.zeros(1), scale=tf.ones(1))
s2 = Normal(loc=tf.zeros(1), scale=tf.ones(1))
states.append(s1)
states.append(s2)
qw = Normal(loc=tf.get_variable("qw/loc", [1]),
scale=tf.nn.softplus(tf.get_variable("qw/scale", [1])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))
loglikelihood = HMM(A_, initial_probabilities,states, value=tf.zeros_like([1], dtype=tf.float32))
with tf.Session() as sess:
init = tf.global_variables_initializer()
inference = #what goes here?
inference.run(n_samples=5, n_iter=250)
How do I complete this example? In Pymc3 I can wrap the function that returns the HMM log-likelhood in a pm.DensityDist or pm.Potential distribution, and then basically treat it like as normal, but I’m very confused as to how to proceed here. For what it’s worth, I’ve managed to create a fully functional version of this code in pymc3/theano, so I’m more or less sure it’s on the right track! Any feedback, including ‘nothing you’re doing makes any sense’, is welcome here.
Cheers!