It seems the code in the mixture example (http://edwardlib.org/tutorials/unsupervised), which estimates the cluster indexes, is incomplete. The code in question is this:
mu_sample = qmu.sample(100)
sigmasq_sample = qsigmasq.sample(100)
x_post = Normal(loc=tf.ones([N, 1, 1, 1]) * mu_sample,
                scale=tf.ones([N, 1, 1, 1]) * tf.sqrt(sigmasq_sample))
x_broadcasted = tf.tile(tf.reshape(x_train, [N, 1, 1, D]), [1, 100, K, 1])
log_liks = x_post.log_prob(x_broadcasted)  # [N, 100, K, D]
log_liks = tf.reduce_sum(log_liks, 3)      # sum over data dimensions  -> [N, 100, K]
log_liks = tf.reduce_mean(log_liks, 1)     # average over posterior samples -> [N, K]
clusters = tf.argmax(log_liks, 1).eval()   # most likely cluster per data point
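For reference, here is an equivalent self-contained NumPy sketch of what the snippet computes (the sizes N, D, K and the random "posterior samples" are toy assumptions, not the tutorial's fitted values):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K, S = 5, 2, 3, 100  # data points, dimensions, clusters, posterior samples

x_train = rng.normal(size=(N, D))
mu_sample = rng.normal(size=(S, K, D))                    # stand-in for qmu.sample(100)
sigma_sample = np.abs(rng.normal(size=(S, K, D))) + 0.5   # stand-in for sqrt(qsigmasq.sample(100))

# broadcast data against every (sample, cluster) pair: [N, S, K, D]
x_b = np.tile(x_train.reshape(N, 1, 1, D), (1, S, K, 1))

# Gaussian log density, elementwise
log_liks = (-0.5 * np.log(2 * np.pi * sigma_sample**2)
            - (x_b - mu_sample)**2 / (2 * sigma_sample**2))
log_liks = log_liks.sum(axis=3)    # sum over data dimensions  -> [N, S, K]
log_liks = log_liks.mean(axis=1)   # average over posterior samples -> [N, K]
clusters = log_liks.argmax(axis=1) # most likely cluster per data point -> [N]
```

This makes the shape bookkeeping explicit: the argmax is taken over a Monte Carlo average of per-cluster log-likelihoods, with no mixture-weight term anywhere.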
The code is essentially implementing the formulation (eq 1):

    z_n = argmax_k (1/S) * sum_{s=1..S} log p(x_n | mu_k^(s), sigmasq_k^(s))

According to this, the posterior prediction of the cluster index is determined by the Monte Carlo approximation of the posterior mean class-conditional log-likelihoods.
However, this ignores the posterior information flow from the data X to the mixture weights pi. The correct formulation additionally weights each class-conditional likelihood by the Monte Carlo approximation of the posterior mean mixture weight for cluster k, as shown in (eq 11) above:

    z_n = argmax_k (1/S) * sum_{s=1..S} pi_k^(s) * p(x_n | mu_k^(s), sigmasq_k^(s))
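To illustrate that the weighting can actually change the answer, here is a minimal NumPy example (hypothetical toy numbers, point estimates instead of posterior samples): with unequal mixture weights, the unweighted rule and the weighted rule assign the same point to different clusters.

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    # log density of N(mu, sigma^2) at x
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# toy posterior means for a 2-cluster model (illustration only)
mu = np.array([0.0, 1.0])     # cluster means
sigma = np.array([1.0, 1.0])  # cluster std devs
pi = np.array([0.9, 0.1])     # posterior mean mixture weights

x = 0.6  # slightly closer to cluster 1's mean
log_lik = normal_logpdf(x, mu, sigma)

unweighted = int(np.argmax(log_lik))             # tutorial's rule: picks cluster 1
weighted = int(np.argmax(log_lik + np.log(pi)))  # weighted rule: picks cluster 0
```

Because cluster 0 carries 90% of the posterior mass, the log-weight penalty log(0.1) on cluster 1 outweighs its slightly better fit, so the two rules disagree.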
My question is: why does the tutorial contain code that ignores this potentially important factor in the posterior estimation of the cluster indexes? If it was omitted for simplicity, that should be stated as an approximation.