I am trying to implement this model in Edward to add it to the library (Deep Unsupervised Clustering With Gaussian Mixture VAE). I found RuiShu’s blog, which raised a nice point, so I copied the “True GMM” model directly into Edward. The original code is here: https://github.com/RuiShu/vae-clustering/blob/master/gmvae.py

I have the vanilla autoencoder working, and this code seems to work in the sense that when I visualize the latent space and logits, it does something reasonable on simulated datasets. However, the learning is unstable and I sometimes have the following error:

InvalidArgumentError: Received a label value of 2 which is outside the valid range of [0, 2).

How come the output of a Dirichlet [0, 2) can be 2? Is there some masked NaN? Moreover, training loss does not stabilize. I’d like to add a nice notebook to the library on this when over.

I just re-read the papers, and it seems that I would have to write a new inference class if I want a loss that looks like equation (7) of Kingma 2014 Semi-supervised learning with deep… I’ll post when I have something.

However, the learning is unstable and I sometimes have the following error:

I’m not surprised this happens when the dimension of the Categorical is large: the variance of the stochastic gradient updates increases with the dimension. As you note, one way to handle this (and which most VAE papers do) is to marginalize out the Categorical prior.

How come the output of a Dirichlet [0, 2) can be 2? Is there some masked NaN? Moreover, training loss does not stabilize. I’d like to add a nice notebook to the library on this when over.

If the parameters of the Categorical destabilize, the output can be strange:

import edward as ed
from edward.models import Categorical

# All-zero probabilities do not sum to 1, so this is not technically valid
x = Categorical(probs=[0.0, 0.0])
sess = ed.get_session()
sess.run(x)  # draw one sample
## 2
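One way to catch this before the label reaches the loss is to fetch the probabilities and check them by hand. The helper below is only a sketch in plain Python: `check_probs` and its `tol` parameter are hypothetical names, not part of Edward or TensorFlow, and it assumes the probabilities have already been pulled out as a list of floats.

```python
import math


def check_probs(probs, tol=1e-6):
    """Return a list of problems found in a probability vector.

    Hypothetical helper (not an Edward API). An empty list means the
    vector looks like a valid Categorical parameter.
    """
    problems = []
    if any(math.isnan(p) for p in probs):
        problems.append("contains NaN")
    # Ignore NaN entries for the remaining checks so they still run.
    finite = [p for p in probs if not math.isnan(p)]
    if any(p < 0.0 for p in finite):
        problems.append("contains negative entries")
    total = sum(finite)
    if abs(total - 1.0) > tol:
        problems.append("does not sum to 1 (sum=%g)" % total)
    return problems


valid = check_probs([0.5, 0.5])
degenerate = check_probs([0.0, 0.0])
with_nan = check_probs([float("nan"), 0.5])
```

Running a check like this on the variational parameters every few iterations should show whether the out-of-range label coincides with NaN or all-zero probabilities.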

As you note, one way to handle this (and which most VAE papers do) is to marginalize out the Categorical prior.

Yeah, that is true, but I am not sure where to start modifying Edward. The result of the marginalization seems very simple (integrating the usual ELBO of the fully observed model against the variational posterior of the categorical, plus the entropy term), but I have trouble seeing, architecturally speaking, where I should start.
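To make the marginalized objective concrete, here is a rough framework-agnostic sketch of the quantity described above: the per-cluster ELBO weighted by q(y | x), plus the entropy of q(y | x). The names `marginalized_elbo`, `q_y`, `elbo_given_y` and `eps` are hypothetical, and the sketch assumes each entry of `elbo_given_y` is the ELBO of the fully observed model for y = k, with the log prior of y = k already included.

```python
import math


def marginalized_elbo(q_y, elbo_given_y, eps=1e-12):
    """Sum over k of q(y=k|x) * ELBO(x, y=k), plus the entropy of q(y|x).

    q_y:          list of variational probabilities q(y=k|x), summing to 1
    elbo_given_y: list of ELBO values with y fixed to each k (assumed to
                  include log p(y=k))
    eps:          small constant to avoid log(0) for empty clusters
    """
    # Expectation of the fully observed ELBO under q(y|x): an exact sum,
    # so no sampling of the discrete variable is needed.
    expected = sum(q * l for q, l in zip(q_y, elbo_given_y))
    # Entropy term H(q(y|x)) = -sum_k q_k * log q_k
    entropy = -sum(q * math.log(q + eps) for q in q_y)
    return expected + entropy


# Two clusters, equally likely under q(y|x)
value = marginalized_elbo([0.5, 0.5], [-1.0, -3.0])
```

Because the sum over k is computed exactly, the discrete variable is never sampled, which is what removes the high-variance gradient and the possibility of an out-of-range label. The cost is one decoder pass per cluster.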

I am working in pure TF for now to get something that works well. Let me know if you have a starting point.
Romain