I’m running HMC to learn a normal distribution, and conditioning on a categorical distribution. A very simplified example of my model is as follows:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Empirical, Normal, Categorical
from edward.inferences import HMC
mu = Normal(mu=0.0, sigma=1.0)
x = Normal(mu=tf.ones(50) * mu, sigma=1.0)
cat = Categorical(p=x)
observed = 3
qmu = Empirical(params=tf.Variable(tf.zeros([1000])))
inference = ed.HMC({mu: qmu},data={cat: observed})
inference.run()
sess = ed.get_session()
print(sess.run(qmu.params))

The acceptance rate is 0, so after inference I end up with an empirical distribution of all 0s for qmu. I assume this is a conceptual problem, not a coding one, but I’m not quite sure what I’m doing wrong.

As a follow-up, I’m having trouble finding ways to do exact inference on a distribution with discrete support in Edward. For example, suppose I have a joint distribution over a Normal and a Categorical. I assume the Edwardian method is to compose two inference algorithms, e.g. HMC for the Normal and something that supports discrete distributions for the Categorical. Sorry if this doesn’t make sense. I’m happy to go into more detail, but I’m not sure if this is the right place to sort out my conceptual confusions.

The model is not well-defined. You’re defining x as a Normal prior and then using its sample as cat's probabilities, but a Normal draw can be negative and its entries need not sum to one, so it is not a valid probability vector. If you want to parameterize a Categorical with real values, use its logit parameterization. Replacing that line with

cat = Categorical(logits=x)

works. (Also, note you are defining a prior for x but not performing any inference on it. This is fine only in toy problems like this.)
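
To illustrate why the logits parameterization helps, here is a minimal sketch in plain NumPy (not Edward code; the `softmax` helper and the seed are just for illustration). A draw from a Normal generally fails the requirements for a probability vector, while a softmax over the same values always satisfies them.

```python
import numpy as np

def softmax(logits):
    """Map arbitrary real values to a probability vector."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.RandomState(0)
x = rng.normal(loc=0.0, scale=1.0, size=50)  # stands in for one draw of x

# A valid probability vector is non-negative and sums to one.
is_valid_probs = bool(np.all(x >= 0) and np.isclose(x.sum(), 1.0))
probs = softmax(x)

print("raw draw is a valid probability vector:", is_valid_probs)
print("softmax(x) is non-negative:", bool(np.all(probs >= 0)))
print("softmax(x) sums to:", probs.sum())
```

Passing real values as logits amounts to applying this kind of normalization before they are used as probabilities.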

suppose I have a joint distribution over a Normal and a Categorical. I assume the Edwardian method is to compose two inference algorithms, e.g. HMC for the Normal, and something that supports discrete distributions for the Categorical.

It depends on what variables are latent and what variables are observed. In the provided example, the data is discrete (the categorical variable is observed). Thus you don’t need to do any “inference” over it; you only need to do inference over the unobserved variables (the normal).

Thanks - yeah, I see that the toy example I gave was ill-defined, sorry about that. As for the other question, I’m having a little trouble working out from the docs how to do exact inference for a discrete distribution. Say I have two latent variables: one is a Normal distribution, the other a Categorical. I want to use HMC for the former and something exact for the latter. My understanding is that I can just do two separate inferences, one for each, but if so, what class of inference should I use for the categorical?

Thanks, and good job with Edward - it’s really awesome!

if so, what class of inference should I use for the categorical?

There’s no universal solution: any inference that works over discrete variables is worth experimenting with (e.g., ideally Gibbs, or KLqp, or MetropolisHastings). See the API reference (http://edwardlib.org/api/reference). You mention you want something “exact”. Can you elaborate?

What I was thinking of was something like webppl’s “enumerate” inference, which just gives a deterministic solution to the inference problem by enumerating through the support of the discrete distribution. I’m currently doing this “by hand” in Edward, but wanted to check if there was a more Edwardian way.
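
For concreteness, the kind of by-hand enumeration described above can be sketched in plain NumPy. This is not an Edward or webppl API: the function name `enumerate_posterior`, the prior, and the component means are all made up for illustration. It assumes a Categorical latent z with a Normal likelihood for x, and computes p(z | x) ∝ p(z) p(x | z) exactly by evaluating every value in the support of z.

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

def enumerate_posterior(x_obs, prior_probs, mus, sigma=1.0):
    """Exact posterior over a discrete z, obtained by enumerating its support."""
    # Log joint log p(z = k) + log p(x_obs | z = k), one entry per value of z.
    log_joint = np.log(prior_probs) + normal_logpdf(x_obs, mus, sigma)
    # Normalize in a numerically stable way.
    log_joint = log_joint - np.max(log_joint)
    unnormalized = np.exp(log_joint)
    return unnormalized / unnormalized.sum()

prior_probs = np.array([0.5, 0.3, 0.2])  # hypothetical prior over z
mus = np.array([-2.0, 0.0, 2.0])         # hypothetical mean of x for each z
posterior = enumerate_posterior(1.5, prior_probs, mus)
print(posterior)
```

The cost grows with the size of the support, so this is only practical when z takes a small number of values.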

You can use the Mixture random variable, which marginalizes out the categorical variable. It’s also more efficient than naive approaches in that its log-density uses the logsumexp trick.
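
The logsumexp point can be illustrated outside of Edward with a small NumPy sketch. The helper names and numbers here are hypothetical, and this shows the general technique rather than how Mixture is implemented internally. The marginal log-density is log Σ_k π_k N(x | μ_k, σ); computing it in log space avoids the underflow that the naive formula hits when x is far from every component.

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

def logsumexp(a):
    """Stable log(sum(exp(a))): factor out the largest term before exponentiating."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def mixture_logpdf(x, probs, mus, sigma=1.0):
    """Log-density of a Normal mixture with the categorical summed out."""
    return logsumexp(np.log(probs) + normal_logpdf(x, mus, sigma))

probs = np.array([0.5, 0.5])  # hypothetical mixing probabilities
mus = np.array([0.0, 1.0])    # hypothetical component means

# Far from both components, the densities underflow to zero in float64.
stable = mixture_logpdf(100.0, probs, mus)
with np.errstate(divide="ignore"):
    naive = np.log(np.sum(probs * np.exp(normal_logpdf(100.0, mus, 1.0))))

print("logsumexp version:", stable)
print("naive version:", naive)
```

Near the components the two versions agree; they differ only where the naive one loses precision.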

I suppose we could also have something like a ed.enumerate(x) function if helpful. It can return a Mixture random variable integrating over all the {categorical,bernoulli,multinomial,binomial} random variables that x depends on.

Ah, that makes sense, thanks! I guess “enumerate” might be useful for examples for people new to probabilistic programming, in order to show simple cases of inference, but this solution seems good.