I am having trouble doing inference in models that combine discrete random variables. All the examples in the documentation seem to consist mostly or entirely of continuous variables. The simplest model I could think of has a latent variable of interest, x, and an observation z that equals x with a probability that is high but less than 1. That is:
import edward as ed
import tensorflow as tf
from edward.models import Bernoulli

# Model
x = Bernoulli(0.5) # Latent variable of interest
y = Bernoulli(0.9) # Noise: z matches x whenever y == 1
z = tf.equal(x, y) # Usually equals x, but not always.
z_data = True
# Inference
qx_p = tf.sigmoid(tf.Variable(tf.random_normal([])))
qx = Bernoulli(qx_p)
inference = ed.KLqp({x: qx}, data={z: z_data})
inference.run()
print("Posterior p(x=1|z)={}={}".format(qx_p.eval(), qx.mean().eval()))
# Prints: Posterior p(x=1|z)=0.500000=0.622459
# Correct: Posterior p(x=1|z)=0.900000=0.900000
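The "correct" value above can be checked without any inference library by enumerating the four (x, y) outcomes and applying Bayes' rule. This is a minimal sketch in plain Python; the function name `exact_posterior` and its parameters are made up for illustration and are not part of Edward.

```python
from itertools import product

def exact_posterior(p_x=0.5, p_y=0.9, z_obs=True):
    """Return p(x=1 | z=z_obs) for z = (x == y), by brute-force enumeration."""
    joint_x1 = 0.0  # probability mass of outcomes with x=1 that match z_obs
    evidence = 0.0  # probability mass of all outcomes that match z_obs
    for x, y in product([0, 1], repeat=2):
        prob = (p_x if x else 1 - p_x) * (p_y if y else 1 - p_y)
        if (x == y) == z_obs:
            evidence += prob
            if x == 1:
                joint_x1 += prob
    return joint_x1 / evidence

print(exact_posterior())  # approximately 0.9, up to floating-point rounding
```

The same function with `z_obs=False` gives the posterior for the opposite observation, which is a handy sanity check on whatever an approximate method returns.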
I also tried KLpq for inference but the loss never converges; an example result after 100,000 iterations is:
# KLpq: Posterior p(x=1|z)=0.982831=0.727670
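One pattern in the printed numbers may be worth noting: in both runs, the second value matches the logistic sigmoid of the first. The check below is plain arithmetic on the figures quoted above. Reading it as a sign that the positional argument to `Bernoulli` is being treated as a logit rather than a probability is an assumption, not something confirmed against the installed Edward version.

```python
import math

def sigmoid(t):
    """Logistic sigmoid, 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

# (qx_p.eval(), qx.mean().eval()) pairs copied from the printed output above
printed = [(0.500000, 0.622459), (0.982831, 0.727670)]

for first, second in printed:
    # Compare the reported mean with the sigmoid of the reported parameter
    print(first, second, sigmoid(first))
```

If that reading is right, passing the probability to `Bernoulli` by keyword rather than positionally would be the first thing to try.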
I tried Gibbs sampling but this gave me the error:
KeyError: "The name 'conjugate_log_joint/Bernoulli_9/_conjugate_log_prob:0' refers to a Tensor which does not exist. The operation, 'conjugate_log_joint/Bernoulli_9/_conjugate_log_prob', does not exist in the graph."
Finally, I tried this model, which is mathematically equivalent (except that z now takes the values +1/-1 rather than True/False), but ran into similar problems:
# Mathematically equivalent model
x = Bernoulli(0.5)
y = Bernoulli(0.9)
x_ = 2 * tf.cast(x, dtype=tf.float32) - 1
y_ = 2 * tf.cast(y, dtype=tf.float32) - 1
z = x_ * y_
z_data = 1
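The claimed equivalence of the two formulations can be confirmed by checking all four (x, y) combinations. A small sketch in plain Python, independent of TensorFlow; the function names are hypothetical:

```python
from itertools import product

def z_equal(x, y):
    """First formulation: z is True exactly when x and y agree."""
    return x == y

def z_product(x, y):
    """Second formulation: map 0/1 to -1/+1 and multiply."""
    return (2 * x - 1) * (2 * y - 1)

for x, y in product([0, 1], repeat=2):
    # z_product is +1 exactly when z_equal is True, and -1 otherwise
    assert z_equal(x, y) == (z_product(x, y) == 1)
    print(x, y, z_equal(x, y), z_product(x, y))
```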
Does anyone have any thoughts? Am I just making a stupid mistake in the way I’m calling the API, or am I trying to do something totally hopeless? I notice that Stan doesn’t support discrete latent variables at all (it advises that you “integrate the discrete variables out” by hand!). Is that because this is never going to work?
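For a model this small, integrating the discrete variables out by hand, in the sense of the Stan advice quoted above, amounts to folding y into the likelihood p(z | x) and then summing over the two values of x. The sketch below does that in log space. The helper names are hypothetical and nothing here uses Stan or Edward.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_lik_z_true(x, p_y=0.9):
    """log p(z=True | x) with y summed out: z is True iff y == x."""
    return math.log(p_y if x == 1 else 1 - p_y)

def marginalize(p_x=0.5, p_y=0.9):
    # log p(x, z=True) for x = 0 and x = 1
    log_joint = [
        math.log(1 - p_x) + log_lik_z_true(0, p_y),
        math.log(p_x) + log_lik_z_true(1, p_y),
    ]
    log_evidence = log_sum_exp(log_joint)            # log p(z=True)
    post_x1 = math.exp(log_joint[1] - log_evidence)  # p(x=1 | z=True)
    return log_evidence, post_x1

log_evidence, post_x1 = marginalize()
print(math.exp(log_evidence), post_x1)
```

With more discrete variables the sum grows exponentially in the number of joint states, which is presumably why this is only practical by hand for small or well-structured models.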
As a last resort, I’m considering representing discrete variables using continuous variables instead, e.g. with values > 0 representing true. That seems a bit crazy to me, though. Any thoughts, or even examples of this being done before?
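To make the last-resort idea concrete: a Bernoulli(p) variable can be written as the sign of a continuous variable, namely logit(p) plus standard logistic noise, so that "value > 0" means true with probability exactly p. Squashing that variable through a sigmoid with a temperature gives a smooth surrogate, which I believe resembles what is sometimes called a relaxed Bernoulli or binary Concrete distribution. The sketch below is plain Python with hypothetical function names, and says nothing about whether Edward's inference algorithms would handle such a model well.

```python
import math
import random

def latent_logistic(p, rng):
    """Continuous variable whose sign is Bernoulli(p): logit(p) + logistic noise."""
    u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # keep log() finite
    noise = math.log(u) - math.log(1 - u)          # standard logistic sample
    return math.log(p) - math.log(1 - p) + noise

def relaxed_value(latent, temperature=0.5):
    """Smooth value between 0 and 1; sharper as temperature shrinks."""
    return 1.0 / (1.0 + math.exp(-latent / temperature))

rng = random.Random(0)
samples = [latent_logistic(0.9, rng) for _ in range(20000)]

# Thresholding the continuous variable at 0 recovers the discrete one
frac_true = sum(s > 0 for s in samples) / len(samples)
print(frac_true)  # expected to be close to 0.9 for this many samples
```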