Combining Bernoulli variables

I am having trouble combining discrete random variables. The examples in the documentation seem to use mostly or entirely continuous variables. The simplest model I could think of has a latent variable of interest, x, and an observation that equals x with a probability that is high but not equal to 1. That is:

import tensorflow as tf
import edward as ed
from edward.models import Bernoulli

# Model
x = Bernoulli(0.5)  # Latent variable of interest
y = Bernoulli(0.9)  # Noise: y = 1 with probability 0.9
z = tf.equal(x, y)  # Usually equals x, but not always.
z_data = True

# Inference
qx_p = tf.sigmoid(tf.Variable(tf.random_normal([])))
qx = Bernoulli(qx_p)
inference = ed.KLqp({x: qx}, data={z: z_data})
inference.run()
print("Posterior p(x=1|z)={}={}".format(qx_p.eval(), qx.mean().eval()))
# Prints:  Posterior p(x=1|z)=0.500000=0.622459
# Correct: Posterior p(x=1|z)=0.900000=0.900000
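
For reference, the 0.9 above comes from enumerating the four (x, y) outcomes; a quick plain-Python check, independent of Edward:

# Exact posterior p(x=1 | z=1) by enumeration, where z = (x == y).
p_x, p_y = 0.5, 0.9            # prior probabilities of x = 1 and y = 1
p_z1 = 0.0                     # accumulates p(z = 1)
p_x1_z1 = 0.0                  # accumulates p(x = 1, z = 1)
for x in (0, 1):
    for y in (0, 1):
        joint = (p_x if x else 1 - p_x) * (p_y if y else 1 - p_y)
        if x == y:             # the outcomes where z = 1
            p_z1 += joint
            if x == 1:
                p_x1_z1 += joint
print(p_x1_z1 / p_z1)          # 0.45 / 0.5 = 0.9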

I also tried KLpq for inference but the loss never converges; an example result after 100,000 iterations is:

# KLpq:    Posterior p(x=1|z)=0.982831=0.727670

I tried Gibbs sampling but this gave me the error:

KeyError: "The name 'conjugate_log_joint/Bernoulli_9/_conjugate_log_prob:0' refers to a Tensor which does not exist. The operation, 'conjugate_log_joint/Bernoulli_9/_conjugate_log_prob', does not exist in the graph."

Finally, I tried this mathematically equivalent model (except that z now takes values +1/-1 rather than 1/0), but ran into similar problems:

# Mathematically equivalent model: z = +1 exactly when x == y
x = Bernoulli(0.5)
y = Bernoulli(0.9)
x_ = 2 * tf.cast(x, dtype=tf.float32) - 1  # map {0, 1} to {-1, +1}
y_ = 2 * tf.cast(y, dtype=tf.float32) - 1
z = x_ * y_
z_data = 1

Does anyone have any thoughts? Am I just making a stupid mistake in the way I’m calling the API? Or am I trying to do something totally hopeless? I notice that Stan doesn’t even support discrete latent parameters (its manual advises that you “integrate the discrete variables out” by hand!) - is that because this is never going to work?

As a last resort, I’m considering representing discrete variables with continuous ones instead, e.g. with values > 0 representing true. That seems a bit crazy to me, though. Any thoughts, or examples of this being done before?
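
For what it’s worth, one existing form of that idea is the binary Concrete (Gumbel-Softmax) relaxation, which replaces a 0/1 draw with a differentiable function of logistic noise. A minimal TF 1.x sketch; the temperature value here is an arbitrary choice, not something from this thread:

# Binary Concrete / Gumbel-Softmax relaxation of a Bernoulli(0.5) draw.
# Produces a value in (0, 1) that concentrates on {0, 1} as the
# temperature goes to 0, and is reparameterisable, so gradients flow.
import tensorflow as tf

logits = tf.constant(0.0)   # logit of p = 0.5
temperature = 0.5           # lower = closer to a hard 0/1 draw

u = tf.random_uniform([], minval=1e-6, maxval=1.0)
logistic_noise = tf.log(u) - tf.log(1.0 - u)
x_relaxed = tf.sigmoid((logits + logistic_noise) / temperature)

TF 1.x packages the same construction as tf.contrib.distributions.RelaxedBernoulli, and depending on the Edward version it may also be exposed as edward.models.RelaxedBernoulli.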

Hi John,

I’m learning TensorFlow and Edward and came across your problem, so as a learning exercise I tried:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np
import edward as ed
from edward.models import Bernoulli, Beta, Empirical

# ed.set_seed(42)

# DATA
# x_data = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])
z_data = True

# MODEL
# p = Beta(1.0, 1.0)
x = Bernoulli(0.55)
y = Bernoulli(0.95)
z = tf.equal(x, y)

# INFERENCE
qx_p = tf.sigmoid(tf.Variable(tf.random_normal([])))
qy_p = tf.sigmoid(tf.Variable(tf.random_normal([])))
qx = Bernoulli(qx_p)
qy = Bernoulli(qy_p)

# qp_a = tf.nn.softplus(tf.Variable(tf.random_normal([])))
# qp_b = tf.nn.softplus(tf.Variable(tf.random_normal([])))
# qp = Beta(qp_a, qp_b)

# qp = Empirical(params=tf.Variable(tf.zeros([1000]) + 0.5))
# proposal_p = Beta(3.0, 9.0)

inference = ed.KLqp({x: qx, y: qy}, data={z: z_data})
inference.run()
print("Posterior x = {}".format(qx_p.eval()))
print("Posterior y = {}".format(qy_p.eval()))

The result I get is:

1000/1000 [100%] ██████████████████████████████ Elapsed: 1s | Loss: -0.000
Posterior x = 0.5499999523162842
Posterior y = 0.949999988079071

regards,
Simon

You’re defining a model whose likelihood (z) is not tractable. More specifically, z is a tf.Tensor and not an ed.RandomVariable, so z has no log_prob method for inference to rely on. Methods such as ed.KLqp and ed.Gibbs rely on tractable likelihoods; if they detect that data items passed in have the form tf.Tensor: tf.Tensor, funny issues will arise. (They do not explicitly raise an error, because tf.Tensor: tf.Tensor items have other legitimate use cases.)

You can either (a) rewrite your model so it has a tractable likelihood, or (b) resort to likelihood-free methods such as ed.ImplicitKLqp (which I don’t recommend to non-researchers).
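
For (a), the usual rewrite marginalizes y into z’s emission probability, so that z itself is an ed.RandomVariable with a log_prob. A minimal sketch (note that in tf.distributions-based Edward the first positional argument of Bernoulli is logits, so probs is passed by keyword here):

# (a) Marginalize y out analytically: p(z=1 | x=1) = 0.9, p(z=1 | x=0) = 0.1.
x = Bernoulli(probs=0.5)
z = Bernoulli(probs=0.8 * tf.cast(x, tf.float32) + 0.1)
# z now has a log_prob, so it can be observed directly:
# ed.KLqp({x: qx}, data={z: 1})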

@sabladmin1 Thanks Simon, I see you ran into the same problem as I did: the inferred posteriors are just the same as the priors. In your case, the true posterior for both x and y is Bern(0.9587).

@dustin Thanks, that’s very helpful. I think I’ll steer clear of ImplicitKLqp in that case :) Based on what you said, I made an equivalent model where z is an ed.RandomVariable, but it didn’t really help:

# Model (and data)
x = Bernoulli(0.5)
z = Bernoulli(0.8 * tf.cast(x, tf.float32) + 0.1)  # intended: p(z=1|x=1)=0.9, p(z=1|x=0)=0.1
z_data = True
  • True posterior: p(x=1|z)=0.9
  • KLqp: qx_p.eval()=0.8042; qx.mean().eval()=0.6909
  • KLpq: e.g. qx_p.eval()=0.0303; qx.mean().eval()=0.5078 (does not converge)
  • Gibbs: Similar exception as before (“The name ‘conjugate_log_joint/Bernoulli_2_conjugate_log_prob:0’ refers to a Tensor which does not exist. …”)

Is this what you meant by “tractable likelihood”? I realise there are still intermediate tf.Tensor objects here, but that seems quite common in the examples. If that’s a problem, I could try making my own Bernoulli ed.RandomVariable that takes three parameters (input Bernoulli, p_false, p_true), but I don’t see how that would differ from the model above (see the sketch below). Any thoughts?
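
For concreteness, here is the wrapper I have in mind (the name noisy_bernoulli and its parameter names are purely illustrative); as far as I can tell it collapses straight back to the model above:

# Hypothetical "noisy observation" wrapper:
# p(out = 1 | input = 1) = p_true, p(out = 1 | input = 0) = p_false.
def noisy_bernoulli(input_rv, p_false, p_true):
    p = p_false + (p_true - p_false) * tf.cast(input_rv, tf.float32)
    return Bernoulli(probs=p)

z = noisy_bernoulli(x, p_false=0.1, p_true=0.9)
# ... which is just Bernoulli(probs=0.8 * tf.cast(x, tf.float32) + 0.1) again.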
