KLqp ignores prior distribution?


Hello All! I’m new to Edward (which is awesome, thanks!) and so apologies if this is a simple question.

I am trying to convert some existing models from MCMC (I have been using pymc) to variational Bayes, and I’m struggling to figure out how prior distributions are used in KLqp. A simple example is inferring the input to a model from an observation of its output, shown below.

import edward as ed
import tensorflow as tf

sess = ed.get_session()

# a simple model function

def mfunc(x):
    return x**2.

# data to be fit

xd = 0.25
yd = mfunc(xd)

# input point to be inferred is known to be positive

X = ed.models.Uniform(low=0.,high=1.,name="X")
y = tf.identity(mfunc(X),name="y")

# the likelihood

yobs = ed.models.Normal(loc=y, scale=0.01, name="yobs")

Clearly there are two solutions, and my intention is that the prior on X rules out the negative one. In the real application, the model function is a surrogate for a complex simulation that has been trained on a limited domain; I need to make sure the model is never evaluated outside of that domain. The above works as expected with MetropolisHastings, but with KLqp I get the negative answer:

# the variational model

xmuq = tf.Variable(0.5, name="xmu_post")
xscq = tf.nn.softplus(tf.Variable(1.,name="xsc_post_nt"), name="xsc_post")
xq = ed.models.Normal(loc=xmuq, scale=0.01 * xscq, name="x_post")

inference = ed.KLqp({X: xq}, data={yobs: yd})
inference.run(n_samples=10, n_iter=20000, logdir='log')

print(sess.run([xmuq, xscq]))

20000/20000 [100%] ██████████████████████████████ Elapsed: 123s | Loss: inf  
[-0.24771848, 2.0027606]

I’m guessing that the infinite loss reflects the fact that X is outside of its domain, so the question is: how do I restrict the KLqp inference to respect the prior I have defined?

Thanks, in advance,



Your prior is restricted to the positive domain, but your variational model is not, so the optimization can land on solutions outside the prior’s support. KLqp draws samples from the candidate posterior to estimate the ELBO and only evaluates the prior’s density at those samples (hence the inf values when a sample falls outside [0, 1]), unlike MCMC, which samples from the prior.
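To make that concrete, here is a minimal pure-Python sketch (not Edward code; the variational location and scale are illustrative, chosen near the spurious negative mode from your run) of what happens when a Monte Carlo ELBO estimate evaluates a bounded prior at samples drawn from an unconstrained q:

```python
import math
import random

def log_uniform_prior(x, low=0.0, high=1.0):
    # log p(x) under Uniform(low, high): -inf outside the support
    return -math.log(high - low) if low <= x <= high else float("-inf")

random.seed(0)
# unconstrained Normal q sitting on the negative mode, as in the run above
samples = [random.gauss(-0.25, 0.02) for _ in range(5)]
log_priors = [log_uniform_prior(s) for s in samples]
# every sample falls outside [0, 1], so every prior term is -inf,
# which propagates into the ELBO estimate and shows up as Loss: inf
```

Once a single sample lands outside the support, the -inf log-prior swamps the ELBO estimate, which is why the reported loss is inf even though the optimizer still moves the variational parameters.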

So the solution is to restrict the variational model. You could do this by choosing a family of distributions constrained to [0, 1], such as the Beta distribution, or by specifying an unconstrained variational distribution and then using ed.transform with the bijectors.Sigmoid() bijector from tf.contrib.distributions to map it onto [0, 1].
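The transform idea can be sketched without Edward: pushing an unconstrained Gaussian through a sigmoid guarantees every sample lands strictly inside (0, 1), so the prior’s log-density is always finite. This is just the underlying mechanics (the numbers are illustrative), not the ed.transform API itself:

```python
import math
import random

def sigmoid(z):
    # maps any finite real z into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

random.seed(1)
# unconstrained variational samples, e.g. from a Normal(0, 2)
zs = [random.gauss(0.0, 2.0) for _ in range(1000)]
# after the bijector, every sample respects the Uniform(0, 1) support
xs = [sigmoid(z) for z in zs]
```

The variational parameters are then optimized in the unconstrained space, while the density in the constrained space picks up the bijector’s Jacobian correction, which ed.transform handles for you.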

Hope that helps!


Thanks! That makes sense. Am I right in thinking I can’t get a truly flat prior in either case?


Hey, do you mean an improper prior? Using a constrained variational model with a bounded uniform prior will properly use that prior to approximate the posterior. But if you want a flat prior on an unbounded interval, I’m not sure Edward supports that.