Rookie problem (KLqp gets obviously wrong result)

iris · September 28, 2017, 9:04pm

I just started looking at Edward. It looks very nice. To those of you who had a hand in creating it, thank you very much!

Unfortunately I am having a problem with making the inferencing work, even in a very simple case. I tried running the following code:

import tensorflow as tf
import edward as ed
from edward.models import Normal

print ed.__version__
print tf.__version__

# Generative model
A = Normal(0., 1., name='A')
B = Normal(A, 1., name='B')
C = Normal(A, 1., name='C')

# Variational model
mu = tf.Variable(0., name='mu')
sigma = tf.Variable(1., name='sigma')
qB = Normal(mu, sigma, name='qB')
with tf.Session() as sess:
    inference = ed.KLqp({B: qB}, {C: 100.})
    inference.run()
    print mu.eval(), sigma.eval()

The output is

1.3.3
1.3.0
1000/1000 [100%] ██████████████████████████████ Elapsed: 1s | Loss: 5024.725
0.0159369 1.0

Given that C is observed to be 100, the posterior of B should be centered at 50, isn’t it? Edward does not seem to get anywhere close to figuring this out. (I just posted this particular example but I also tried less extreme ones interactively and the inference result always seemed random.)

I am assuming that I am doing something basic wrong. Please advise. Thank you!

dustin · September 30, 2017, 6:19pm

You wrote a DAG A -> {B, C}, which implies B and C are conditionally independent given A. If you’d like to see how C’s observations affect the distribution of B, you need to simultaneously infer A and B.

iris · October 1, 2017, 11:33am

Thank you very much! I understand now what you meant by “Latent variables can be defined in the model without any posterior inference over them. They are implicitly marginalized out with a single sample” in your paper “Edward: A library for probabilistic modeling, inference, and criticism”

This leaves me with the problem of how to express my inference. I would like to get an approximation that captures the correlation between A and B. (I think the terminology is that I would like “full-rank” rather than “mean-field” variational inference.) The obvious way to do that would be to create a bivariate normal qAB (thus, with a total of five inference parameters: two for the means and three for the covariance matrix). But I don’t know how to express that in the dictionary syntax where the two keys A and B have to map to qAB jointly.

This is of course only a model problem for the actual problems I would like to solve with Edward. I can analytically solve for the posterior of A and B given C in this toy example. However, I foresee wanting to use Edward in a number of situations where I’d like to get correlation information about unknowns that are naturally expressed as different variables.

Thank you!

dustin · October 1, 2017, 3:01pm

For VI algorithms which require a parametric approximating family, you need to write the joint normal as factorized, either as q(A | B)q(B) or conversely. For inference algorithms such as MCMC, the drawn samples will always be correlated.

iris · October 2, 2017, 1:54pm

That makes sense, thank you! Now I think I understand it in theory.

In practice, there seems to be a problem still. For my original example, I implemented your original suggestion as follows:

import tensorflow as tf
import edward as ed
from edward.models import Normal

# Generative model
A = Normal(0., 1., name='A')
B = Normal(A, 1., name='B')
C = Normal(A, 1., name='C')

# Variational model
muA = tf.Variable(0., name='muA')
qA = Normal(muA, 1., name='qA')

muB = tf.Variable(0., name='muB')
qB = Normal(muB, 1., name='qB')

with tf.Session() as sess:
    inference = ed.KLqp({A: qA, B: qB}, {C: 4.})
    inference.run()
    print muA.eval(), muB.eval()

(To further clarify the example, I replaced all variances by 1, and reduced the C-observation from 100 to 4 to make sure that there are no problems caused by being in the extreme tail of the distribution.) Conditioning upon C = 4 I would expect the means of both A and B to be 2. The output I got (using edward 1.3.4 and tensorflow 1.3.0) was:

1000/1000 [100%] ██████████████████████████████ Elapsed: 2s | Loss: 13.069
1.9406 0.0122407

Thus it seems that the mean of A is correctly inferred to be more-or-less 2, but still it gives me 0 for B.

What’s going on here? How should I change this example to correctly infer the mean of B?

Thank you very much in advance!

gfeldman · October 8, 2017, 3:33am

Hi Iris,

I don’t believe you did anything wrong and that the issue is in the KLqp inference code. When both of the latent variables are Gaussian, the code tries to use the analytic form of the KL divergence, which does not work in this case. The ReparameterizationKLqp class skips this check and will work.

Try:

inference = ed.ReparameterizationKLqp({A: qA,B: qB}, {C: 4.})
inference.run()
print(muA.eval(), muB.eval())

The output I get:

1000/1000 [100%] ██████████████████████████████ Elapsed: 2s | Loss: 7.463
1.88677 2.21795

You can get more accurate results by asking for more samples to estimate the gradients:

inference = ed.ReparameterizationKLqp({A: qA,B: qB}, {C: 4.})
inference.initialize(n_samples=100)
inference.run()
print(muA.eval(), muB.eval())

which gives:

1000/1000 [100%] ██████████████████████████████ Elapsed: 18s | Loss: 5.434
2.0846 2.01117

Cheers,
Guy

dustin · October 8, 2017, 5:04pm

Good catch! Yes, it looks like the issue is in this line. We should be calculating KL’s in expectation over multiple samples to account for dependent q’s.

github.com

blei-lab/edward/blob/c2b61a282cfbf4d32248d6900274e445e99e98ac/edward/inferences/klqp.py#L701


    dict_swap[z] = qz_copy.value()


  for x in six.iterkeys(inference.data):
    if isinstance(x, RandomVariable):
      x_copy = copy(x, dict_swap, scope=scope)
      p_log_lik[s] += tf.reduce_sum(
          inference.scale.get(x, 1.0) * x_copy.log_prob(dict_swap[x]))


p_log_lik = tf.reduce_mean(p_log_lik)


kl_penalty = tf.reduce_sum([
    tf.reduce_sum(inference.kl_scaling.get(z, 1.0) * kl_divergence(qz, z))
    for z, qz in six.iteritems(inference.latent_vars)])


if inference.logging:
  tf.summary.scalar("loss/p_log_lik", p_log_lik,
                    collections=[inference._summary_key])
  tf.summary.scalar("loss/kl_penalty", kl_penalty,
                    collections=[inference._summary_key])


loss = -(p_log_lik - kl_penalty)

iris · October 17, 2017, 8:20pm

That does it. Thank you very much!

Topic		Replies	Views
A toy normal model failed (klqp) and why?	2	1674	July 25, 2017
Unable to inference the variance in a simple bayesian linear regression	0	898	January 7, 2018
Confused by error message from inference.run() for LDA with KLqp	5	2629	April 10, 2018
Understanding Edward KLqp algorithm	2	1502	July 23, 2018
Only inferencing partial latent variables	2	948	October 30, 2018

Rookie problem (KLqp gets obviously wrong result)

Related Topics