Problem with KLqp inference for Poisson distribution


#1

Hello!

I’m a newbie to Edward and Tensorflow so I decided to implement something very simple for a start. For my first exercise I am implementing bayesian inference for a Poisson distribution. This should be easy, right?

Unfortunately I ran into problems using KLqp inference for this problem. I can’t get it to converge to a correct posterior distribution no matter how I tweak the parameters (n_iter, n_samples).

# Generate sample data
N = 100
lamb_true = 1.0
data = np.random.poisson(lam=lamb_true, size=N).astype('float32')

# Setup generative model
alpha_0 = 1.0
beta_0 = 1.0
lamb = Gamma(concentration=tf.constant(alpha_0), rate=tf.constant(beta_0))
x = Poisson(lamb, sample_shape=N)

# Inference using KLqp: NOT WORKING CORRECTLY 
qlamb_alpha = tf.Variable(1.0) 
qlamb_beta = tf.Variable(1.0)
qlamb = Gamma(concentration=qlamb_alpha, rate=qlamb_beta)
inference = ed.KLqp({lamb: qlamb}, data={x: data})
inference.initialize(n_iter=20000, n_samples=5)
tf.global_variables_initializer().run()
for t in xrange(inference.n_iter):
    inference.update()

Full code with results is here https://github.com/stys/labs/blob/master/notebooks/edward-tutorials/poisson-distribution.ipynb

In attempt to figure out the problem I have implemented three other ways to estimate posterior distribution and all these methods give very consistent results, but KLqp inference is completely off.

First, the exact Gamma posterior is available for Gamma prior:

qlamb_exact = Gamma(concentration=tf.constant([alpha_0 + np.sum(data)]), rate=tf.constant(beta_0 + len(data)))

Second, I implemented Gibbs sampling with Gamma prior (using ed.Gibbs and the same generative model)

T = 500
x = Poisson(lamb, sample_shape=N)
qlamb2 = Empirical(tf.Variable(tf.zeros([T])))

inference2 = ed.Gibbs({lamb: qlamb2}, data={x: data})
inference2.initialize()
inference2.run()

Third is numerical maximization using analytic formula for ELBO using Gamma prior and Gamma posterior (see notebook for details). This one proves that minimizing KL-divergence is a valid method for this problem and the optimization problem is well posed.

def elbo(alpha, beta, alpha_0, beta_0, data):
   log_beta = np.log(beta)
   digamma_alpha = digamma(alpha)
   kl = (alpha - alpha_0) * digamma_alpha
   kl = kl - gammaln(alpha) + gammaln(alpha_0)
   kl = kl + alpha_0 * (log_beta - np.log(beta_0))
   kl = kl + alpha * (beta_0 - beta) / beta
   return - kl - len(data) * alpha / beta + (digamma_alpha - log_beta) * np.sum(data)

opt = minimize(
    fun=lambda x: -elbo(x[0], x[1], alpha_0, beta_0, data),
    method='l-bfgs-b',
    bounds=[(1e-4, None), (1.e-4, None)],
    x0=(1.0, 1.0)
)

Please see full code with results and plots in the notebook.
I would appreciate any help to figure out what am I doing wrong with KLqp inference! Thank you!


#2

Hi! I am having the exact same problem. Have you solved it? @dustin has said a lot of times in this forum that ed.KLqp doesn’t work well with Gamma and Dirichlet variational approximations (eg: Error in inference.run() for a mixture model), but I never found an explanation for this.


#3

You can recover the posterior quite well if you use a log-normal approximation instead of a Gamma approximation:

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Poisson, Gamma, Empirical

ed.set_seed(42)

# Generate sample data
N = 100
lamb_true = 1.0
data = np.random.poisson(lam=lamb_true, size=N).astype('float32')

# Generative model
alpha_0 = 1.0
beta_0 = 1.0
lamb = Gamma(concentration=tf.constant(alpha_0), rate=tf.constant(beta_0))
x = Poisson(lamb, sample_shape=N)

# Inference using KLqp
q_ln_loc = tf.Variable(0.0)
q_ln_scale = tf.Variable(0.0)
ds = tf.contrib.distributions
q_ln = ed.models.TransformedDistribution(
  distribution=ds.Normal(loc=q_ln_loc, scale=tf.nn.softplus(q_ln_scale)),
  bijector=ds.bijectors.Exp(),
  name="LogNormalTransformedDistribution")

inference = ed.KLqp({lamb: q_ln}, data={x: data})
inference.run(n_iter=20000, n_samples=5)

sess = ed.get_session()
print("Posterior mean: {}".format(sess.run(tf.exp(q_ln_loc))))
print("Posterior variance: {}".format(sess.run(tf.exp(q_ln_scale))))

This should work in most applied settings, but doesn’t really answer the question of why the Gamma approximation does so poorly… which I’d love to know as well.


#4

Here’s a document about Gamma approximations in BBVI: ajbc.io/resources/bbvi_for_gammas.pdf