KLqp disturbed by irrelevant distribution?

Hi everybody,

I’m trying to understand the behavior of KLqp, so I set up a minimal example illustrating the behavior that confuses me. The setup is basically a trivial regression problem in which the input and output data differ by a constant (plus normally distributed noise).
The only additional difficulty for the algorithm is to infer the scale of the noise. Here is the full code, which does what I expect it to do:

import tensorflow as tf
import edward as ed
from edward.models import Normal, InverseGamma
import numpy as np

X = tf.placeholder(tf.float32, [None, 1])

sigma = InverseGamma(concentration=tf.ones((1,)), rate=tf.ones((1,)))
delta = Normal(loc=tf.ones((1,)), scale=tf.ones((1,)))

output = Normal(loc=X + delta, scale=sigma)
#output = Normal(loc=X + delta, scale=0.1)

# create variational distributions
q_concentration = tf.Variable(tf.zeros((1,)))
q_rate = tf.Variable(tf.zeros((1,)))
# exponentiate the variables to ensure positivity
q_sigma = InverseGamma(concentration=tf.exp(q_concentration),
                       rate=tf.exp(q_rate))

q_loc = tf.Variable(tf.zeros((1,)))
q_scale = tf.Variable(tf.zeros((1,)))
# exponentiate the scale to ensure positivity
q_delta = Normal(loc=q_loc, scale=tf.exp(q_scale))

# create training data
X_train = np.random.randn(1000, 1)
Y_train = X_train + 1.25 + np.random.randn(1000, 1)*0.1

# start inference
inference = ed.KLqp({sigma: q_sigma,
                     delta: q_delta},
                    data={X: X_train,
                          output: Y_train})
inference.run(logdir="log", n_iter=10000)

# print results
print("delta: mean", q_delta.mean().eval(), "std", np.sqrt(q_delta.variance().eval()))
print("sigma: mean", q_sigma.mean().eval(), "std", np.sqrt(q_sigma.variance().eval()))
print("sigma: concentration", q_sigma.concentration.eval(), "rate" , q_sigma.rate.eval())

The result is spot-on, as indicated by the output:

delta: mean [1.2460464] std [0.00939288]
sigma: mean [0.1000579] std [0.02421493]
sigma: concentration [19.074036] rate [1.8084501]

The algorithm recovers the constant shift and the scale of the noise rather well.

Now the only change I make is from

output = Normal(loc=X + delta, scale=sigma)
#output = Normal(loc=X + delta, scale=0.1)

to

#output = Normal(loc=X + delta, scale=sigma)
output = Normal(loc=X + delta, scale=0.1)

So I set the scale of the noise to its true value. Presumably, this should make the algorithm's life easier: there is no interaction between sigma and delta anymore, and the posterior distribution of sigma just needs to recover the prior. But now I get the output

delta: mean [1.2473155] std [0.05659465]
sigma: mean [nan] std [nan]
sigma: concentration [1.] rate [1.]

So the standard deviation of the estimate of delta increased by a factor of about 6. Shouldn't it be lower, or at least of roughly the same magnitude? (I assume the nan values for sigma are simply because an InverseGamma with concentration 1 has no finite mean or variance, so that part doesn't worry me.) In a more complicated model this effect completely ruins the estimates of my latent variables, so I need to get to the bottom of it.
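
For reference, here is the posterior I would roughly expect for delta when the noise scale is fixed at 0.1. This is only a back-of-the-envelope conjugate normal-normal calculation under my reading of the model (prior delta ~ N(1, 1), 1000 observations with known noise scale 0.1), not part of the Edward code:

# back-of-the-envelope conjugate check (not part of the Edward model):
# prior delta ~ N(1, 1), likelihood y_i - x_i ~ N(delta, 0.1), N = 1000
prior_var = 1.0
noise_var = 0.1**2
N = 1000
post_std = (1.0 / (1.0 / prior_var + N / noise_var))**0.5
print(post_std)  # roughly 0.00316

So analytically I would expect the posterior standard deviation of delta to be around 0.003, i.e. certainly not larger than in the first run, which makes the 0.057 above even more puzzling.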

If I take sigma out of the inference by changing

# start inference
inference = ed.KLqp({sigma: q_sigma,
                     delta: q_delta},
                    data={X: X_train,
                          output: Y_train})

to

# start inference
inference = ed.KLqp(
                    {delta: q_delta},
                    data={X: X_train,
                          output: Y_train})

the results get better again, but are still slightly worse than in the original setup:

delta: mean [1.2501105] std [0.01766331]
sigma: mean [nan] std [nan]
sigma: concentration [1.] rate [1.]
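
One thing I still plan to try is increasing the number of samples used for the stochastic gradient estimate, in case the remaining spread is just Monte Carlo noise. If I understand the API correctly, that would be something like the following (I have not verified yet whether it actually helps):

# use more than one sample per iteration for the ELBO gradient estimate
inference.run(n_samples=10, n_iter=10000)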

I am using Edward 1.3.5 and TensorFlow 1.7.0 with Python 3.6.3 (WinPython distribution).

Any help is appreciated. Thanks