Understanding Edward KLqp algorithm

I’m trying to understand the underlying implementation of Edward. Since I’m new to Bayesian learning and TensorFlow, I find it difficult to follow the logic by debugging the code. I have also read your paper “Deep Probabilistic Programming”.

In particular, I’m trying to understand variational inference in Edward. However, I still can’t connect the graphical representation of the probabilistic model to Auto-Encoding Variational Bayes and stochastic search (assuming that KLqp inference uses auto-encoding VB).

Assume that we want to perform Bayesian linear regression. We define the model as follows.

import tensorflow as tf
import edward as ed
from edward.models import Normal

d = 10  # number of features; illustrative value

# Model: Bayesian linear regression with standard normal priors.
X = tf.placeholder(tf.float32, [None, d])
w = Normal(loc=tf.zeros(d), scale=tf.ones(d))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=1.0)

# Variational approximation: factorized normals whose parameters are
# trainable; softplus keeps the scale parameters positive.
qw = Normal(loc=tf.get_variable("qw/loc", [d]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [d])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))
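
For reference, inference then binds each prior to its variational factor; here is a minimal sketch of how I run it, assuming hypothetical training arrays X_train and y_train:

import numpy as np

# Hypothetical data, just to make the sketch runnable.
X_train = np.random.randn(50, d).astype(np.float32)
y_train = np.random.randn(50).astype(np.float32)

# The dict pairs each prior (w, b) with its approximation (qw, qb);
# KLqp optimizes the trainable parameters of qw and qb.
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=250)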

Now I don’t see why we have to define the prior and the variational posterior separately, as the pairs (w, qw) and (b, qb). I know that only qw and qb are trainable.

However, what is the significance of w and b during training?

How does this representation map to the algorithm presented in Auto-Encoding Variational Bayes?

I would appreciate it if someone could help me understand the connection between auto-encoding VB and Edward’s graphical representation.

Thanks

A good review of variational inference is here: https://arxiv.org/pdf/1601.00670.pdf
There’s a great tutorial on how Edward does black-box variational inference here: Edward – KL(q||p) Minimization
That should clear things up. As for the algorithm in Auto-Encoding Variational Bayes: it is specific to VAEs, while KLqp is more general; it chooses among several gradient estimators and works for a large class of graphical models.
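
To make the mapping concrete: for models like the one above, where every variational factor is reparameterizable, KLqp maximizes a Monte Carlo estimate of the ELBO using the same reparameterization trick as auto-encoding VB. Below is a hand-rolled sketch of the single-sample estimator in plain TensorFlow; this is not Edward’s actual implementation, and names like qw_loc are illustrative.

import tensorflow as tf

tfd = tf.distributions
d = 10  # illustrative dimensionality

# Trainable variational parameters (mirroring qw and qb above).
qw_loc = tf.get_variable("qw_loc", [d])
qw_scale = tf.nn.softplus(tf.get_variable("qw_scale_raw", [d]))
qb_loc = tf.get_variable("qb_loc", [1])
qb_scale = tf.nn.softplus(tf.get_variable("qb_scale_raw", [1]))

X = tf.placeholder(tf.float32, [None, d])
y_obs = tf.placeholder(tf.float32, [None])

# Reparameterization trick: z = loc + scale * eps with eps ~ N(0, 1),
# so gradients flow through the sample to the variational parameters.
w_s = qw_loc + qw_scale * tf.random_normal([d])
b_s = qb_loc + qb_scale * tf.random_normal([1])

# log p(y | X, w, b): likelihood of the data at the sampled weights.
log_lik = tf.reduce_sum(
    tfd.Normal(tf.tensordot(X, w_s, 1) + b_s, 1.0).log_prob(y_obs))
# log p(w) + log p(b): the standard normal priors from the model.
log_prior = (tf.reduce_sum(tfd.Normal(0.0, 1.0).log_prob(w_s)) +
             tf.reduce_sum(tfd.Normal(0.0, 1.0).log_prob(b_s)))
# log q(w) + log q(b): variational densities at the sampled values.
log_q = (tf.reduce_sum(tfd.Normal(qw_loc, qw_scale).log_prob(w_s)) +
         tf.reduce_sum(tfd.Normal(qb_loc, qb_scale).log_prob(b_s)))

# Single-sample estimate of the ELBO; maximizing it is equivalent to
# minimizing KL(q || p). This is where w and b matter during training:
# they supply the log-prior and likelihood terms of the objective.
elbo = log_lik + log_prior - log_q
train_op = tf.train.AdamOptimizer(0.1).minimize(-elbo)

When some factor is not reparameterizable, KLqp falls back to other estimators such as the score-function gradient, which is why it applies beyond VAE-style models.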


Thanks @deoxyribose.

Even though I had already read the paper you pointed to, I had not found the second resource when I was searching for details on the specific implementation of KLqp.

I will go through the referenced articles. Thanks a lot.