Understanding Edward KLqp algorithm

I’m trying to understand the underlying implementation of Edward. Since I’m new to Bayesian learning and TensorFlow, I find it difficult to follow the logic by debugging the code. I have also read your paper “Deep Probabilistic Programming”.

In particular, I’m trying to understand variational inference in Edward. However, I still can’t connect the graphical representation of the probabilistic model to Auto-Encoding Variational Bayes and stochastic search (assuming that KLqp inference uses auto-encoding VB).

Assume that we want to perform Bayesian linear regression. We define the model as follows.

import tensorflow as tf
import edward as ed
from edward.models import Normal

d = 10  # number of features; illustrative value

# Model: Bayesian linear regression with standard normal priors.
X = tf.placeholder(tf.float32, [None, d])
w = Normal(loc=tf.zeros(d), scale=tf.ones(d))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=1.0)

# Variational approximation: factorized normals whose parameters are
# trainable; softplus keeps the scale parameters positive.
qw = Normal(loc=tf.get_variable("qw/loc", [d]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [d])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))
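
For reference, inference then binds each prior to its variational factor; here is a minimal sketch of how I run it, assuming hypothetical training arrays X_train and y_train:

import numpy as np

# Hypothetical data, just to make the sketch runnable.
X_train = np.random.randn(50, d).astype(np.float32)
y_train = np.random.randn(50).astype(np.float32)

# The dict pairs each prior (w, b) with its approximation (qw, qb);
# KLqp optimizes the trainable parameters of qw and qb.
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=250)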

Now I don’t see why we have to define the prior and the variational posterior separately, as the pairs (w, qw) and (b, qb). I know that only qw and qb are trainable.

However, what is the significance of w and b during training?

How does this representation map to the algorithm presented in Auto-Encoding Variational Bayes?

I would appreciate it if someone could help me understand the connection between auto-encoding VB and Edward’s graphical representation.

Thanks

A good review of variational inference is here: https://arxiv.org/pdf/1601.00670.pdf
There’s a great tutorial on how Edward does black-box variational inference here: Edward – KL(q||p) Minimization
That should clear things up. As for the algorithm in Auto-Encoding Variational Bayes: it is specific to VAEs, while KLqp is more general; it chooses among several gradient estimators and works for a large class of graphical models.
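
To make the mapping concrete: for models like the one above, where every variational factor is reparameterizable, KLqp maximizes a Monte Carlo estimate of the ELBO using the same reparameterization trick as auto-encoding VB. Below is a hand-rolled sketch of the single-sample estimator in plain TensorFlow; this is not Edward’s actual implementation, and names like qw_loc are illustrative.

import tensorflow as tf

tfd = tf.distributions
d = 10  # illustrative dimensionality

# Trainable variational parameters (mirroring qw and qb above).
qw_loc = tf.get_variable("qw_loc", [d])
qw_scale = tf.nn.softplus(tf.get_variable("qw_scale_raw", [d]))
qb_loc = tf.get_variable("qb_loc", [1])
qb_scale = tf.nn.softplus(tf.get_variable("qb_scale_raw", [1]))

X = tf.placeholder(tf.float32, [None, d])
y_obs = tf.placeholder(tf.float32, [None])

# Reparameterization trick: z = loc + scale * eps with eps ~ N(0, 1),
# so gradients flow through the sample to the variational parameters.
w_s = qw_loc + qw_scale * tf.random_normal([d])
b_s = qb_loc + qb_scale * tf.random_normal([1])

# log p(y | X, w, b): likelihood of the data at the sampled weights.
log_lik = tf.reduce_sum(
    tfd.Normal(tf.tensordot(X, w_s, 1) + b_s, 1.0).log_prob(y_obs))
# log p(w) + log p(b): the standard normal priors from the model.
log_prior = (tf.reduce_sum(tfd.Normal(0.0, 1.0).log_prob(w_s)) +
             tf.reduce_sum(tfd.Normal(0.0, 1.0).log_prob(b_s)))
# log q(w) + log q(b): variational densities at the sampled values.
log_q = (tf.reduce_sum(tfd.Normal(qw_loc, qw_scale).log_prob(w_s)) +
         tf.reduce_sum(tfd.Normal(qb_loc, qb_scale).log_prob(b_s)))

# Single-sample estimate of the ELBO; maximizing it is equivalent to
# minimizing KL(q || p). This is where w and b matter during training:
# they supply the log-prior and likelihood terms of the objective.
elbo = log_lik + log_prior - log_q
train_op = tf.train.AdamOptimizer(0.1).minimize(-elbo)

When some factor is not reparameterizable, KLqp falls back to other estimators such as the score-function gradient, which is why it applies beyond VAE-style models.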


Thanks @deoxyribose.

Even though I had already read the paper you pointed to, I had not found the second resource when I was searching for details on the specific implementation of KLqp.

I will go through the referenced articles. Thanks a lot.