I’m trying to understand the underlying implementation of Edward. Since I’m new to Bayesian learning and TensorFlow, I find it difficult to follow the logic by debugging the code. I have also gone through your paper “Deep Probabilistic Programming”.
In particular, I’m trying to understand variational inference in Edward. However, I still can’t connect the graphical representation of the probabilistic model with auto-encoding variational Bayes and stochastic search (assuming that you use auto-encoding VB for KLqp inference).
Assume that we want to perform Bayesian linear regression. We define the model as follows.
    import tensorflow as tf
    import edward as ed
    from edward.models import Normal

    # d is the number of input features.
    X = tf.placeholder(tf.float32, [None, d])
    w = Normal(loc=tf.zeros(d), scale=tf.ones(d))
    b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
    y = Normal(loc=ed.dot(X, w) + b, scale=1.0)

    qw = Normal(loc=tf.get_variable("qw/loc", [d]),
                scale=tf.nn.softplus(tf.get_variable("qw/scale", [d])))
    qb = Normal(loc=tf.get_variable("qb/loc", [1]),
                scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))
I don’t see why we have to define the prior and the posterior separately, as (w, qw) or (b, qb). I know that only qw and qb are trainable.
What, then, is the significance of w and b during training?
How does this representation map to the algorithm presented in the auto-encoding variational Bayes paper?
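To make the question concrete, here is a minimal pure-Python sketch of the reparameterized Monte Carlo ELBO estimate that auto-encoding VB would maximize for the model above. It is not Edward's code, and whether KLqp computes exactly this quantity is an assumption on my part. The helper names (`normal_logpdf`, `elbo_estimate`, `make_data`) and the toy data are hypothetical. In this sketch the prior `w`, `b` enters only through the `log_prior` term and as the parent of `y`, while `qw`, `qb` supply the samples and the `log_q` term.

```python
import math
import random


def normal_logpdf(x, loc, scale):
    """Log density of a univariate Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def elbo_estimate(X, y, qw_loc, qw_scale, qb_loc, qb_scale,
                  n_samples=200, seed=0):
    """Monte Carlo ELBO estimate using reparameterized samples from q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Reparameterization: sample = loc + scale * eps, eps ~ N(0, 1).
        w = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(qw_loc, qw_scale)]
        b = qb_loc + qb_scale * rng.gauss(0.0, 1.0)

        # Prior terms: this is where the model's w and b enter.
        log_prior = sum(normal_logpdf(wj, 0.0, 1.0) for wj in w)
        log_prior += normal_logpdf(b, 0.0, 1.0)

        # Likelihood term: y_n ~ Normal(x_n . w + b, 1).
        log_lik = 0.0
        for x_n, y_n in zip(X, y):
            mean = sum(xj * wj for xj, wj in zip(x_n, w)) + b
            log_lik += normal_logpdf(y_n, mean, 1.0)

        # Variational density evaluated at the same sample.
        log_q = sum(normal_logpdf(wj, m, s)
                    for wj, m, s in zip(w, qw_loc, qw_scale))
        log_q += normal_logpdf(b, qb_loc, qb_scale)

        total += log_prior + log_lik - log_q
    return total / n_samples


def make_data(n=50, seed=1):
    """Toy data from y = 2*x1 - 1*x2 + 0.5 + noise (hypothetical values)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = [2.0 * x[0] - 1.0 * x[1] + 0.5 + rng.gauss(0.0, 1.0) for x in X]
    return X, y


X_toy, y_toy = make_data()
# q set equal to the prior versus a q concentrated near the generating values.
elbo_prior_like_q = elbo_estimate(X_toy, y_toy, [0.0, 0.0], [1.0, 1.0], 0.0, 1.0)
elbo_fitted_q = elbo_estimate(X_toy, y_toy, [2.0, -1.0], [0.1, 0.1], 0.5, 0.1)
print(elbo_prior_like_q, elbo_fitted_q)
```

Under these assumptions, training would mean adjusting the location and scale parameters of q to increase this estimate, with gradients flowing through the `loc + scale * eps` samples.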
I would appreciate any help in understanding how Edward’s graphical representation connects to auto-encoding VB.