I’m trying to run criticism of a simple Bayesian linear regression model.

When calling sess.run(Y_post.mean(), feed_dict={X: X_test}), the result varies each time it is run. I suspect this has something to do with the variational parameter distributions not being set to return their means. Is this correct, and if one wanted exactly the same predictive mean each time, would you have to replace the variational distributions with their mean values?

i.e. when calling

mean = sess.run(Y_post.mean(), feed_dict={X: X_test})

should Y_post be defined not as

Y_post = ed.copy(Y, {W: qW, … } )

but as

Y_post = ed.copy(Y, {W: qW.mean(), … } )

Is this the correct way to do this? If not, what is the recommended approach?

Fetching y_post draws new parameters because any random variables it depends on in the computational graph are redrawn. The same happens if you try to fetch x in the program theta = Beta(1.0, 1.0); x = Bernoulli(probs=theta, sample_shape=50).
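The redrawing behaviour can be mimicked outside of Edward with plain NumPy (a minimal sketch; the `fetch_x` helper is made up here to stand in for `sess.run(x)`):

```python
import numpy as np

rng = np.random.default_rng(0)

def fetch_x():
    # Each "fetch" redraws theta from its prior, then x given theta,
    # mirroring how sess.run(x) re-samples every ancestor in the graph.
    theta = rng.beta(1.0, 1.0)
    return rng.binomial(1, theta, size=50)

# Two fetches generally disagree because theta was redrawn in between.
x1, x2 = fetch_x(), fetch_x()
print(x1.mean(), x2.mean())
```

The same logic applies to y_post: every ancestor random variable, not just the output, is re-sampled per fetch.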

Consider what this means mathematically. Fetching y_post corresponds to a draw from the (approximate) posterior predictive, p(y_new | x) ≈ ∫ p(y_new | w, b) q(w) q(b) dw db: the latent variables are re-sampled on every fetch, and then the likelihood is sampled given them.

To clarify: is there any difference between running Y_post, running Y_post.mean(), and averaging the results of multiple Y_post.sample() calls, with Y_post.sample([num_samples]) being the most efficient way to obtain an approximate posterior mean?

np.mean([sess.run(y_post) for _ in range(50)]) fetches a posterior sample, then a likelihood sample; it repeats this 50 times and takes the mean.

sess.run(y_post.mean()) fetches a posterior sample, then takes the likelihood’s mean given the single posterior sample.

sess.run(y_post.sample([num_samples])) fetches a posterior sample, then draws num_samples samples from the likelihood given the single posterior sample.
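To make the three computations concrete, here is a plain NumPy sketch (not the Edward API; q_loc, q_scale, and lik_scale are made-up values for a scalar model with y | w ~ Normal(w, lik_scale) and q(w) = Normal(q_loc, q_scale)):

```python
import numpy as np

rng = np.random.default_rng(42)
q_loc, q_scale = 2.0, 0.5   # assumed variational posterior over scalar w
lik_scale = 1.0             # assumed likelihood noise

# (1) np.mean([sess.run(y_post) for _ in range(50)]):
#     redraw w from q AND y from the likelihood on every fetch.
draws = [rng.normal(rng.normal(q_loc, q_scale), lik_scale) for _ in range(50)]
est1 = np.mean(draws)

# (2) sess.run(y_post.mean()):
#     one draw of w from q, then the likelihood mean given that single w.
w = rng.normal(q_loc, q_scale)
est2 = w  # the mean of Normal(w, lik_scale) is w itself

# (3) sess.run(y_post.sample([num_samples])):
#     one draw of w, then many y samples from the likelihood given that w.
w = rng.normal(q_loc, q_scale)
est3 = rng.normal(w, lik_scale, size=1000).mean()

print(est1, est2, est3)
```

Only (1) averages over the posterior; (2) and (3) condition on a single draw of w, so their values move around with that draw no matter how many likelihood samples are taken.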

Sorry to reawaken this old thread, but I’ve been asking this question for a project I’m working on, too. In the case of variational inference in a linear model consisting strictly of independent, normally distributed random variables, don’t all random variables converge in probability to their posterior means when sampling from y_post? If so, shouldn’t it be correct to evaluate the model by using ed.copy() to replace all random variables (including y) with their posterior means and then generating a single prediction? This would only be for computing a point estimate of the model’s error/likelihood; other kinds of model criticism would still need the full posterior predictive.

The implementation would look like this:

# From the supervised learning tutorial
import edward as ed
import tensorflow as tf
from edward.models import Normal

# X_train, y_train, N, D are assumed to be defined as in the tutorial.
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N))

qw = Normal(loc=tf.get_variable("qw/loc", [D]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))

inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=250)

# Posterior predictive.
y_post = ed.copy(y, {w: qw, b: qb})

# Current proposal, used only for point estimation of error/likelihood:
# swap in the variational means and take the likelihood mean.
y_MAP = ed.copy(y.mean(), {w: qw.mean(), b: qb.mean()}, scope='MAP')
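The point-estimate evaluation this aims at can be sketched in plain NumPy (illustrative values only; qw_loc and qb_loc stand in for the fitted qw.mean() and qb.mean()):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 40, 3

# Toy test data from a known linear model.
w_true, b_true = np.array([1.0, -2.0, 0.5]), 0.3
X_test = rng.normal(size=(N, D))
y_test = X_test @ w_true + b_true + rng.normal(scale=1.0, size=N)

# Pretend these are the fitted variational means (qw.mean(), qb.mean()).
qw_loc = np.array([0.9, -1.9, 0.6])
qb_loc = 0.25

# Point prediction: plug the posterior means into the likelihood mean,
# analogous to ed.copy(y.mean(), {w: qw.mean(), b: qb.mean()}).
y_map = X_test @ qw_loc + qb_loc
mse = np.mean((y_test - y_map) ** 2)
print("point-estimate MSE:", mse)
```

Because the likelihood mean is linear in w and b here, this plug-in prediction coincides with the mean of the posterior predictive, which is why a single deterministic evaluation can be reasonable for a point estimate of error.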