Prediction or Criticism of the model mean()

Hi all

I’m trying to run criticism of a simple Bayesian linear regression model.

When calling sess.run(Y_post.mean(), feed_dict={X: X_test}), the result varies each time it is run. I suspect this has something to do with the variational parameter distributions not being fixed at their means. Is this correct, and if one wanted exactly the same predictive mean each time, would you have to replace the variational distributions with their mean values?

i.e. when calling

mean = sess.run(Y_post.mean(), feed_dict={X: X_test})

should Y_post be defined, instead of

Y_post = ed.copy(Y, {W: qW, … } )

as

Y_post = ed.copy(Y, {W: qW.mean(), … } )

Is this the correct way to do this? If not, what is the recommended approach?

Fetching y_post draws new parameter values because any random variables it depends on in the computational graph are re-sampled. The same happens if you try to fetch x in the program:
theta = Beta(1.0, 1.0); x = Bernoulli(probs=theta, sample_shape=50).
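For example, this minimal sketch (assuming Edward 1.x with a TensorFlow session) shows the re-sampling directly; the two fetches of x return different draws:

import edward as ed
from edward.models import Bernoulli, Beta

theta = Beta(1.0, 1.0)
x = Bernoulli(probs=theta, sample_shape=50)

sess = ed.get_session()
# Each sess.run re-executes the graph, re-sampling theta and hence x.
print(sess.run(x))
print(sess.run(x))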

Consider what this means mathematically. The first definition of Y_post above represents the posterior predictive,

p(x_new | x) = \int p(x_new | theta) p(theta | x) dtheta

The second definition represents the likelihood with parameters fixed at the posterior mean, p(x_new | theta = E[theta | x]).

In general, to calculate something like the posterior predictive mean you should fetch y_post many times and average.

Thanks Dustin!

To clarify: is there any difference between running Y_post, running Y_post.mean(), and running Y_post.sample() multiple times and averaging the results, with Y_post.sample([num_samples]) being the most efficient way to obtain an approximate posterior mean?

Yes.

It’s worth working out what these mean:

  • np.mean([sess.run(y_post) for _ in range(50)]) fetches a posterior sample, then a likelihood sample; it repeats this 50 times and takes the mean.
  • sess.run(y_post.mean()) fetches a posterior sample, then takes the likelihood’s mean given the single posterior sample.
  • sess.run(y_post.sample([num_samples])) fetches a posterior sample, then draws num_samples samples from the likelihood given the single posterior sample.

Only the first method is correct.
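To make the first method concrete for the regression model in this thread, a sketch (assuming sess is an active session, X is the input placeholder, and X_test is held-out data matching the placeholder’s shape) could be:

import numpy as np

# 50 joint draws from the posterior predictive, shape [50, N]
draws = np.stack([sess.run(y_post, feed_dict={X: X_test})
                  for _ in range(50)])
# Average over the draw axis only, keeping one mean per data point
y_pred_mean = draws.mean(axis=0)

Note that np.mean over the raw list, as in the first bullet, averages over the draws and the data points together; stacking and taking the mean over axis 0 instead keeps a per-datapoint predictive mean.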

Cheers, this makes things clearer.

Sorry to reawaken this old thread, but I’ve been asking this question for a project I’ve been working on, too. In the case of variational inference in a linear model that consists strictly of independent, normally distributed random variables, don’t all random variables converge in probability to their posterior means when sampling from y_post? If so, shouldn’t it be correct to evaluate the model by using ed.copy() to replace all random variables (including y) with their posterior means and then generating a single prediction? This would be just for computing a point estimate of the model’s error/likelihood; for other kinds of model criticism the full posterior predictive would be needed.

The implementation would look like this:

# From the supervised learning tutorial
import edward as ed
import tensorflow as tf
from edward.models import Normal

# N (number of data points) and D (number of features) as in the tutorial
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N))

qw = Normal(loc=tf.get_variable("qw/loc", [D]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))

# X_train, y_train as in the tutorial
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=250)

# Posterior predictive: y with the priors swapped for the fitted posteriors
y_post = ed.copy(y, {w: qw, b: qb})

# Current proposal, used only for point estimation of error/likelihood
y_MAP = ed.copy(y.mean(), {w: qw.mean(), b: qb.mean()}, scope='MAP')
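As a sanity check (a sketch, assuming X_test is held-out data matching the placeholder’s shape), you could compare this plug-in point estimate against the Monte Carlo average of y_post; since the predictive mean here is linear in w and b, the two should agree up to Monte Carlo error:

import numpy as np

sess = ed.get_session()
y_map_pred = sess.run(y_MAP, feed_dict={X: X_test})

# Monte Carlo posterior predictive mean, as recommended earlier in the thread
draws = np.stack([sess.run(y_post, feed_dict={X: X_test})
                  for _ in range(100)])
mc_pred = draws.mean(axis=0)

print(np.max(np.abs(y_map_pred - mc_pred)))  # should be small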