Sorry to reawaken this old thread, but I’ve been asking myself the same question for a project I’ve been working on. In the case of variational inference in a linear model that consists strictly of independent, normally distributed random variables, doesn’t the average of samples from y_post converge to the prediction you get by plugging in the posterior means? The mean of y is linear in w and b, so its expectation under the approximate posterior is just the same expression evaluated at the means of qw and qb. If so, shouldn’t it be correct to evaluate the model by using ed.copy() to replace all random variables (including y) with their posterior means and then generating a single prediction? This would only be for computing a point estimate of the model’s error/likelihood. For other types of model criticism the full posterior predictive would still be needed.

The implementation would look like this:

```
# From the supervised learning tutorial; N, D, X_train, y_train are defined there
import edward as ed
import tensorflow as tf
from edward.models import Normal

# Model: Bayesian linear regression with standard normal priors
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N))

# Fully factorized normal variational approximation
qw = Normal(loc=tf.get_variable("qw/loc", [D]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))

inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=250)

# Posterior predictive
y_post = ed.copy(y, {w: qw, b: qb})

# Current proposal, used only for point estimation of error/likelihood
# (for a normal q the mean and mode coincide, hence the name)
y_MAP = ed.copy(y.mean(), {w: qw.mean(), b: qb.mean()}, scope='MAP')
```
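
To sanity-check the claim without Edward, here is a minimal NumPy-only sketch. Everything in it is an assumption for illustration: the sizes, the design matrix, and the values of `qw_loc`, `qw_scale`, `qb_loc`, `qb_scale` are made-up stand-ins for fitted variational parameters, not output from the model above. It compares the plug-in prediction at the posterior means against a Monte Carlo average of posterior predictive draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and variational parameters (stand-ins for fitted qw, qb)
N, D = 5, 3
X = rng.normal(size=(N, D))
qw_loc, qw_scale = rng.normal(size=D), np.full(D, 0.3)
qb_loc, qb_scale = 0.5, 0.2

# Plug-in prediction: the mean of y evaluated at the posterior means
y_plugin = X @ qw_loc + qb_loc

# Monte Carlo posterior predictive mean: average y over draws of (w, b, noise)
S = 200_000
w_s = rng.normal(qw_loc, qw_scale, size=(S, D))
b_s = rng.normal(qb_loc, qb_scale, size=(S, 1))
y_s = w_s @ X.T + b_s + rng.normal(size=(S, N))  # unit observation noise, as in the model
y_mc = y_s.mean(axis=0)

# Largest absolute disagreement between the two predictions
max_gap = np.abs(y_mc - y_plugin).max()
print(max_gap)
```

The gap should shrink toward zero as `S` grows, which is what I'd expect from linearity. Note this only checks the predictive mean: the plug-in ignores the extra predictive variance coming from qw and qb, so a likelihood computed from it will generally not equal the one computed from the full posterior predictive.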