Gaussian Process Regression, sampling new data points from the predictive posterior


#1

I’m trying to implement a simple GP regressor in Edward to get the hang of things and understand the Edward API.
My issue is that I have no idea how to get predictions from the inferred posterior at points that have not been observed.

I understand that formulating a predictive posterior is just

post = ed.copy(y, {f: qf})

(which is amazing)

But I’m not too sure how to get predictions from this

Is doing something like

session.run(post.sample(), feed_dict={X: new_data})

a sensible approach?

Is there sample code for GP regressors?

Thanks


#2

Fetching post in a TensorFlow session returns one draw from the posterior predictive. Making predictions typically means taking the mean of the posterior predictive distribution. You can estimate it by averaging many draws:

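import numpy as np

post_samples = []
for _ in range(100):
  post_samples.append(sess.run(post))  # one draw per call

# Average over the draws (axis 0): one predictive mean per data point.
np.mean(post_samples, axis=0)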
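To see why the axis matters when averaging the draws, here is a small NumPy-only sketch. It does not use Edward: draw_once is a hypothetical stand-in for sess.run(post), and true_means is made-up data for three test points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictive means at three test points (made up for illustration).
true_means = np.array([-1.0, 0.0, 2.0])

def draw_once():
    # Stand-in for sess.run(post): one noisy draw of shape (3,).
    return true_means + 0.1 * rng.standard_normal(3)

post_samples = [draw_once() for _ in range(100)]

# Averaging over the draws (axis 0) keeps one value per test point.
per_point_mean = np.mean(post_samples, axis=0)

# Without an axis, NumPy averages over draws AND test points.
overall_mean = np.mean(post_samples)

print(per_point_mean.shape)   # shape of the per-point prediction
print(np.ndim(overall_mean))  # number of dimensions of the collapsed result
```

The per-point version is usually what you want for predictions; the collapsed one mixes all test points into a single number.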

#3

Is that different from taking the mean this way?

sess.run(post.sample((100,)), feed_dict={X: X_test}).mean()

Why not use the .mean() method of post directly?

sess.run(post.mean(), feed_dict={X: X_test})


#4

I guess Dustin did not tell you to use post.mean() because it requires that the mean is analytically tractable. By design, methods of RandomVariables do not estimate quantities through e.g. Monte Carlo [1].

While the mean is analytically tractable in the classical Gaussian-likelihood case of GP regression, your example, strictly speaking, does not assume that. E.g. everything you say still holds for a Poisson likelihood. The general answer is hence to sample from the posterior predictive and compute the mean of the samples instead.

Note that alternatively, you could also define the MC estimate completely in TensorFlow and have a single sess.run(...) do the work. Something along the lines of:

# post.sample(100) puts the sample dimension first, so reduce over axis 0.
mc_mean = tf.reduce_mean(post.sample(100), 0)
sess.run(mc_mean, feed_dict={...})
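For the Gaussian-likelihood case mentioned above, here is a rough NumPy sketch (not Edward code) comparing the closed-form predictive mean with a Monte Carlo estimate. The RBF kernel, its hyperparameters, the noise variance and the toy data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between 1-D input arrays a and b.
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

rng = np.random.default_rng(0)
noise_var = 0.1  # assumed Gaussian observation noise variance

# Toy training data and test inputs (made up for illustration).
X_train = np.linspace(-3.0, 3.0, 15)
y_train = np.sin(X_train) + np.sqrt(noise_var) * rng.standard_normal(X_train.shape)
X_test = np.array([-2.0, 0.5, 2.5])

K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
K_s = rbf_kernel(X_train, X_test)
K_ss = rbf_kernel(X_test, X_test)

# Closed-form posterior predictive (mean and covariance) under a Gaussian likelihood.
analytic_mean = K_s.T @ np.linalg.solve(K, y_train)
pred_cov = K_ss - K_s.T @ np.linalg.solve(K, K_s) + noise_var * np.eye(len(X_test))

# Monte Carlo estimate: draw from the predictive and average over the sample axis.
draws = rng.multivariate_normal(analytic_mean, pred_cov, size=20000)
mc_mean = draws.mean(axis=0)

print(analytic_mean)
print(mc_mean)
```

With enough draws the two should agree closely. For a non-Gaussian likelihood such as Poisson there is no such closed form in general, so only the sampling route remains.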