Unsuccessful Cholesky decomposition - Gaussian Process Regression

I’m trying to implement a Gaussian Process regressor using Edward.

I’ve used the following code to build the model:

import tensorflow as tf
from edward.models import MultivariateNormalDiag, MultivariateNormalFullCovariance
from edward.util import rbf

X = tf.placeholder(tf.float32, [N, 1])
f = MultivariateNormalFullCovariance(
        loc=tf.zeros([N]),
        covariance_matrix=rbf(X))
y = MultivariateNormalDiag(
        loc=f,
        scale_diag=tf.ones([N]) * 0.3)

and then the variational distribution and inference as follows:

import edward as ed

qf = MultivariateNormalFullCovariance(
        loc=tf.Variable(tf.random_normal([N])),
        covariance_matrix=tf.Variable(tf.random_normal([N, N])))
inference = ed.KLqp({f: qf}, data={X: x_train, y: y_train})
inference.run(n_iter=5000)

I feel like I’ve got something wrong: the inference throws an error saying that the Cholesky decomposition was unsuccessful.

Are there any examples that could help?

Your variational approximation has an N x N matrix of free parameters, which is not guaranteed to be positive semi-definite.
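
One common fix (a sketch, not the only option) is to parameterize the covariance through a lower-triangular Cholesky factor, so that L L^T is positive semi-definite by construction, e.g. with MultivariateNormalTriL from edward.models:

from edward.models import MultivariateNormalTriL

L_raw = tf.Variable(tf.random_normal([N, N]))
L = tf.matrix_set_diag(
        tf.matrix_band_part(L_raw, -1, 0),           # keep only the lower triangle
        tf.nn.softplus(tf.matrix_diag_part(L_raw)))  # force a positive diagonal
qf = MultivariateNormalTriL(
        loc=tf.Variable(tf.random_normal([N])),
        scale_tril=L)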

Have you looked at the GP classification tutorial (http://edwardlib.org/tutorials/supervised-classification)? Another helpful reference is examples/cox_process.py, which adds an epsilon to the diagonal of the covariance matrix to improve numerical stability.
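
For reference, the jitter trick from that example applied to the prior above would look roughly like this (1e-6 is an arbitrary small constant):

K = rbf(X) + tf.eye(N) * 1e-6  # small jitter on the diagonal so the Cholesky succeeds
f = MultivariateNormalFullCovariance(
        loc=tf.zeros([N]),
        covariance_matrix=K)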

The epsilon was really useful for the rbf, thanks!

My reasoning for the full N x N matrix was to capture the covariances between the data points in the GP, because I’m having issues making predictions.

I make predictions by sampling the posterior predictive and then averaging:

sess = ed.get_session()
post = ed.copy(y, {f: qf})
samples = [sess.run(post, feed_dict={X: x_test}) for _ in range(1000)]
y_pred = np.mean(samples, axis=0)  # average the samples (numpy imported as np)

The issue is that this produces the same mean as the posterior (conditioned on x_train) and doesn’t reflect the x_test data. I thought having the covariances would help.

Is there something wrong with what I’m doing or is there a better approach?
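
For reference: qf is a distribution over f at the N training inputs only, and its parameters do not depend on the placeholder X, so feeding x_test through the copied graph cannot change the predictions. Outside of Edward, the exact GP posterior predictive at new inputs has a closed form; here is a minimal numpy sketch (kernel is a hypothetical callable returning the kernel matrix between two sets of inputs, and noise_var matches the 0.3 noise scale above):

import numpy as np

def gp_predict(x_train, y_train, x_test, kernel, noise_var=0.3 ** 2):
    # Condition the GP prior on the training data in closed form.
    K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = kernel(x_train, x_test)          # cross-covariance: train vs. test
    alpha = np.linalg.solve(K, y_train)
    mean = K_s.T.dot(alpha)                # predictive mean at x_test
    cov = kernel(x_test, x_test) - K_s.T.dot(np.linalg.solve(K, K_s))
    return mean, cov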

Did you ever get your GP to extrapolate to test points that weren’t in the training set? And could you get the test set to be a different size than the training set? I think I can get GP results comparable to those from sklearn’s GaussianProcessRegressor, but so far I can’t get predictions for other points.

Any luck getting the GP to extrapolate to unobserved test points? I can’t seem to find how to do this.