Bayesian regression

Hi,

I’m trying to implement a very simple Variational Bayes regression model, specifically the one from Section 10.3 of Bishop’s Pattern Recognition and Machine Learning.

The code below, which uses Edward, runs successfully, but the fit is very poor. Using the method described in the book, the fit is practically the same as the one obtained with ordinary least squares (which is a good fit for this many data points); for reference, the OLS baseline is sketched after the code. Is there any particular reason for this? Should I build the model in a different way, or use a different kind of inference?

import tensorflow as tf
import edward as ed
import numpy as np
import matplotlib.pyplot as plt

# DATA
def generate_data(N, sdy, theta):
    """Generate N noisy observations of y = w0 + w1*x + w2*sin(3x)."""
    M = len(theta)
    w = theta.reshape(M)
    x = np.linspace(0, np.pi, N)
    # design matrix with columns [1, x, sin(3x)]
    X = np.concatenate((np.ones([N, 1]), x.reshape([N, 1]),
                        np.sin(3 * x).reshape([N, 1])), axis=1)
    Y = X.dot(w) + np.random.normal(0, sdy, size=N)
    return X, Y, x


theta_true = np.array([0.1, -4, 8])  # true weights for [1, x, sin(3x)]
N = 50
M = len(theta_true)
sigma = 2  # known observation noise standard deviation
X_data, Y_data, x = generate_data(N, sigma, theta_true)

# MODEL
from edward.models import Gamma, MultivariateNormalDiag, Normal

sess = ed.get_session()

# priors: broad (nearly noninformative) Gamma hyperprior on the weight
# precision alpha
a0 = 1e-10
b0 = 1e-10

X = tf.placeholder(tf.float32, [N, M])
alpha = Gamma(concentration=tf.ones(1) * a0, rate=tf.ones(1) * b0)
# alpha is a precision, so the prior scale of the weights is alpha**-0.5
theta = Normal(loc=tf.zeros(M), scale=tf.ones(M) / tf.sqrt(alpha))
Y = Normal(loc=ed.dot(X, theta), scale=tf.ones(N) * sigma)  # known noise sd

# INFERENCE
# variational families; softplus keeps the Gamma parameters and the
# Normal scales positive
qAlpha = Gamma(concentration=tf.nn.softplus(tf.Variable(tf.ones(1))),
               rate=tf.nn.softplus(tf.Variable(tf.ones(1))))
qTheta = MultivariateNormalDiag(loc=tf.Variable(tf.random_normal([M])),
                                scale_diag=tf.nn.softplus(tf.Variable(tf.ones(M))))

inference = ed.KLqp({theta: qTheta, alpha: qAlpha}, data={X: X_data, Y: Y_data})
inference.run(n_iter=500)

# EVALUATION
print("True theta:")
print(theta_true)
print("Theta estimate (posterior mean):")
print(qTheta.mean().eval())

# plot a few posterior draws of the regression curve over the data
plt.figure()
plt.scatter(x, Y_data)
thetas = qTheta.sample(10).eval()
for th in thetas:
    plt.plot(x, X_data.dot(th))
plt.show()
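
For reference, this is the ordinary least squares baseline I’m comparing the fit against; a minimal sketch on the same simulated data:

# OLS baseline on the same design matrix, for comparison
w_ols, _, _, _ = np.linalg.lstsq(X_data, Y_data)
print("OLS estimate:")
print(w_ols)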

Did you check that the algorithm converged? My guess is it hasn’t. One useful tool is TensorBoard, which lets you track the loss function over iterations (see the tutorial on the website).
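
If I remember the tutorial correctly, passing a logdir to run (which forwards it to initialize) is enough to get loss summaries; the directory name is arbitrary:

inference.run(n_iter=500, logdir='log')

and then, in a shell:

tensorboard --logdir=log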

Indeed, the algorithm did not converge. But even increasing the iteration count to 5k or 50k does not help: the loss just oscillates around some value. Is there maybe something else I am doing wrong, or is this problem not suitable for solving with Edward (which is otherwise great!)?

ed.KLqp has noisy gradients if you use a Gamma distribution: it falls back to score function gradients, which tend to be unstable for Gammas. You can try increasing n_samples in inference.initialize(); this will help reduce the variance. (See the KLqp tutorial for the mathematical details.) Alternatively, you can tweak the model to use a LogNormal prior instead of a Gamma, or just use another algorithm.
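
For the LogNormal option, one way is to put a Normal prior on log(alpha) and exponentiate it inside the model, so KLqp can use reparameterization gradients throughout. A sketch along those lines (the name log_alpha is mine, and the standard normal prior is just for illustration; it is far less diffuse than Gamma(1e-10, 1e-10)):

# Normal prior on log(alpha), i.e. a LogNormal prior on the precision alpha
log_alpha = Normal(loc=tf.zeros(1), scale=tf.ones(1))
theta = Normal(loc=tf.zeros(M), scale=tf.ones(M) * tf.exp(-0.5 * log_alpha))
Y = Normal(loc=ed.dot(X, theta), scale=tf.ones(N) * sigma)

# Normal variational family on log(alpha)
qLogAlpha = Normal(loc=tf.Variable(tf.zeros(1)),
                   scale=tf.nn.softplus(tf.Variable(tf.ones(1))))

inference = ed.KLqp({theta: qTheta, log_alpha: qLogAlpha},
                    data={X: X_data, Y: Y_data})
inference.run(n_iter=500, n_samples=25)  # n_samples reduces gradient variance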

Thanks for the answer. Using more samples helped somewhat, as did the LogNormal prior on alpha. The best results were obtained using ed.KLpq.
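
In case it helps anyone else, the winning variant was just a swap of the inference class (same model and variational families as above; the n_samples value here is illustrative):

inference = ed.KLpq({theta: qTheta, alpha: qAlpha}, data={X: X_data, Y: Y_data})
inference.run(n_iter=500, n_samples=50)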