Bayesian regression

Hi,

I’m trying to implement a very simple Variational Bayes regression model, specifically the one from Section 10.3 of Bishop’s Pattern Recognition and Machine Learning.

The code below, which uses Edward, runs successfully, but the fit is very poor. Using the method described in the book, the fit is practically the same as the one obtained with ordinary least squares (which is a good fit for this many data points); for reference, the OLS baseline is sketched after the code. Is there any particular reason for this? Should I build the model in a different way, or use a different kind of inference?

import tensorflow as tf
import edward as ed
import numpy as np
import matplotlib.pyplot as plt

# DATA
def generate_data(N, sdy, theta):
    """Generate N noisy observations of y = w0 + w1*x + w2*sin(3x)."""
    M = len(theta)
    w = theta.reshape(M)
    x = np.linspace(0, np.pi, N)
    # design matrix with columns [1, x, sin(3x)]
    X = np.concatenate((np.ones([N, 1]), x.reshape([N, 1]),
                        np.sin(3 * x).reshape([N, 1])), axis=1)
    Y = X.dot(w) + np.random.normal(0, sdy, size=N)
    return X, Y, x


theta_true = np.array([0.1, -4, 8])  # true weights for [1, x, sin(3x)]
N = 50
M = len(theta_true)
sigma = 2  # known observation noise standard deviation
X_data, Y_data, x = generate_data(N, sigma, theta_true)

# MODEL
from edward.models import Gamma, MultivariateNormalDiag, Normal

sess = ed.get_session()

# priors: broad (nearly noninformative) Gamma hyperprior on the weight
# precision alpha
a0 = 1e-10
b0 = 1e-10

X = tf.placeholder(tf.float32, [N, M])
alpha = Gamma(concentration=tf.ones(1) * a0, rate=tf.ones(1) * b0)
# alpha is a precision, so the prior scale of the weights is alpha**-0.5
theta = Normal(loc=tf.zeros(M), scale=tf.ones(M) / tf.sqrt(alpha))
Y = Normal(loc=ed.dot(X, theta), scale=tf.ones(N) * sigma)  # known noise sd

# INFERENCE
# variational families; softplus keeps the Gamma parameters and the
# Normal scales positive
qAlpha = Gamma(concentration=tf.nn.softplus(tf.Variable(tf.ones(1))),
               rate=tf.nn.softplus(tf.Variable(tf.ones(1))))
qTheta = MultivariateNormalDiag(loc=tf.Variable(tf.random_normal([M])),
                                scale_diag=tf.nn.softplus(tf.Variable(tf.ones(M))))

inference = ed.KLqp({theta: qTheta, alpha: qAlpha}, data={X: X_data, Y: Y_data})
inference.run(n_iter=500)

# EVALUATION
print("True theta:")
print(theta_true)
print("Theta estimate (posterior mean):")
print(qTheta.mean().eval())

# plot a few posterior draws of the regression curve over the data
plt.figure()
plt.scatter(x, Y_data)
thetas = qTheta.sample(10).eval()
for th in thetas:
    plt.plot(x, X_data.dot(th))
plt.show()
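
For reference, this is the ordinary least squares baseline I’m comparing the fit against; a minimal sketch on the same simulated data:

# OLS baseline on the same design matrix, for comparison
w_ols, _, _, _ = np.linalg.lstsq(X_data, Y_data)
print("OLS estimate:")
print(w_ols)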

Did you check that the algorithm converged? My guess is it hasn’t. One useful tool is TensorBoard, which lets you track the loss function over iterations (see the tutorial on the website).
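
If I remember the tutorial correctly, passing a logdir to run (which forwards it to initialize) is enough to get loss summaries; the directory name is arbitrary:

inference.run(n_iter=500, logdir='log')

and then, in a shell:

tensorboard --logdir=log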

Indeed, the algorithm did not converge. But even increasing the iteration count to 5k or 50k does not help: the loss just oscillates around some value. Is there maybe something else I am doing wrong, or is this problem not suitable for solving with Edward (which is otherwise great!)?

ed.KLqp has noisy gradients if you use a Gamma distribution: it falls back to score function gradients, which tend to be unstable for Gammas. You can try increasing n_samples in inference.initialize(); this will help reduce the variance. (See the KLqp tutorial for the mathematical details.) Alternatively, you can tweak the model to use a LogNormal prior instead of a Gamma, or just use another algorithm.
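
For the LogNormal option, one way is to put a Normal prior on log(alpha) and exponentiate it inside the model, so KLqp can use reparameterization gradients throughout. A sketch along those lines (the name log_alpha is mine, and the standard normal prior is just for illustration; it is far less diffuse than Gamma(1e-10, 1e-10)):

# Normal prior on log(alpha), i.e. a LogNormal prior on the precision alpha
log_alpha = Normal(loc=tf.zeros(1), scale=tf.ones(1))
theta = Normal(loc=tf.zeros(M), scale=tf.ones(M) * tf.exp(-0.5 * log_alpha))
Y = Normal(loc=ed.dot(X, theta), scale=tf.ones(N) * sigma)

# Normal variational family on log(alpha)
qLogAlpha = Normal(loc=tf.Variable(tf.zeros(1)),
                   scale=tf.nn.softplus(tf.Variable(tf.ones(1))))

inference = ed.KLqp({theta: qTheta, log_alpha: qLogAlpha},
                    data={X: X_data, Y: Y_data})
inference.run(n_iter=500, n_samples=25)  # n_samples reduces gradient variance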

Thanks for the answer. Using more samples helped somewhat, as did the LogNormal prior on alpha. The best results were obtained using ed.KLpq.
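
In case it helps anyone else, the winning variant was just a swap of the inference class (same model and variational families as above; the n_samples value here is illustrative):

inference = ed.KLpq({theta: qTheta, alpha: qAlpha}, data={X: X_data, Y: Y_data})
inference.run(n_iter=500, n_samples=50)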