Trying to implement Bayesian NN for regression


#1

I am trying to apply the Bayesian NN presented by Torsten Scholak at PyCon to some real-world data I have, in order to familiarize myself with Edward and TensorFlow, and I am getting very weird results.
The network fits the data well, but only up to a certain point, after which the predictions flatline. I can't figure out where in the code I should tweak it. Here is the code for the network:

def neural_network_with_2_layers(x, W_0, W_1, b_0, b_1):
    h = tf.nn.tanh(tf.matmul(x, W_0) + b_0)
    h = tf.matmul(h, W_1) + b_1
    return tf.reshape(h, [-1])

dim = 10  # hidden layer size
W_0 = Normal(loc=tf.zeros([D, dim]),
             scale=tf.ones([D, dim]))
W_1 = Normal(loc=tf.zeros([dim, 1]),
             scale=tf.ones([dim, 1]))
b_0 = Normal(loc=tf.zeros(dim),
             scale=tf.ones(dim))
b_1 = Normal(loc=tf.zeros(1),
             scale=tf.ones(1))

x = tf.placeholder(tf.float32, [N, D])

#Reshaping
a = neural_network_with_2_layers(x,W_0,W_1,b_0,b_1)
b = tf.reshape(a,[len(X_train),1])
y = Normal(loc=b,scale=(tf.ones([N,1])*0.1))  # constant noise


# BACKWARD MODEL A

q_W_0 = Normal(loc=tf.Variable(tf.random_normal([D, dim])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([D, dim]))))
q_W_1 = Normal(loc=tf.Variable(tf.random_normal([dim, 1])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([dim, 1]))))
q_b_0 = Normal(loc=tf.Variable(tf.random_normal([dim])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([dim]))))
q_b_1 = Normal(loc=tf.Variable(tf.random_normal([1])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))


inference = ed.KLqp(latent_vars={W_0: q_W_0, b_0: q_b_0,
                                 W_1: q_W_1, b_1: q_b_1},
                    data={x: X_train, y: Y_train})

inference.run(n_samples=50, n_iter=20000)

Here are the results:

[image: test]

and the code to plot them:

# CRITICISM A
plt.scatter(X_train, Y_train, s=20.0);  # blue
plt.scatter(X_test, Y_test, s=20.0,  # red
        color=sns.color_palette().as_hex()[2]);

xp = tf.placeholder(tf.float32, [1000, D])
[plt.plot(np.linspace(-1.0, 1.0, 1000),
      sess.run(neural_network_with_2_layers(xp,
                                            q_W_0, q_W_1,
                                            q_b_0, q_b_1),
               {xp: np.linspace(-1.0, 1.0, 1000)[:, np.newaxis]}),
      color='black', alpha=0.1)
 for _ in range(10)];
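
(As an aside, the posterior predictive fit could also be scored directly; the snippet below is only a minimal sketch, assuming the Edward 1.x ed.copy / ed.evaluate API rather than anything from the original talk.)

# Posterior predictive: swap the priors for the fitted variational posteriors
y_post = ed.copy(y, {W_0: q_W_0, b_0: q_b_0, W_1: q_W_1, b_1: q_b_1})
print(ed.evaluate('mean_squared_error', data={x: X_train, y_post: Y_train}))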

Cheers


#2

What size is the hidden layer? Also, to verify that your code works, have you tried dropping the hidden layer in the neural_network code to see if it properly reduces to Bayesian linear regression?


#3

The size of the hidden layer is 10, and I just found out that my code indeed does not work even when I drop the hidden layer. Here is what it looked like:

def neural_network_with_1_layer(x, W, b):
    h = tf.matmul(x,W) + b
    return tf.reshape(h, [-1])

W = Normal(loc=tf.zeros([1, 1]),
           scale=tf.ones([1, 1]))

b = Normal(loc=tf.zeros([1, 1]),
           scale=tf.ones([1, 1]))

x = tf.placeholder(tf.float32, [623, 1])


a = neural_network_with_1_layer(x,W,b)
c = tf.reshape(a,[len(X_train),1])
y = Normal(loc=c,scale=(tf.ones([1])*0.1))  # constant noise



q_W = Normal(loc=tf.Variable(tf.random_normal([1, 1])),
             scale=tf.nn.softplus(tf.Variable(tf.random_normal([1, 1]))))

q_b = Normal(loc=tf.Variable(tf.random_normal([1, 1])),
             scale=tf.nn.softplus(tf.Variable(tf.random_normal([1, 1]))))


inference = ed.KLqp(latent_vars={W: q_W, b: q_b},
                    data={x: X_train, y: Y_train})

inference.run(n_samples=10, n_iter=5000)


plt.scatter(X_train, Y_train, s=20.0,label="Training data");  # blue
plt.scatter(X_test, Y_test, s=20.0,label="Test data",  # red
            color=sns.color_palette().as_hex()[2]);

            

xp = tf.placeholder(tf.float32, [2000, 1])
[plt.plot(np.linspace(-.5, 1.0, 2000),
          sess.run(neural_network_with_1_layer(xp, q_W, q_b),  # one-layer network here
                   {xp: np.linspace(-.5, 1.0, 2000)[:, np.newaxis]}),
          color='black', alpha=0.1)
 for _ in range(10)];
plt.legend()

[image: test2]

And this is how I define my data, in case it helps:

X_test = data_4[::10]
X_test = X_test.reshape(len(X_test),1)
X_test = X_test.astype("float32")

X_train = data[::10]
X_train = X_train.astype("float32")
X_train = X_train.reshape(len(X_train),1)

Y_train = RUL_func(X_train)
Y_train = Y_train.astype("float32")
Y_train = Y_train.reshape(len(Y_train),1)

Y_test = RUL_func(X_test)
Y_test = Y_test.reshape(len(Y_test),1)
Y_test = Y_test.astype("float32")

Thanks for taking the time to help


#4

Update: the problem may be caused by the activation function. Changing it from tanh to relu gave me a linear regression, while relu6 gave a more non-linear (over)fit, again with a cut-off point, but at a value that is more convenient for the present data.
This is the relu output:
[image: relu]
And this is the relu6 output:
[image: relu6]
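
For reference, the only change these experiments need is the hidden-layer nonlinearity in the network function; a minimal sketch of the relu6 variant (the function name is just illustrative):

def neural_network_with_2_layers_relu6(x, W_0, W_1, b_0, b_1):
    # same two-layer architecture as above, with tf.nn.relu6 in place of tf.nn.tanh
    h = tf.nn.relu6(tf.matmul(x, W_0) + b_0)
    h = tf.matmul(h, W_1) + b_1
    return tf.reshape(h, [-1])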


#5

Is your data normalized to match your activation function, i.e. to [0, 1] for ReLU and [-1, 1] for tanh?
Also, initialize your weights from a random normal and shrink them by multiplying with a small factor, e.g. 0.01.
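
A minimal sketch of both suggestions, under the assumption that "weights" here means the initial variational parameters, and using a min-max rescaling computed from the training set (the 0.01 is the example factor above):

# Rescale inputs to [-1, 1] for tanh (use [0, 1] if switching to ReLU),
# reusing the training-set statistics for the test set.
x_min, x_max = X_train.min(), X_train.max()
X_train = 2.0 * (X_train - x_min) / (x_max - x_min) - 1.0
X_test = 2.0 * (X_test - x_min) / (x_max - x_min) - 1.0

# Start the variational parameters small by scaling the random initial values.
q_W_0 = Normal(loc=tf.Variable(0.01 * tf.random_normal([D, dim])),
               scale=tf.nn.softplus(tf.Variable(0.01 * tf.random_normal([D, dim]))))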


#6

That helped a lot. I had already initialized the weights randomly, but I had not normalized the input. Here are 20 predictions using a tanh activation with (semi-)normalized inputs (it still does quite well):
[image: tanh_norm]
It seems to be better, with a lot less overfitting; however, I am now getting a negative loss. I guess there's some more tweaking to be done. Thanks!


#7

@tristanmech you're welcome :slight_smile:!