Trying to implement Bayesian NN for regression

I am trying to apply the Bayesian NN presented by Torsten Scholak at PyCon to some real-world data I have, in order to familiarize myself with Edward and TensorFlow, and I am getting very weird results.
The network fits the data well, but only up to a certain point, and then it flatlines. I can't figure out where in the code I should tweak it. Here is the code for the network.
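For reference, it assumes roughly the following imports and session setup (a sketch; this part is not shown in my snippets):

import numpy as np
import tensorflow as tf
import edward as ed
import matplotlib.pyplot as plt
import seaborn as sns
from edward.models import Normal

sess = ed.get_session()  # the session used by the sess.run(...) calls in the plotting code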

def neural_network_with_2_layers(x, W_0, W_1, b_0, b_1):
    h = tf.nn.tanh(tf.matmul(x, W_0) + b_0)
    h = tf.matmul(h, W_1) + b_1
    return tf.reshape(h, [-1])

dim = 10  # hidden layer size
W_0 = Normal(loc=tf.zeros([D, dim]),
             scale=tf.ones([D, dim]))
W_1 = Normal(loc=tf.zeros([dim, 1]),
             scale=tf.ones([dim, 1]))
b_0 = Normal(loc=tf.zeros(dim),
             scale=tf.ones(dim))
b_1 = Normal(loc=tf.zeros(1),
             scale=tf.ones(1))

x = tf.placeholder(tf.float32, [N, D])

# Forward model: reshape the network output to match Y_train's shape
a = neural_network_with_2_layers(x, W_0, W_1, b_0, b_1)
b = tf.reshape(a, [len(X_train), 1])
y = Normal(loc=b, scale=tf.ones([N, 1]) * 0.1)  # constant observation noise
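Here N and D are the number of training points and the number of input features; they come from the training data, roughly like this (sketch, not shown above):

N, D = X_train.shape  # number of training points and input features (set elsewhere in my script)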


# BACKWARD MODEL A

q_W_0 = Normal(loc=tf.Variable(tf.random_normal([D, dim])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([D, dim]))))
q_W_1 = Normal(loc=tf.Variable(tf.random_normal([dim, 1])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([dim, 1]))))
q_b_0 = Normal(loc=tf.Variable(tf.random_normal([dim])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([dim]))))
q_b_1 = Normal(loc=tf.Variable(tf.random_normal([1])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))


inference = ed.KLqp(latent_vars={W_0: q_W_0, b_0: q_b_0,
                                 W_1: q_W_1, b_1: q_b_1},
                    data={x: X_train, y: Y_train})

inference.run(n_samples=50, n_iter=20000)
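For what it's worth, a quick numeric check of the fit could look something like this (a sketch reusing the variables above and Edward's built-in metrics):

# Posterior predictive: the same likelihood, with the priors swapped for the fitted approximations.
y_post = ed.copy(y, {W_0: q_W_0, b_0: q_b_0, W_1: q_W_1, b_1: q_b_1})

# Mean squared error on the training data (the x placeholder is sized for X_train).
print(ed.evaluate('mean_squared_error', data={x: X_train, y_post: Y_train}))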

Here are the results:

[image: test]

and the code to plot them:

# CRITICISM A
plt.scatter(X_train, Y_train, s=20.0);  # blue
plt.scatter(X_test, Y_test, s=20.0,  # red
            color=sns.color_palette().as_hex()[2]);

xp = tf.placeholder(tf.float32, [1000, D])
[plt.plot(np.linspace(-1.0, 1.0, 1000),
          sess.run(neural_network_with_2_layers(xp,
                                                q_W_0, q_W_1,
                                                q_b_0, q_b_1),
                   {xp: np.linspace(-1.0, 1.0, 1000)[:, np.newaxis]}),
          color='black', alpha=0.1)
 for _ in range(10)];
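(Side note: the same plot can also be produced by building the prediction op once and running it repeatedly; each sess.run then draws a fresh sample of the weights from the variational posterior. A sketch, reusing xp from above:)

xs = np.linspace(-1.0, 1.0, 1000)[:, np.newaxis].astype('float32')
pred = neural_network_with_2_layers(xp, q_W_0, q_W_1, q_b_0, q_b_1)  # graph built once
for _ in range(10):
    plt.plot(xs.ravel(), sess.run(pred, {xp: xs}), color='black', alpha=0.1)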

Cheers


What size is the hidden layer? Also, to verify your code works, have you tried dropping the hidden layer in the neural_network code to see if it properly reduces to Bayesian linear regression?
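For example, a minimal check along these lines (a sketch with synthetic data and illustrative names, not your dataset) should recover a known slope and intercept:

# Synthetic linear data: y = 2x - 0.5 + noise.
N_syn = 200
X_syn = np.linspace(-1.0, 1.0, N_syn).reshape(-1, 1).astype('float32')
Y_syn = (2.0 * X_syn - 0.5 + 0.1 * np.random.randn(N_syn, 1)).astype('float32')

x_syn = tf.placeholder(tf.float32, [N_syn, 1])
W = Normal(loc=tf.zeros([1, 1]), scale=tf.ones([1, 1]))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y_syn = Normal(loc=tf.matmul(x_syn, W) + b, scale=0.1 * tf.ones([N_syn, 1]))

q_W = Normal(loc=tf.Variable(tf.random_normal([1, 1])),
             scale=tf.nn.softplus(tf.Variable(tf.random_normal([1, 1]))))
q_b = Normal(loc=tf.Variable(tf.random_normal([1])),
             scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))

ed.KLqp({W: q_W, b: q_b}, data={x_syn: X_syn, y_syn: Y_syn}).run(n_iter=1000)
print(ed.get_session().run([q_W.mean(), q_b.mean()]))  # should end up near 2.0 and -0.5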

The size of the hidden layer is 10, and I just found out that my code indeed does not work even when I drop the hidden layer. Here is what it looked like:

def neural_network_with_1_layer(x, W, b):
    h = tf.matmul(x,W) + b
    return tf.reshape(h, [-1])

W = Normal(loc=tf.zeros([1,1]),
             scale=tf.ones([1,1]))

b = Normal(loc=tf.zeros([1,1]),
             scale=tf.ones([1,1]))

x = tf.placeholder(tf.float32, [623, 1])


a = neural_network_with_1_layer(x,W,b)
c = tf.reshape(a,[len(X_train),1])
y = Normal(loc=c,scale=(tf.ones([1])*0.1))  # constant noise



q_W = Normal(loc=tf.Variable(tf.random_normal([1,1])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([1,1]))))

q_b = Normal(loc=tf.Variable(tf.random_normal([1,1])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([1,1]))))


inference = ed.KLqp(latent_vars={W: q_W, b: q_b},
                    data={x: X_train, y: Y_train})

inference.run(n_samples=10, n_iter=5000)


plt.scatter(X_train, Y_train, s=20.0,label="Training data");  # blue
plt.scatter(X_test, Y_test, s=20.0,label="Test data",  # red
            color=sns.color_palette().as_hex()[2]);


xp = tf.placeholder(tf.float32, [2000, 1])
[plt.plot(np.linspace(-.5, 1.0, 2000),
          sess.run(neural_network_with_1_layer(xp, q_W, q_b),
                   {xp: np.linspace(-.5, 1.0, 2000)[:, np.newaxis]}),
          color='black', alpha=0.1)
 for _ in range(10)];
plt.legend()

[image: test2]

And this is how I define my data in case it helps

X_test = data_4[::10]
X_test = X_test.reshape(len(X_test),1)
X_test = X_test.astype("float32")

X_train = data[::10]
X_train = X_train.astype("float32")
X_train = X_train.reshape(len(X_train),1)

Y_train = RUL_func(X_train)
Y_train = Y_train.astype("float32")
Y_train = Y_train.reshape(len(Y_train),1)

Y_test = RUL_func(X_test)
Y_test = Y_test.reshape(len(Y_test),1)
Y_test = Y_test.astype("float32")

Thanks for taking the time to help


Update: The problem may be caused by the activation function. Changing it from tanh to relu gave me a linear regression, and changing it to relu6 gave a more non-linear (over)fit, again with a cut-off point but at a value that is more convenient for the present data.
This is the relu output:
[image: relu]
And this is the relu6 output:
[image: relu6]
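For reference, the change is just the activation in the hidden layer of the network function, roughly:

def neural_network_with_2_layers(x, W_0, W_1, b_0, b_1):
    h = tf.nn.relu(tf.matmul(x, W_0) + b_0)  # was tf.nn.tanh; tf.nn.relu6 for the second plot
    h = tf.matmul(h, W_1) + b_1
    return tf.reshape(h, [-1])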

Is your data normalized to match your activation function, i.e. to [0, 1] for relu and to [-1, 1] for tanh?
Also, initialize your weights with random normal values and shrink them by multiplying with a small factor, e.g. 0.01.
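For example, something along these lines (a sketch; the min-max scaling and the 0.01 factor are only illustrative):

# Scale the inputs to [-1, 1] for tanh (use [0, 1] for relu instead).
X_train = (2.0 * (X_train - X_train.min()) / (X_train.max() - X_train.min()) - 1.0).astype('float32')

# Shrink the initial variational means so the network starts close to zero.
q_W_0 = Normal(loc=tf.Variable(0.01 * tf.random_normal([D, dim])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([D, dim]))))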


That helped a lot. I had already initialized the weights randomly, but I did not normalize the input. Here are 20 predictions using a tanh activation with (semi-)normalized inputs (it still does quite well):
[image: tanh_norm]
It seems to be better, with a lot less overfitting; however, now I am getting a negative loss. I guess there is some more tweaking to be done. Thanks.


@tristanmech you're welcome! :slight_smile: