Trying to implement Bayesian NN for regression


I am trying to apply the Bayesian NN presented by Torsten Scholak at PyCon to some real-world data I have, in order to familiarize myself with Edward and TensorFlow, but I am getting very weird results.
The network fits the data well, but only up to a certain point, and then it flatlines. I can’t figure out what in the code I should tweak. Here is the code for the network:

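import edward as ed
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from edward.models import Normal

def neural_network_with_2_layers(x, W_0, W_1, b_0, b_1):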
    h = tf.nn.tanh(tf.matmul(x, W_0) + b_0)
    h = tf.matmul(h, W_1) + b_1
    return tf.reshape(h, [-1])

dim = 10  # layer dimensions
W_0 = Normal(loc=tf.zeros([D, dim]),
             scale=tf.ones([D, dim]))
W_1 = Normal(loc=tf.zeros([dim, 1]),
             scale=tf.ones([dim, 1]))
b_0 = Normal(loc=tf.zeros(dim),
             scale=tf.ones(dim))
b_1 = Normal(loc=tf.zeros(1),
             scale=tf.ones(1))

x = tf.placeholder(tf.float32, [N, D])

a = neural_network_with_2_layers(x,W_0,W_1,b_0,b_1)
b = tf.reshape(a,[len(X_train),1])
y = Normal(loc=b,scale=(tf.ones([N,1])*0.1))  # constant noise


q_W_0 = Normal(loc=tf.Variable(tf.random_normal([D, dim])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([D, dim]))))
q_W_1 = Normal(loc=tf.Variable(tf.random_normal([dim, 1])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([dim, 1]))))
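q_b_0 = Normal(loc=tf.Variable(tf.random_normal([dim])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([dim]))))
q_b_1 = Normal(loc=tf.Variable(tf.random_normal([1])),
               scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))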

inference = ed.KLqp(latent_vars={W_0: q_W_0, b_0: q_b_0,
                                 W_1: q_W_1, b_1: q_b_1},
                    data={x: X_train, y: Y_train})
inference.run(n_iter=20000)

Here are the results

and the code to plot them

sess = ed.get_session()

plt.scatter(X_train, Y_train, s=20.0);  # blue
plt.scatter(X_test, Y_test, s=20.0,  # red
            color='red');

# Each sess.run draws fresh weights from the fitted posterior
xp = tf.placeholder(tf.float32, [1000, D])
[plt.plot(np.linspace(-1.0, 1.0, 1000),
          sess.run(neural_network_with_2_layers(xp,
                                                q_W_0, q_W_1,
                                                q_b_0, q_b_1),
                   {xp: np.linspace(-1.0, 1.0, 1000)[:, np.newaxis]}),
          color='black', alpha=0.1)
 for _ in range(10)];



What size is the hidden layer? Also, to verify your code works, have you tried dropping the hidden layer in the neural_network code to see if it properly reduces to Bayesian linear regression?


The hidden layer has 10 units, and I just found out that my code indeed does not work even when I drop the hidden layer. Here is what it looked like:

def neural_network_with_1_layer(x, W, b):
    h = tf.matmul(x,W) + b
    return tf.reshape(h, [-1])

W = Normal(loc=tf.zeros([1,1]),
           scale=tf.ones([1,1]))

b = Normal(loc=tf.zeros([1,1]),
           scale=tf.ones([1,1]))

x = tf.placeholder(tf.float32, [623, 1])

a = neural_network_with_1_layer(x,W,b)
c = tf.reshape(a,[len(X_train),1])
y = Normal(loc=c,scale=(tf.ones([1])*0.1))  # constant noise

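q_W = Normal(loc=tf.Variable(tf.random_normal([1,1])),
             scale=tf.nn.softplus(tf.Variable(tf.random_normal([1,1]))))

q_b = Normal(loc=tf.Variable(tf.random_normal([1,1])),
             scale=tf.nn.softplus(tf.Variable(tf.random_normal([1,1]))))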

inference = ed.KLqp(latent_vars={W: q_W, b: q_b},
                    data={x: X_train, y: Y_train})
inference.run(n_iter=5000)

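sess = ed.get_session()

plt.scatter(X_train, Y_train, s=20.0, label="Training data");  # blue
plt.scatter(X_test, Y_test, s=20.0, label="Test data",  # red
            color='red');

# Each sess.run draws a fresh W and b from the fitted posterior
xp = tf.placeholder(tf.float32, [2000, 1])
[plt.plot(np.linspace(-.5, 1.0, 2000),
          sess.run(neural_network_with_1_layer(xp, q_W, q_b),
                   {xp: np.linspace(-.5, 1.0, 2000)[:, np.newaxis]}),
          color='black', alpha=0.1)
 for _ in range(10)];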


And this is how I define my data, in case it helps:

X_test = data_4[::10]
X_test = X_test.reshape(len(X_test),1)
X_test = X_test.astype("float32")

X_train = data[::10]
X_train = X_train.astype("float32")
X_train = X_train.reshape(len(X_train),1)

Y_train = RUL_func(X_train)
Y_train = Y_train.astype("float32")
Y_train = Y_train.reshape(len(Y_train),1)

Y_test = RUL_func(X_test)
Y_test = Y_test.reshape(len(Y_test),1)
Y_test = Y_test.astype("float32")

Thanks for taking the time to help


Update: The problem may be caused by the activation function. Changing it from tanh to relu gave me a linear regression, and changing it to relu6 gave a more non-linear (over)fit, again with a cut-off point, but at a value that is more convenient for the present data.
This is the relu output
And this is the relu6
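
One possible reading of the cut-off, offered as a guess rather than a diagnosis: tanh and relu6 are both bounded, so a hidden unit that receives large pre-activations outputs a nearly constant value and the network output flattens, while plain relu keeps growing linearly. The NumPy sketch below only illustrates that saturation. The values in `z` are made up and are not the data from this thread.

```python
import numpy as np

def relu6(z):
    # relu6 clips its input to the range [0, 6]
    return np.minimum(np.maximum(z, 0.0), 6.0)

# Made-up pre-activations: a few small values and a few large ones
z = np.array([0.1, 0.5, 1.0, 5.0, 50.0, 500.0])

tanh_out = np.tanh(z)           # levels off near 1 for large z
relu_out = np.maximum(z, 0.0)   # keeps growing with z
relu6_out = relu6(z)            # capped at 6

print(tanh_out)
print(relu_out)
print(relu6_out)
```

If the inputs are large relative to these ranges, rescaling them before training is one way to keep the hidden units away from the flat regions.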


Is your data normalized to suit your activation function, i.e. to [0, 1] for ReLU and to (-1, 1) for tanh?
Also, initialize your weights from a random normal and shrink them by multiplying by a small factor, e.g. 0.01.
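
A minimal sketch of both suggestions in plain NumPy, under stated assumptions: `scale_to_range` is a hypothetical helper, `X_raw` is made-up data standing in for `X_train`, and the weight shape `(1, 10)` simply mirrors one input feature and a hidden layer of 10 units.

```python
import numpy as np

def scale_to_range(x, low, high):
    """Linearly rescale x so its minimum maps to low and its maximum to high."""
    x = np.asarray(x, dtype=np.float32)
    x_min, x_max = x.min(), x.max()
    return low + (x - x_min) * (high - low) / (x_max - x_min)

# Made-up raw inputs standing in for X_train
X_raw = np.array([[0.0], [250.0], [500.0], [1000.0]], dtype=np.float32)

X_for_relu = scale_to_range(X_raw, 0.0, 1.0)    # range [0, 1] for ReLU
X_for_tanh = scale_to_range(X_raw, -1.0, 1.0)   # range [-1, 1] for tanh

# Small random initial weights: standard normal draws shrunk by 0.01
rng = np.random.RandomState(0)
W_init = 0.01 * rng.randn(1, 10).astype(np.float32)
```

For test inputs, the minimum and maximum of the training data would normally be reused rather than recomputed, so that both sets share one scale. In the TensorFlow code above, the analogous initialization would presumably be something like `tf.Variable(0.01 * tf.random_normal([D, dim]))` for the variational means.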


That helped a lot. I had already initialized the weights randomly but had not normalized the input. Here are 20 predictions using a tanh activation with (semi)normalized inputs (still, it does quite well).
It seems to be better, with a lot less overfitting; however, I am now getting a negative loss. I guess there’s some more tweaking to be done. Thanks!
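
A note on the negative loss, hedged because it rests on my understanding of Edward rather than on anything verified in this thread: `ed.KLqp` reports the negative of the evidence lower bound, and that bound contains log densities, which are positive wherever a density exceeds 1. With the noise scale of 0.1 used in the model above, the Normal density at its mean is about 4, so a close fit can push the loss below zero without anything being wrong. A quick check using only the standard library (`normal_log_pdf` is a hypothetical helper, not an Edward function):

```python
import math

def normal_log_pdf(y, loc, scale):
    # Log density of a univariate Normal(loc, scale) evaluated at y
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(scale)
            - 0.5 * ((y - loc) / scale) ** 2)

# Noise scale 0.1 as in the model above; observation exactly at the mean
log_p = normal_log_pdf(0.0, 0.0, 0.1)
print(log_p)  # positive, because the density at the mean is greater than 1
```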


@tristanmech you're welcome :slight_smile: !