Epistemic uncertainty using Edward



I am learning how to use Edward by following the tutorials, which are an excellent knowledge repository. I am trying to build a simple two-layer neural network based on them, and I am wondering whether I can estimate the epistemic variance of the model as described in https://github.com/kyle-dorman/bayesian-neural-network-blogpost

Previously I tested the epistemic variance on the softmax model built with Edward (https://www.alpha-i.co/blog/MNIST-for-ML-beginners-The-Bayesian-Way.html/). The results showed that for MNIST images the variance was low and the predicted probabilities were high, and vice versa for non-MNIST images.
But for the two-layer network, the results seem inconsistent to me. After training the model, I evaluate its predictions using:
# Evaluation
Y_post = ed.copy(Y, {W_0: qW_0, b_0: qb_0,
                     W_1: qW_1, b_1: qb_1,
                     W_2: qW_2, b_2: qb_2})
print('Test accuracy: ', ed.evaluate('categorical_accuracy', data={X: X_test, Y_post: Y_test}))

This reports 94% accuracy.

Next I obtain the posterior predictions using:
posterior = ed.copy(Y, dict_swap={W_0: qW_0.mean(), b_0: qb_0.mean(),
                                  W_1: qW_1.mean(), b_1: qb_1.mean(),
                                  W_2: qW_2.mean(), b_2: qb_2.mean()})
Y_post1 = sess.run(posterior.sample(100), feed_dict={X: X_test, y_ph: Y_test})
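For readers who want to see the shapes involved, here is a minimal NumPy sketch (not Edward code) of Monte Carlo prediction over weight draws. Everything in it is an assumption made for illustration: a hypothetical single linear layer, a Gaussian approximate posterior with made-up mean `w_mean` and standard deviation `w_std`, and random toy inputs. It only shows how one probability vector per weight draw stacks into an array of shape (samples, points, classes), next to a single forward pass at the mean weights.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_samples, n_points, n_features, n_classes = 100, 4, 3, 10

# Toy inputs and a hypothetical Gaussian posterior over one weight matrix.
x = rng.normal(size=(n_points, n_features))
w_mean = rng.normal(size=(n_features, n_classes))
w_std = 0.5

# One weight matrix per Monte Carlo draw: shape (samples, features, classes).
w_draws = w_mean + w_std * rng.normal(size=(n_samples, n_features, n_classes))

# One softmax output per draw and per input: shape (samples, points, classes).
probs = softmax(np.einsum('nf,sfc->snc', x, w_draws))

# Single forward pass with the weights fixed at their mean: shape (points, classes).
probs_at_mean = softmax(x @ w_mean)

print(probs.shape, probs_at_mean.shape)
```

In this toy setting the spread of `probs` along axis 0 comes only from the variation in the weight draws; `probs_at_mean` has no such axis.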

The epistemic variance function is the same as the one defined in the blog:
## Entropy = -sum over i {p(xi)*log(p(xi))}, i = 1,…,num. of classes
def predictive_entropy(prob):
    # skip zero entries so that 0 * log(0) is treated as 0
    return -np.sum(np.log(prob[prob != 0]) * prob[prob != 0])
    #return -np.sum(np.log(prob) * prob)
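As a quick sanity check of this formula, the sketch below applies it to two hand-made probability vectors (both are illustrative assumptions, not model outputs). A uniform distribution over 10 classes should give the maximum possible entropy, log(10), and a one-hot vector should give zero.

```python
import numpy as np

def predictive_entropy(prob):
    """Shannon entropy (in nats) of one probability vector, skipping zeros."""
    prob = np.asarray(prob, dtype=float)
    nonzero = prob[prob != 0]
    return -np.sum(nonzero * np.log(nonzero))

# Hypothetical inputs chosen for illustration only.
uniform = np.full(10, 0.1)   # maximally uncertain over 10 classes
one_hot = np.eye(10)[3]      # all probability mass on class 3

entropy_uniform = predictive_entropy(uniform)
entropy_one_hot = predictive_entropy(one_hot)

# Compare against the analytic values log(10) and 0.
print(np.isclose(entropy_uniform, np.log(10)))
print(np.isclose(entropy_one_hot, 0.0))
```

Every other probability vector over 10 classes falls between these two extremes.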

The following code calculates the variance:
mean_prob_over_samples=np.mean(Y_post1, axis=0) ## prediction means
prediction_variances = np.apply_along_axis(predictive_entropy, axis=1, arr=mean_prob_over_samples)
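To make the array shapes in this step concrete, here is a small self-contained sketch. It assumes the sampled predictions form an array of shape (samples, points, classes), and it uses random Dirichlet draws named `fake_samples` as a stand-in for `Y_post1`, so the numbers carry no meaning. It also shows a vectorised route that should give the same values as `np.apply_along_axis`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_points, n_classes = 100, 5, 10

# Stand-in for Y_post1: random probability vectors, NOT real model output.
fake_samples = rng.dirichlet(np.ones(n_classes), size=(n_samples, n_points))

# Average over the sample axis: shape (points, classes).
mean_prob = fake_samples.mean(axis=0)

# Vectorised entropy per row; zeros are masked so 0 * log(0) counts as 0.
safe = np.where(mean_prob > 0, mean_prob, 1.0)
entropy = -np.sum(mean_prob * np.log(safe), axis=1)   # shape (points,)

# Row-by-row route, mirroring the function used in the post.
def predictive_entropy(prob):
    return -np.sum(np.log(prob[prob != 0]) * prob[prob != 0])

entropy_rowwise = np.apply_along_axis(predictive_entropy, axis=1, arr=mean_prob)

print(mean_prob.shape, entropy.shape)
print(np.allclose(entropy, entropy_rowwise))
```

Each resulting value is an entropy in nats, bounded above by log(n_classes).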

Then I plot the variance against the prediction probabilities:
# one hot to dense format conversion
Y_test_onehot = np.argmax(mnist.test.labels,axis=1)
## get index of different test labels, one array per digit class
idx = [np.where(Y_test_onehot == c)[0] for c in range(10)]

## plot histogram of variance of each class
# x: mean probability assigned to the true class; y: sqrt of the entropy
true_class_prob = np.hstack([mean_prob_over_samples[idx[c], c] for c in range(10)])
std_dev = np.hstack([np.sqrt(prediction_variances[idx[c]]) for c in range(10)])
h = sns.jointplot(x=true_class_prob, y=std_dev, kind='kde')
h.ax_joint.set_xlabel('Prediction probability of class')
h.ax_joint.set_ylabel('Standard deviation')
plt.figure(figsize=(8, 6))

The KDE plot is blank. When I change kind to 'scatter', I see the points shown below. I am not sure why the results look different: I expected the variance vs. probability relationship to be similar to that of the softmax model if the performance is similar.

Can someone point me to what I missed?
Thank you!