I am a newbie with TensorFlow. I used PyMC3 for my project, but since Theano will stop being developed, I have to switch to TensorFlow and Edward to implement Bayesian deep learning.
When I read Weight Uncertainty in Neural Networks, I planned to implement the Bayes by Backprop algorithm it proposes on top of Edward's KLqp, and I was lucky to find the MXNet tutorial Bayes by Backprop from scratch (NN, classification). After studying that code, I found that I only need to modify the loss function according to the paper and delete the scale option in the KLqp class.
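For a single Gaussian prior, the complexity cost in that loss, KL(q || p) per weight, has a closed form; with the mixture prior below it has to be estimated by Monte Carlo, which is why I rely on KLqp. A small numpy sketch of the closed-form term (my own formula check, not Edward's internals):

```python
import numpy as np

def gaussian_kl(mu, sigma, sigma_prior):
    # Closed-form KL( N(mu, sigma^2) || N(0, sigma_prior^2) ),
    # the per-weight complexity cost in Bayes by Backprop
    # when the prior is a single zero-mean Gaussian.
    return (np.log(sigma_prior / sigma)
            + (sigma ** 2 + mu ** 2) / (2.0 * sigma_prior ** 2)
            - 0.5)
```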
However, the authors also propose a scale-mixture prior, like a spike-and-slab. I implemented it with:
sigma_p1 = 0.75
sigma_p2 = 0.1
pi = 0.25
probs = [pi, 1.-pi]
n_hidden_1 = 400
cat_W_1 = cat_batch_shape(dim=[num_inputs, n_hidden_1, len(probs)])
W_1 = Mixture(cat=Categorical(probs=tf.convert_to_tensor(cat_W_1, dtype=tf.float32)),
              components=[
                  Normal(loc=tf.zeros([num_inputs, n_hidden_1]),
                         scale=tf.constant(sigma_p1, shape=[num_inputs, n_hidden_1])),
                  Normal(loc=tf.zeros([num_inputs, n_hidden_1]),
                         scale=tf.constant(sigma_p2, shape=[num_inputs, n_hidden_1]))])
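For completeness, cat_batch_shape is my own helper; roughly, it does the equivalent of this numpy sketch (the real version builds a TensorFlow tensor):

```python
import numpy as np

def cat_batch_shape(dim, pi=0.25):
    # Sketch of my helper: tile the mixture weights [pi, 1 - pi]
    # so the Categorical's batch shape matches the components'
    # batch shape [rows, cols].
    rows, cols, n_components = dim
    probs = np.array([pi, 1.0 - pi], dtype=np.float32)
    assert n_components == len(probs)
    return np.tile(probs, (rows, cols, 1))
```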
cat_batch_shape is a helper that keeps the shape of the cat probabilities of Mixture the same as the components. But when I run the code with either KLqp or BBB, the same NaN error shows:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: gradient/qW_3/mu/0 [[Node: gradient/qW_3/mu/0 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](gradient/qW_3/mu/0/tag, gradients/AddN_11/_197)]] [[Node: norm_10/Squeeze/_194 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1702_norm_10/Squeeze", _device="/job:localhost/replica:0/task:0/device:GPU:0"](norm_10/Squeeze)]]
Does the scale-mixture prior produce these errors? I ask because I have reproduced the code following MNIST FOR ML BEGINNERS: THE BAYESIAN WAY with a single Gaussian prior, and there both algorithms run successfully.