I am a newbie with TensorFlow. I used PyMC3 for my project, but I heard Theano will stop being developed, so I have to switch to TensorFlow and Edward to implement Bayesian deep learning.

When I read *Weight Uncertainty in Neural Networks*, I planned to implement the Bayes by Backprop algorithm they proposed on top of Edward's `KLqp`, and I was lucky to find a tutorial for MXNet, Bayes by Backprop from scratch (NN, classification). After studying the code, I found that I only need to modify the loss function according to the paper and delete the scale option in the `KLqp` class.
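To make the modification concrete, the loss I have in mind is one Monte Carlo term of the Bayes by Backprop objective from the paper, log q(w|θ) − log p(w) − log p(D|w). Here is a standalone NumPy sketch of one draw (my own simplification with a single Gaussian prior, not Edward code; all names are mine):

```python
import numpy as np

def log_normal(x, mu, sigma):
    # Elementwise Gaussian log-density, summed over all weights.
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma ** 2)
                  - (x - mu) ** 2 / (2.0 * sigma ** 2))

def bbb_loss_one_sample(mu_q, rho_q, log_lik_fn, sigma_prior=1.0, rng=None):
    """One Monte Carlo term of the Bayes by Backprop objective:
    log q(w | mu, sigma) - log p(w) - log p(D | w)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_q = np.log1p(np.exp(rho_q))   # softplus keeps sigma positive
    eps = rng.standard_normal(mu_q.shape)
    w = mu_q + sigma_q * eps            # reparameterization trick
    return (log_normal(w, mu_q, sigma_q)
            - log_normal(w, 0.0, sigma_prior)
            - log_lik_fn(w))
```

Averaging this term over several draws gives the minibatch loss from the paper.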

However, in the paper the authors also propose a scale-mixture prior, like a spike-and-slab. I implemented this with `Mixture`:

```python
sigma_p1 = 0.75
sigma_p2 = 0.1
pi = 0.25
probs = [pi, 1. - pi]
n_hidden_1 = 400

cat_W_1 = cat_batch_shape(dim=[num_inputs, n_hidden_1, len(probs)])
W_1 = Mixture(cat=Categorical(probs=tf.convert_to_tensor(cat_W_1, dtype=tf.float32)),
              components=[Normal(loc=tf.zeros([num_inputs, n_hidden_1]),
                                 scale=tf.constant(sigma_p1, shape=[num_inputs, n_hidden_1])),
                          Normal(loc=tf.zeros([num_inputs, n_hidden_1]),
                                 scale=tf.constant(sigma_p2, shape=[num_inputs, n_hidden_1]))])
```
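(`cat_batch_shape` is my own helper; for reproducibility, here is a minimal NumPy sketch of what it returns — a simplified stand-in, the real helper is analogous:)

```python
import numpy as np

def cat_batch_shape(dim, pi=0.25):
    """Broadcast the mixture weights [pi, 1 - pi] to shape dim = [*batch, K]
    so the Categorical's batch shape matches the Normal components'."""
    *batch_dims, k = dim
    probs = np.array([pi, 1.0 - pi], dtype=np.float32)
    assert k == probs.shape[0]
    return np.broadcast_to(probs, tuple(batch_dims) + (k,)).copy()
```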

`cat_batch_shape` is a function that keeps the shape of the `cat` argument of `Mixture` the same as that of the components. But when I run the code with `KLqp` or BBB, the same NaN error shows up:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: gradient/qW_3/mu/0
[[Node: gradient/qW_3/mu/0 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](gradient/qW_3/mu/0/tag, gradients/AddN_11/_197)]]
[[Node: norm_10/Squeeze/_194 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1702_norm_10/Squeeze", _device="/job:localhost/replica:0/task:0/device:GPU:0"](norm_10/Squeeze)]]
```

Does the scale-mixture prior produce these errors? I ask because when I reproduced the code following MNIST FOR ML BEGINNERS: THE BAYESIAN WAY (with a simple Gaussian prior), both algorithms ran successfully.