Loss is NaN when using KLqp or Bayes by Backprop

I am new to TensorFlow. I used PyMC3 for my project, but I heard that Theano development will stop, so I am switching to TensorFlow and Edward to implement Bayesian deep learning.
While reading "Weight Uncertainty in Neural Networks", I planned to implement the Bayes by Backprop algorithm it proposes on top of Edward's KLqp. Luckily, I found an MXNet tutorial, "Bayes by Backprop from scratch (NN, classification)". After studying its code, I found that I only need to modify the loss function to match the paper and remove the scale option from the KLqp class.
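For reference, the paper's loss is a Monte Carlo estimate of the variational free energy, log q(w|θ) − log P(w) − log P(D|w). The weights are sampled as w = μ + log(1 + exp(ρ)) · ε with ε ~ N(0, I). Below is a minimal single-sample NumPy sketch of that estimate. It assumes a hypothetical toy linear-regression model with a Gaussian likelihood and a single Gaussian prior; the mixture prior comes later. The names `bbb_loss`, `prior_sigma` and `noise_sigma` are illustrative and are not part of Edward or the MXNet tutorial. The sketch only evaluates the loss. In TensorFlow, the gradients with respect to μ and ρ would come from autodiff.

```python
import numpy as np

def log_normal(x, mu, sigma):
    """Elementwise log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def softplus(rho):
    """sigma = log(1 + exp(rho)), computed stably so sigma stays positive."""
    return np.logaddexp(0.0, rho)

def bbb_loss(mu, rho, X, y, rng, prior_sigma=1.0, noise_sigma=1.0):
    """One-sample estimate of log q(w|theta) - log P(w) - log P(D|w).

    Hypothetical toy model (an assumption, not from the post):
    y ~ N(X @ w, noise_sigma^2), with prior w_j ~ N(0, prior_sigma^2).
    """
    sigma = softplus(rho)
    eps = rng.standard_normal(mu.shape)
    w = mu + sigma * eps                                  # reparameterised weight sample
    log_q = np.sum(log_normal(w, mu, sigma))              # variational posterior term
    log_prior = np.sum(log_normal(w, 0.0, prior_sigma))  # prior term
    log_lik = np.sum(log_normal(y, X @ w, noise_sigma))   # data likelihood term
    return log_q - log_prior - log_lik

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 3))                      # toy inputs
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(20)
    mu, rho = np.zeros(3), np.full(3, -3.0)               # variational parameters
    print("loss estimate:", bbb_loss(mu, rho, X, y, rng))
```

Parameterising σ through softplus keeps it strictly positive without any constraint on ρ. That is the reason the paper samples ρ rather than σ directly.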
However, the paper also proposes a scale-mixture prior, similar to a spike-and-slab prior. I implemented it with `Mixture`:

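```python
import numpy as np

# Hyperparameters of the scale-mixture prior
sigma_p1 = 0.75  # std. dev. of the first mixture component
sigma_p2 = 0.1   # std. dev. of the second mixture component
pi = 0.25        # weight of the first component
probs = [pi, 1. - pi]

def cat_batch_shape(dim, probs=probs):
    """Return a float32 array of shape `dim` whose last axis starts with `probs`."""
    cat = np.zeros(dim, dtype=np.float32)
    # `...` fills the last axis for any rank (the original handled only 2-D and 3-D)
    cat[..., 0:len(probs)] = probs
    return cat
```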
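The paper's scale-mixture prior is P(w) = Π_j [π N(w_j | 0, σ1²) + (1 − π) N(w_j | 0, σ2²)]. With a prior like this, one common source of NaN losses is computing its log density as log(π·p1 + (1 − π)·p2). This may or may not be the cause here. For weights far from zero, both Gaussian densities underflow to 0 and the log becomes −inf. Once that infinity meets other infinite terms or gradients, the result can become NaN. The NumPy sketch below compares that naive form with a log-space version built on `np.logaddexp`. It uses the hyperparameters from the post, and the helper names are hypothetical. In TensorFlow, the analogous log-space reduction is `tf.reduce_logsumexp`.

```python
import numpy as np

def log_normal(x, mu, sigma):
    """Elementwise log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def log_mixture_prior(w, pi=0.25, sigma_p1=0.75, sigma_p2=0.1):
    """Sum over weights of log[pi N(w|0,s1^2) + (1-pi) N(w|0,s2^2)], kept in log space."""
    a = np.log(pi) + log_normal(w, 0.0, sigma_p1)
    b = np.log(1.0 - pi) + log_normal(w, 0.0, sigma_p2)
    return np.sum(np.logaddexp(a, b))  # log(exp(a) + exp(b)) without underflow

def log_mixture_prior_naive(w, pi=0.25, sigma_p1=0.75, sigma_p2=0.1):
    """Same quantity computed in density space; both densities underflow for large |w|."""
    dens = (pi * np.exp(log_normal(w, 0.0, sigma_p1))
            + (1.0 - pi) * np.exp(log_normal(w, 0.0, sigma_p2)))
    with np.errstate(divide="ignore"):  # silence the log(0) warning to show the -inf
        return np.sum(np.log(dens))

if __name__ == "__main__":
    w = np.array([0.0, 0.3, 50.0])      # the last weight is far out in the tails
    print("log space:", log_mixture_prior(w))
    print("naive:    ", log_mixture_prior_naive(w))
```

Both functions agree for weights near zero. Only the log-space version stays finite once a weight drifts far into the tails.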