Loss is NaN when using KLqp or Bayes by Backprop

sejabs · March 31, 2018, 9:14am

I am new to TensorFlow. I used PyMC3 for my project, but I heard that Theano development is ending, so I have to switch to TensorFlow and Edward to implement Bayesian deep learning.
When I read "Weight Uncertainty in Neural Networks", I planned to implement the Bayes by Backprop (BBB) algorithm it proposes on top of Edward's KLqp, and I was lucky to find an MXNet tutorial, "Bayes by Backprop from scratch (NN, classification)". After studying the code, I found that I only needed to modify the loss function according to the paper and delete the scale option in the KLqp class.
However, the paper also proposes a scale mixture prior, similar to a spike-and-slab prior. I implemented this with Mixture:

import tensorflow as tf
from edward.models import Categorical, Mixture, Normal

# Scale mixture prior: two zero-mean Gaussians with different scales
sigma_p1 = 0.75
sigma_p2 = 0.1
pi = 0.25
probs = [pi, 1. - pi]

n_hidden_1 = 400
# Mixture weights repeated for every entry of the weight matrix
cat_W_1 = cat_batch_shape(dim=[num_inputs, n_hidden_1, len(probs)])
W_1 = Mixture(
    cat=Categorical(probs=tf.convert_to_tensor(cat_W_1, dtype=tf.float32)),
    components=[
        Normal(loc=tf.zeros([num_inputs, n_hidden_1]),
               scale=tf.constant(sigma_p1, shape=[num_inputs, n_hidden_1])),
        Normal(loc=tf.zeros([num_inputs, n_hidden_1]),
               scale=tf.constant(sigma_p2, shape=[num_inputs, n_hidden_1])),
    ])

cat_batch_shape is a function that makes the shape of Mixture's cat argument match the shape of the components. But when I run the code with either KLqp or BBB, the same NaN error appears:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: gradient/qW_3/mu/0 [[Node: gradient/qW_3/mu/0 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](gradient/qW_3/mu/0/tag, gradients/AddN_11/_197)]] [[Node: norm_10/Squeeze/_194 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1702_norm_10/Squeeze", _device="/job:localhost/replica:0/task:0/device:GPU:0"](norm_10/Squeeze)]]
Does the scale mixture prior cause these errors? I ask because when I reproduced the code from "MNIST for ML Beginners: The Bayesian Way", both algorithms ran successfully.
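The cat_batch_shape helper is not shown in the post. Below is a minimal sketch of what such a helper might look like, assuming it simply repeats the mixture weights so that cat holds one weight vector per entry of the weight matrix. The explicit probs argument and the small dimensions are assumptions made for illustration, not the original code.

```python
import numpy as np


def cat_batch_shape(dim, probs):
    """Repeat the mixture weights so the result has shape dim = [..., len(probs)].

    Hypothetical helper: the original post calls cat_batch_shape(dim=...)
    without showing its body, so this signature is an assumption.
    """
    probs = np.asarray(probs, dtype=np.float32)
    if dim[-1] != probs.shape[0]:
        raise ValueError("last entry of dim must equal len(probs)")
    # Broadcasting places the same weight vector at every weight position.
    return np.broadcast_to(probs, tuple(dim)).copy()


probs = [0.25, 0.75]
# Small stand-in for [num_inputs, n_hidden_1, len(probs)]
cat_W_1 = cat_batch_shape(dim=[3, 4, len(probs)], probs=probs)
print(cat_W_1.shape)
```

The leading dimensions of the result match the shape of each Normal component, and the last dimension indexes the mixture components, which is the layout the Categorical in the snippet above appears to expect.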

sejabs · April 2, 2018, 3:37am

I can answer myself ^-^!
First, I used tfdbg with the has_inf_or_nan filter to find the source of the NaNs. Note that tfdbg may have problems on Windows.
Second, I found that the problem stemmed from the learning rate, so I modified the VI and KLqp files to allow setting the learning rate directly. With a smaller learning rate of 0…001, I finally got results comparable to those of Bayes by Backprop from scratch.
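As a rough illustration of the kind of check a has_inf_or_nan-style filter performs, here is a NumPy stand-in. It is not tfdbg itself: the function name find_bad_tensors and the example values are hypothetical, and only the tensor-name style is borrowed from the error message above.

```python
import numpy as np


def find_bad_tensors(named_values):
    """Return the names whose values contain NaN or +/-Inf, in insertion order."""
    bad = []
    for name, value in named_values.items():
        arr = np.asarray(value, dtype=np.float64)
        if not np.all(np.isfinite(arr)):
            bad.append(name)
    return bad


# Hypothetical snapshot of values taken after one training step
snapshot = {
    "qW_1/mu": np.array([0.1, -0.3]),
    "loss": np.array(np.inf),
    "gradient/qW_3/mu": np.array([np.nan, 0.2]),
}
bad_names = find_bad_tensors(snapshot)
print(bad_names)
```

Finding the earliest tensor that becomes non-finite helps separate a model bug from a step size that is too large, which is the distinction that mattered in this case.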

YSanchezAraujo · May 5, 2018, 8:29pm

Hi @sejabs, could you explain how you reduced the learning rate through the Edward API?
