Nan in summary histogram for: gradient

Hi Dustin! Many thanks! The results have changed.

Now I want to understand why with learning_rate = 1e-3 we obtain Loss: nan, but with learning_rate = 1e-2 we obtain Loss: 18073.822. I'll check the references listed under Classes of Inference:

KLqp supports

1. score function gradients (Paisley et al., 2012)
2. reparameterization gradients (Kingma and Welling, 2014)

of the loss function.
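To make the difference between the two estimator families concrete, here is a minimal NumPy sketch (not Edward's internal code) on a toy objective: estimating the gradient of E_{z ~ N(mu, 1)}[z^2] with respect to mu, whose true value is 2*mu. The objective and sample size are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.5          # variational parameter
n = 100_000       # Monte Carlo samples

# Objective: E_{z ~ N(mu, 1)}[z^2]; the true gradient w.r.t. mu is 2*mu.

# Score function estimator: f(z) * d/dmu log q(z; mu),
# where d/dmu log N(z; mu, 1) = z - mu.
z = rng.normal(mu, 1.0, n)
score_grads = (z ** 2) * (z - mu)
score_est = score_grads.mean()

# Reparameterization estimator: write z = mu + eps with eps ~ N(0, 1)
# and differentiate through the sample: d/dmu (mu + eps)^2 = 2*(mu + eps).
eps = rng.normal(0.0, 1.0, n)
reparam_grads = 2.0 * (mu + eps)
reparam_est = reparam_grads.mean()

print(score_est, reparam_est, 2 * mu)              # both near 3.0
print(score_grads.var(), reparam_grads.var())      # score variance is much larger
```

Both estimators are unbiased, but the reparameterization gradients have far lower variance on this toy problem, which is one reason KLqp uses them whenever the model permits.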

learning_rate = 1e-3
optimizer = tf.train.AdamOptimizer(learning_rate)
inference.initialize(n_samples=5, n_iter=250, logdir='log2DMH', optimizer=optimizer)
inference.run()
1000/1000 [100%] ██████████████████████████████ Elapsed: 1s | Loss: nan    
learning_rate = 1e-2
optimizer = tf.train.AdamOptimizer(learning_rate)
inference.initialize(n_samples=5, n_iter=250, logdir='log2DMH', optimizer=optimizer)
inference.run()
1000/1000 [100%] ██████████████████████████████ Elapsed: 1s | Loss: 18073.822
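One common way a loss becomes nan is that a step lands in a steep region, the iterate overflows to inf, and inf - inf then produces nan. This is a hedged toy illustration in plain NumPy (the function f(x) = x**4, the step sizes, and the `descend` helper are all mine, not anything from Edward); it is not a diagnosis of this specific run, where notably the smaller learning rate was the one that produced nan.

```python
import numpy as np

def descend(lr, clip=None, steps=60):
    """Plain gradient descent on f(x) = x**4, optionally clipping the gradient."""
    x = np.float64(3.0)
    with np.errstate(over='ignore', invalid='ignore'):
        for _ in range(steps):
            g = 4.0 * x ** 3            # f'(x)
            if clip is not None:
                g = np.clip(g, -clip, clip)
            x = x - lr * g
    return x

print(descend(lr=0.1))             # iterates blow up: inf, then nan
print(descend(lr=0.1, clip=1.0))   # stays finite and heads toward the minimum at 0
```

Gradient clipping (TensorFlow offers tf.clip_by_value and tf.clip_by_global_norm for this) is one standard guard when a run produces nan.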

---- EDIT ----
It was my fault! I forgot to take fine control of the training procedure! Things are much better now.

learning_rate = 1e-3
optimizer = tf.train.AdamOptimizer(learning_rate)
inference.initialize(n_samples=30, n_iter=5000, logdir='log2DMH', optimizer=optimizer)
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
  info_dict = inference.update()
  inference.print_progress(info_dict)

inference.finalize()
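One advantage of driving the loop manually is that you can watch the loss and bail out as soon as it goes non-finite, instead of burning all n_iter iterations on nan. A sketch of that pattern, assuming info_dict carries a 'loss' entry as Edward's update() returns for KLqp (the `fake_update` stub below is purely mine, standing in for inference.update so the loop runs without Edward):

```python
import math

def run_with_nan_guard(update, n_iter):
    """Drive an Edward-style update loop, stopping early if the
    reported loss becomes non-finite (nan or inf)."""
    losses = []
    for t in range(n_iter):
        info_dict = update()            # analogue of inference.update()
        loss = info_dict['loss']
        if not math.isfinite(loss):
            print('non-finite loss at iteration', t)
            break
        losses.append(loss)
    return losses

# Stub in place of inference.update: loss decays, then becomes nan
# after 50 iterations, mimicking a run that blows up mid-training.
def fake_update(state={'t': 0}):
    state['t'] += 1
    loss = 100.0 / state['t'] if state['t'] <= 50 else float('nan')
    return {'loss': loss}

losses = run_with_nan_guard(fake_update, 5000)
print(len(losses))   # stops after the 50 finite losses
```

The same guard drops straight into the loop above: check info_dict['loss'] before print_progress and break (then finalize) when it is no longer finite.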