Rényi divergence variational inference

The code looks great. Some comments:

  • In klqp.py, we don’t place the build_loss_and_gradients function as a method inside KLqp because we use it across many KLqp algorithms. Since your function is only used in one class, I recommend writing it as a method (cf. klpq.py).
  • What’s the justification for the default alpha=0.2?
  • What does a ‘min’ backward pass correspond to? I don’t recall a VR-min; does it correspond to alpha → ∞? I haven’t done the math.
  • Is the reshape in logF = tf.reshape(logF, [inference.n_samples, 1]) necessary? It seems you could just do logF = tf.stack(logF).
  • Since you only clip from below, you can change logF = tf.log(tf.clip_by_value(tf.reduce_mean(tf.exp(logF - logF_max), 0), 1e-9, np.inf)) to use tf.maximum(1e-9, *) instead of the two-sided clip.
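To illustrate the last point, here is a small NumPy sketch (with hypothetical log-weight values) of the stabilized log-mean-exp computation, flooring from below with maximum rather than a two-sided clip_by_value; the TensorFlow ops map over one-to-one:

```python
import numpy as np

# Hypothetical per-sample log-weights, standing in for the logF tensor.
logF = np.array([-3.2, -1.7, -2.5, -4.1])

# Subtract the max for numerical stability (log-sum-exp trick).
logF_max = np.max(logF)

# One-sided floor: maximum(1e-9, .) replaces clip_by_value(., 1e-9, np.inf).
mean_exp = np.maximum(1e-9, np.mean(np.exp(logF - logF_max)))
result = np.log(mean_exp) + logF_max

# For well-behaved inputs this agrees with the direct, unstabilized version.
direct = np.log(np.mean(np.exp(logF)))
```

The floor only matters when all weights underflow to zero; otherwise the two forms are identical.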

Would you be interested in submitting a PR? The algorithm would be a nice addition to Edward’s arsenal.

You’re correct. But it also covers trick 1: if the distributions have the property reparameterization_type == tf.contrib.distributions.FULLY_REPARAMETERIZED, then gradients with respect to distribution parameters backpropagate through the sampling. See also the discussion in tensorflow/tensorflow#7236 (“Gradient is incorrect for log pdf of Normal distribution”).
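As a minimal NumPy sketch of what FULLY_REPARAMETERIZED guarantees (hypothetical objective and values): the sample is a deterministic function of the parameters and parameter-free noise, so derivatives flow through the sampling step. Here a pathwise finite difference on a Normal, reusing the same noise on both sides, recovers the analytic gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.standard_normal(100_000)  # parameter-free base noise

def sample(mu, sigma):
    # Reparameterized Normal(mu, sigma): deterministic in (mu, sigma) given eps.
    return mu + sigma * eps

def objective(mu, sigma):
    # Monte Carlo estimate of E[z^2]; analytically mu^2 + sigma^2.
    return np.mean(sample(mu, sigma) ** 2)

mu, sigma, h = 1.5, 0.8, 1e-5
# Pathwise central difference with common random numbers: because the same
# eps is reused, this approximates the analytic gradient d/dmu E[z^2] = 2*mu.
grad_mu = (objective(mu + h, sigma) - objective(mu - h, sigma)) / (2 * h)
```

Without reparameterization (i.e., resampling z inside each objective call), the same finite difference would be swamped by sampling noise, which is why the FULLY_REPARAMETERIZED property matters for low-variance gradients.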