The code looks great. Some comments:
- In `klqp.py`, we don’t place the `build_loss_and_gradients` function as a method inside `KLqp` because we use it across many KLqp algorithms. Since your function is only used in one class, it’s recommended you write it as a method (c.f., `klpq.py`).
- What’s the justification for a default `alpha=0.2`?
- What does a ‘min’ backward pass correspond to? I’m not sure I recall a VR-min; does it correspond to alpha → \infty? I haven’t done the math.
- Is the `logF = tf.reshape(logF, [inference.n_samples, 1])` reshape necessary? It seems you could just do `logF = tf.stack(logF)`.
- Since you only clip on the LHS, you can change `logF = tf.log(tf.clip_by_value(tf.reduce_mean(tf.exp(logF - logF_max), 0), 1e-9, np.inf))` to use `tf.maximum(1e-9, *)`.
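For reference, here is a minimal NumPy sketch (not the TensorFlow code under review) of the stabilized log-mean-exp that line computes, using the suggested one-sided `maximum` floor in place of the two-sided `clip_by_value`:

```python
import numpy as np

def log_mean_exp(logF, eps=1e-9):
    """Numerically stable log-mean-exp over axis 0 with a one-sided floor.

    Subtracting the per-column max before exponentiating avoids overflow;
    flooring the mean at `eps` (one-sided, mirroring the suggested
    tf.maximum(1e-9, ...) change) guards against log(0).
    """
    logF = np.asarray(logF, dtype=np.float64)
    logF_max = np.max(logF, axis=0)
    stabilized = np.mean(np.exp(logF - logF_max), axis=0)
    return np.log(np.maximum(stabilized, eps)) + logF_max
```

Unlike the snippet above, this sketch adds `logF_max` back in so the function is self-contained; in the PR code that correction presumably happens on a later line.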
Would you be interested in submitting a PR? The algorithm would be a nice addition to Edward’s arsenal.
You’re correct. But it also covers trick 1: if the distributions have the property `reparameterization_type == tf.contrib.distributions.FULLY_REPARAMETERIZED`, then gradients with respect to distribution parameters backpropagate through the sampling. See also the discussion in tensorflow/tensorflow#7236 (“Gradient is incorrect for log pdf of Normal distribution”).
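To illustrate why full reparameterization lets gradients flow through sampling, here is a hypothetical NumPy sketch (not Edward’s or TensorFlow’s implementation): writing `z = mu + sigma * eps` with parameter-free noise `eps` makes each sample a differentiable function of the parameters, so the pathwise gradient estimator applies.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 0.7, 200_000

eps = rng.standard_normal(n)   # parameter-free noise, eps ~ N(0, 1)
z = mu + sigma * eps           # reparameterized samples of N(mu, sigma^2)

# For f(z) = z**2: df/dz = 2*z and dz/dmu = 1, so the pathwise
# (reparameterization) estimate of d/dmu E[f(z)] is the sample mean of 2*z.
grad_mu = np.mean(2.0 * z)

# Analytic value: d/dmu E[z^2] = d/dmu (mu^2 + sigma^2) = 2*mu = 3.0
print(grad_mu)
```

With a non-reparameterized sampler there is no such differentiable path, which is why algorithms fall back to the score-function estimator in that case.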