Implementing the local reparameterization trick


#1

I’m trying to implement the local reparameterization trick (Kingma, Salimans, and Welling 2015) to fit regression models with a spike-and-slab prior (point-normal mixture) on the regression coefficients. I previously did this in Theano using the reparameterization gradient and an analytical KL term.

My current attempt in Edward is https://github.com/aksarkar/nwas/blob/4d6a1332eb39ca2b5876e14912cbf8eae1b2ed3f/analysis/example.org

Is there a better way to do this in Edward?

I defined two new Edward random variables:

  • SpikeSlab, which supports analytical KL for the prior
  • GeneticValue (domain-specific jargon for x * theta), which supports sampling (using a tf.contrib.distributions.Normal instance) and a dummy analytical KL (which just returns 0)
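For readers unfamiliar with the analytical KL mentioned above, here is a minimal sketch in plain NumPy (not Edward, and not the code from the linked repository). It assumes the variational posterior for each coefficient is itself a point-normal mixture, parameterized by a log-odds of inclusion, a slab mean, and a slab scale, and that the prior slab is centered at zero. The function name spike_slab_kl and its arguments are hypothetical.

```python
import numpy as np

def spike_slab_kl(logodds, mean, scale, prior_logodds, prior_scale):
    """Per-coefficient KL(q || p) between two point-normal mixtures.

    Assumed forms (illustrative, not taken from the linked code):
      q = pi  * N(mean, scale^2)     + (1 - pi)  * delta_0
      p = pi0 * N(0, prior_scale^2)  + (1 - pi0) * delta_0
    """
    pi = 1.0 / (1.0 + np.exp(-logodds))
    pi0 = 1.0 / (1.0 + np.exp(-prior_logodds))
    # KL between the two Gaussian slabs.
    kl_slab = (np.log(prior_scale / scale)
               + (scale ** 2 + mean ** 2) / (2.0 * prior_scale ** 2)
               - 0.5)
    # KL between the two Bernoulli inclusion indicators.
    kl_indicator = (pi * np.log(pi / pi0)
                    + (1.0 - pi) * np.log((1.0 - pi) / (1.0 - pi0)))
    # The slab term only contributes when the coefficient is included.
    return kl_indicator + pi * kl_slab

# Posterior equal to the prior gives zero divergence.
kl_same = spike_slab_kl(np.array([0.0]), np.array([0.0]), np.array([1.0]), 0.0, 1.0)
```

Summing the returned vector over coefficients gives the KL term of the evidence lower bound under these assumptions.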

Then, I call ed.ReparameterizationKLKLqp directly since I know it won’t blow up.

I can’t think of a way to do this that doesn’t expose the reparameterization in the model specification. However, this solution doesn’t play nicely with ed.copy, so evaluating the model (e.g. computing the coefficient of determination) requires pulling out the coefficients and computing things outside of Edward/TensorFlow.


#3

If you’d like to use the local reparameterization trick, “expos[ing] the reparameterization in the model specification” is the proper approach. Namely, define the model with the weights marginalized out, so that the neurons are the random variables. Inference should then be over logodds, scale, and eta given y and x.
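As a rough illustration of what sampling the neurons directly looks like, here is a sketch in plain NumPy rather than Edward. It assumes a point-normal variational posterior on each coefficient, with inclusion probability pi, slab mean, and slab scale, and it moment-matches the linear predictor to a Gaussian. Unlike the fully Gaussian case in Kingma et al., a sum of point-normal variables is only approximately Gaussian, so this is an approximation. The name sample_eta and the toy values are made up for the example.

```python
import numpy as np

def sample_eta(x, pi, mean, scale, rng):
    """Draw eta = x @ theta from a moment-matched Gaussian marginal."""
    # First two moments of theta_j ~ pi_j N(mean_j, scale_j^2) + (1 - pi_j) delta_0.
    theta_mean = pi * mean
    theta_var = pi * (scale ** 2 + mean ** 2) - theta_mean ** 2
    # Moments of the linear predictor, assuming independent coefficients.
    eta_mean = x @ theta_mean
    eta_var = (x ** 2) @ theta_var
    # One noise draw per sample (row of x) instead of one per coefficient.
    eps = rng.standard_normal(eta_mean.shape)
    return eta_mean + np.sqrt(eta_var) * eps, eta_mean, eta_var

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))       # 5 samples, 3 predictors (toy data)
pi = np.array([0.9, 0.1, 0.5])        # hypothetical inclusion probabilities
mean = np.array([1.0, -2.0, 0.5])     # hypothetical slab means
scale = np.array([0.1, 0.3, 0.2])     # hypothetical slab scales
eta, eta_mean, eta_var = sample_eta(x, pi, mean, scale, rng)
```

Because eta is a deterministic function of the variational parameters and the noise eps, gradients can flow through the sample to pi, mean, and scale.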

Alternatively, you can build a new inference algorithm to try to automate local reparameterizations. That said, I prefer the former approach because I personally view the technique more as a choice of model parameterization for efficient VI in the same way we might use non-centered parameterizations for efficient HMC.

“this solution doesn’t play nicely with ed.copy, so evaluating the model (e.g. computing coefficient of determination) requires pulling out the coefficients and computing things outside of Edward/TensorFlow.”

Given parameters for the marginal distribution on the neurons, you can calculate the parameters for the distribution on the weights, all within Edward/TensorFlow (there’s a one-to-one mapping, as in, e.g., Eq. 6 of the Kingma et al. paper).
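For example, if the free variational parameters are the per-coefficient inclusion probabilities and slab means (an assumption about the setup, not something confirmed in this thread), the coefficient of determination can be computed from the posterior mean coefficients using ordinary tensor operations. The sketch below uses NumPy as a stand-in for the equivalent TensorFlow ops, and r_squared is a hypothetical helper name.

```python
import numpy as np

def r_squared(y, x, pi, mean):
    """Coefficient of determination using posterior mean coefficients."""
    # Posterior mean of a point-normal coefficient is pi * mean.
    theta_hat = pi * mean
    resid = y - x @ theta_hat
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
x = rng.standard_normal((20, 3))      # toy design matrix
pi = np.array([1.0, 0.0, 0.5])        # hypothetical inclusion probabilities
mean = np.array([2.0, 5.0, -1.0])     # hypothetical slab means
y = x @ (pi * mean)                   # noiseless response for the check
score = r_squared(y, x, pi, mean)
```

On held-out data, the same function can be applied to new x and y with the fitted parameters.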