Any plans for NUTS?


#1

Hi guys,
Edward is an amazing idea!
Do you have any plans to add NUTS (the No-U-Turn Sampler) to Edward's inference any time soon?
Stan has it, and so does PyMC3.
Best,
D.


#2

Thanks for asking. It's not on my personal todo list (https://github.com/blei-lab/edward/issues/464). There is alternative software that can handle this, and I like to prioritize features that users need and for which there is no alternative solution.

That said, contributions are welcome. NUTS would be a big step toward a more automated Edward.


#3

This might be a dumb question, but I am still getting used to programming in the TensorFlow graph. I have been working on my own implementation of the NUTS algorithm, mostly by looking at the old MATLAB implementation, some of twiecki's code in PyMC3, and the paper itself. One part that is difficult for me to implement is the recursive tree-building portion of the algorithm: it checks multiple conditions while recursively building the tree. Is this easy to do in TensorFlow? If anyone has done something like this before, are there any pointers? Sorry for the ignorance; I didn't think the implementation would be that bad until I started dealing with the logic of building the tree within TF.

(If this is the wrong place for development discussion, sorry. I didn't want to use the issues as a forum for advice.)


#4

Recursion is quite difficult to write in TensorFlow and will often have you staring at the API for the right functions. You have to convert the recursion into the form of a while loop and then invoke tf.while_loop.

One comparison that might be helpful is the following pair of examples ([1], [2]). They implement sampling from a Geometric distribution by drawing Bernoulli trials until one lands heads. One uses naive recursion, which fails in TensorFlow; the other uses a stochastic while loop, which works.
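To illustrate the pattern, here is a minimal sketch in the same spirit (this assumes the TF1-era API, and geometric_sample is a made-up name rather than code from those examples):

import tensorflow as tf

def geometric_sample(p):
  """Count Bernoulli(p) trials until the first success, via a
  stochastic tf.while_loop instead of Python recursion."""
  def cond(success, count):
    # Keep looping while the latest trial was a failure.
    return tf.logical_not(success)

  def body(success, count):
    trial = tf.random_uniform([]) < p  # one Bernoulli(p) trial
    return [trial, count + 1]

  _, count = tf.while_loop(
      cond, body, loop_vars=[tf.constant(False), tf.constant(0)])
  return count

The recursion in the naive version becomes explicit loop state (success, count) that tf.while_loop threads through cond and body; the tree-building in NUTS has to be reshaped in the same way.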

Another way to get started is to postpone work on the tree-building altogether. Instead, first work on adapting the step size via Nesterov's dual averaging, and on the heuristic for finding a good initial step size. These are much easier to make progress on.
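For reference, here is a plain-Python sketch of the dual averaging update from Algorithm 5 of the NUTS paper (the constants gamma, t0, and kappa follow the paper; the function name and state tuple are my own, and delta is the target acceptance rate, typically somewhere around 0.6 to 0.8):

import numpy as np

def dual_averaging_update(m, accept_prob, state,
                          delta=0.65, gamma=0.05, t0=10.0, kappa=0.75):
  """One dual averaging step at warmup iteration m = 1, 2, ..."""
  h_bar, log_eps, log_eps_bar, mu = state
  # Running average of (target acceptance - observed acceptance).
  h_bar = (1.0 - 1.0 / (m + t0)) * h_bar + (delta - accept_prob) / (m + t0)
  # Shrink the log step size toward mu = log(10 * initial_step_size).
  log_eps = mu - np.sqrt(m) / gamma * h_bar
  # Polyak-style averaging of log_eps gives the final estimate.
  eta = m ** (-kappa)
  log_eps_bar = eta * log_eps + (1.0 - eta) * log_eps_bar
  return (h_bar, log_eps, log_eps_bar, mu)

During warmup you run HMC with step size exp(log_eps) and feed each iteration's acceptance probability back in; after warmup you fix the step size at exp(log_eps_bar). The initial step size heuristic (Algorithm 4 in the paper) is similarly loop-shaped: repeatedly double or halve epsilon until the acceptance probability of a single leapfrog step crosses 0.5.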

(No worries; this is the right place for development discussion.)


#5

Thanks Dustin, that is great advice. I had started down the tf.while_loop path, so it's good to know that wasn't entirely the wrong way to go about things, and some concrete examples help.

I agree, I think the dual averaging task would be much easier to start on.


#6

I worked on forming the while loop just for finding the initial estimate of epsilon. Judging from some of the weird errors I am getting, I can only pass in tensors as the loop variables, and therefore things like leapfrog and log_joint that rely on dicts of RandomVariable: Tensor would have to be rewritten in a form that works on tensors only. Is that correct?

I don't have a lot of experience with it yet, but control flow in TensorFlow seems… not easy.


#7

That's correct. You would have to convert the dicts to lists. This is what we do elsewhere in the codebase; for example, a key line in ed.HMC is the gradient of the log joint density with respect to its latent variables:

grad_log_joint = tf.gradients(log_joint(z_new), list(six.itervalues(z_new)))

This is faster than calling tf.gradients inside a loop over each value in z_new.
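As a toy sketch of the dict-to-list conversion (the z_new and log_joint here are stand-in placeholders, and the body does a plain gradient step for ten iterations rather than a real leapfrog update):

import tensorflow as tf

# Stand-ins for the {RandomVariable: Tensor} dict and log joint density.
z_new = {'w': tf.zeros([2]), 'b': tf.zeros([])}
def log_joint(z):
  return -tf.reduce_sum(tf.square(z['w'])) - tf.square(z['b'])

keys = list(z_new.keys())               # fix an ordering of the latents
init_values = [z_new[k] for k in keys]  # tensors usable as loop variables

def cond(step, *values):
  return step < 10

def body(step, *values):
  z = dict(zip(keys, values))           # rebuild the dict inside the body
  grads = tf.gradients(log_joint(z), list(values))
  new_values = [v + 0.1 * g for v, g in zip(values, grads)]
  return [step + 1] + new_values

_, w_out, b_out = tf.while_loop(cond, body, [tf.constant(0)] + init_values)

The key move is fixing an ordering of the latent variables up front, so the loop variables are a flat list of tensors and the dict only exists transiently inside the loop body.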

Yes, in general I don't recommend control flow for the faint of heart. I toiled away at the stick-breaking construction for the Dirichlet process for many days. :slight_smile:


#8

Hi!
I created a PR (https://github.com/blei-lab/edward/pull/728) with an implementation of the dual averaging algorithm.


#9

Hi,

I was wondering if anyone has a sense of how much performance would improve with NUTS in Edward vs. Stan? I imagine the gradient calculations would be quicker, but would the NUTS part of it slow things down enough to nullify that?

Thanks and all the best,
Martin


#10

@matt and statistics Googlers are currently looking to implement NUTS in TensorFlow (which may include, or at least be compatible with, Edward as a high-level or same-level interface).

I was wondering if anyone has a sense of how much performance would improve with NUTS in Edward vs. Stan?

You'll see enormous speedups whenever you turn on specific features: training at lower precision than float64, XLA, GPUs/TPUs, and multi-machine environments. And of course, a TF implementation of NUTS would be more programmable, in the sense that you can more practically schedule computation as you see fit.

As for a timeline for when NUTS will ever get to that point—I don’t know if anyone can answer that.