Any plans for NUTS?


#1

Hi guys,
Edward is an amazing idea!
Do you have any plans to add NUTS (the No-U-Turn Sampler) to Edward's inference any time soon?
Stan has it, and so does PyMC3.
Best,
D.


#2

Thanks for asking. It's not on my personal todo list (https://github.com/blei-lab/edward/issues/464). There is alternative software that can handle this, and I like to prioritize features that users need and for which there is no alternative solution.

That said, contributions are welcome. NUTS would be a big step toward a more automated Edward.


#3

This might be a dumb question, but I am still getting used to programming in the TensorFlow graph. I have been working on my own implementation of the NUTS algorithm, mostly by looking at the old MATLAB implementation, some of twiecki's code in PyMC3, and the paper itself. One part that is difficult for me to implement is the recursive tree-building portion of the algorithm: it checks multiple conditions while recursively building the tree. Is this easy to do in TensorFlow? If anyone has done something like this before, are there any pointers? Sorry for the ignorance; I didn't think the implementation would be that bad until I started dealing with the logic of building the tree within TF.

(If this is the wrong place for development discussion, sorry. I didn't want to use the issues as a forum for advice.)


#4

Recursion is quite difficult to write in TensorFlow and will often have you staring at the API for the right functions. You have to convert the recursion into the form of a while loop and then invoke tf.while_loop.

One comparison that might be helpful is the following pair of examples ([1], [2]). They implement sampling from a Geometric distribution by drawing Bernoulli trials until one lands heads. One uses naive recursion, which fails in TensorFlow; the other uses a stochastic while loop, which works.
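To illustrate the pattern, here is a minimal sketch in the same spirit (this assumes the TF1-era API, and geometric_sample is a made-up name rather than code from those examples):

import tensorflow as tf

def geometric_sample(p):
  """Count Bernoulli(p) trials until the first success, via a
  stochastic tf.while_loop instead of Python recursion."""
  def cond(success, count):
    # Keep looping while the latest trial was a failure.
    return tf.logical_not(success)

  def body(success, count):
    trial = tf.random_uniform([]) < p  # one Bernoulli(p) trial
    return [trial, count + 1]

  _, count = tf.while_loop(
      cond, body, loop_vars=[tf.constant(False), tf.constant(0)])
  return count

The recursion in the naive version becomes explicit loop state (success, count) that tf.while_loop threads through cond and body; the tree-building in NUTS has to be reshaped in the same way.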

Another way to get started is to postpone work on the tree-building altogether. Instead, first work on adapting the step size via Nesterov's dual averaging, and on the heuristic for finding a good initial step size. These are much easier to make progress on.
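For reference, here is a plain-Python sketch of the dual averaging update from Algorithm 5 of the NUTS paper (the constants gamma, t0, and kappa follow the paper; the function name and state tuple are my own, and delta is the target acceptance rate, typically somewhere around 0.6 to 0.8):

import numpy as np

def dual_averaging_update(m, accept_prob, state,
                          delta=0.65, gamma=0.05, t0=10.0, kappa=0.75):
  """One dual averaging step at warmup iteration m = 1, 2, ..."""
  h_bar, log_eps, log_eps_bar, mu = state
  # Running average of (target acceptance - observed acceptance).
  h_bar = (1.0 - 1.0 / (m + t0)) * h_bar + (delta - accept_prob) / (m + t0)
  # Shrink the log step size toward mu = log(10 * initial_step_size).
  log_eps = mu - np.sqrt(m) / gamma * h_bar
  # Polyak-style averaging of log_eps gives the final estimate.
  eta = m ** (-kappa)
  log_eps_bar = eta * log_eps + (1.0 - eta) * log_eps_bar
  return (h_bar, log_eps, log_eps_bar, mu)

During warmup you run HMC with step size exp(log_eps) and feed each iteration's acceptance probability back in; after warmup you fix the step size at exp(log_eps_bar). The initial step size heuristic (Algorithm 4 in the paper) is similarly loop-shaped: repeatedly double or halve epsilon until the acceptance probability of a single leapfrog step crosses 0.5.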

(No worries; this is the right place for development discussion.)


#5

Thanks Dustin, that is great advice. I had started down the tf.while_loop path, so it's good to know that wasn't entirely the wrong way to go about things, and some concrete examples help.

I agree, I think the dual averaging task would be much easier to start on.


#6

I worked on forming the while loop just for finding the initial estimate of epsilon. Judging from some of the weird errors I am getting, I can only pass in tensors as the loop variables, and therefore things like leapfrog and log_joint that rely on dicts of RandomVariable: Tensor would have to be rewritten in a form that works on tensors only. Is that correct?

I don't have a lot of experience with it yet, but control flow in TensorFlow seems… not easy.


#7

That's correct. You would have to convert the dicts to lists. This is what we do elsewhere in the codebase; for example, a key line in ed.HMC is the gradient of the log joint density with respect to its latent variables:

grad_log_joint = tf.gradients(log_joint(z_new), list(six.itervalues(z_new)))

This is faster than calling tf.gradients inside a loop over each value in z_new.
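As a toy sketch of the dict-to-list conversion (the z_new and log_joint here are stand-in placeholders, and the body does a plain gradient step for ten iterations rather than a real leapfrog update):

import tensorflow as tf

# Stand-ins for the {RandomVariable: Tensor} dict and log joint density.
z_new = {'w': tf.zeros([2]), 'b': tf.zeros([])}
def log_joint(z):
  return -tf.reduce_sum(tf.square(z['w'])) - tf.square(z['b'])

keys = list(z_new.keys())               # fix an ordering of the latents
init_values = [z_new[k] for k in keys]  # tensors usable as loop variables

def cond(step, *values):
  return step < 10

def body(step, *values):
  z = dict(zip(keys, values))           # rebuild the dict inside the body
  grads = tf.gradients(log_joint(z), list(values))
  new_values = [v + 0.1 * g for v, g in zip(values, grads)]
  return [step + 1] + new_values

_, w_out, b_out = tf.while_loop(cond, body, [tf.constant(0)] + init_values)

The key move is fixing an ordering of the latent variables up front, so the loop variables are a flat list of tensors and the dict only exists transiently inside the loop body.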

Yes, in general I don't recommend control flow for the faint of heart. I toiled away at the stick-breaking construction for the Dirichlet process for many days. :slight_smile:


#8

Hi!
I created a PR (https://github.com/blei-lab/edward/pull/728) with an implementation of the dual averaging algorithm.


#9

Hi,

I was wondering if anyone has a sense of how much performance would improve with NUTS in Edward vs. Stan? I imagine the gradient calculations would be quicker, but would the NUTS part of it slow things down enough to nullify that?

Thanks and all the best,
Martin


#10

@matt and statistics Googlers are currently looking to implement NUTS in TensorFlow (which may include, or at least be compatible with, Edward as a high-level or same-level interface).

I was wondering if anyone has a sense of how much performance would improve with NUTS in Edward vs. Stan?

You'll see enormous speedups whenever you turn on specific features: training at lower precision than float64, XLA, GPUs/TPUs, and multi-machine environments. And of course, a TF implementation of NUTS would be more programmable, in the sense that you can more practically schedule computation as you see fit.

As for a timeline for when NUTS will ever get to that point—I don’t know if anyone can answer that.