MDPs using Edward

AjayTalati · May 29, 2017, 5:36pm

Hi all,

just thought I’d start a thread for folks interested in using Edward to do variational inference/optimization for MDPs and RL/sequential decisions in general.

Here’s roughly the setup I think is a good first attempt for most tasks, taken from Shakir Mohamed’s NIPS 2016 talk: https://www.youtube.com/watch?v=AggqBRdz6CQ&feature=youtu.be&t=9m53s

I’m only working on very simple stuff as proof of principle. So for example I’m interested in getting bandits, or grid worlds like Frozen-Lake working, and then taking things from there.

It would be great to hear what other sequential decision tasks folks have managed to get working or are interested in applying variational inference to.

Cheers,

Aj

dustin · May 29, 2017, 6:03pm

I’m also a big fan of Bayesian policy search. Bayesian policy search is a simple method that’s easy to add onto current state-of-the-art policy gradient methods. And it’s easy to see where model-based RL/learning a dynamics model of the environment fits in.

AjayTalati · May 29, 2017, 6:31pm

Thanks a lot for the thumbs up!

Still think I need to read up a bit more before I’m confident I know what I’m doing. I’m guessing I’ve been a bit too ambitious with my first attempt - I’m trying to modify this repo

which has a fairly nice/clean implementation of A3C applied to bandits and gridworlds. Because they’re simple tasks they train quite quickly: hours rather than days.

The A3C-LSTM RL algorithm is fairly friendly, as it’s already got something similar to the last entropy term in the equation above, and also a stochastic policy.
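For concreteness, here is a minimal sketch of the kind of entropy-regularized policy loss A3C uses. It only illustrates the standard A3C entropy bonus for a single step with a softmax (stochastic) policy, not the full variational objective from the talk. The function names (`softmax`, `entropy`, `policy_loss`) are hypothetical and not taken from the repo, and `beta=0.01` is just an assumed default (as far as I recall, it is the entropy weight used in the A3C paper's Atari experiments).

```python
import math

def softmax(logits):
    """Turn raw policy logits into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def policy_loss(logits, action, advantage, beta=0.01):
    """Single-step policy-gradient loss with an entropy bonus.

    loss = -log pi(action) * advantage - beta * H(pi)
    Minimizing it raises the probability of actions with positive
    advantage while discouraging the policy from collapsing early.
    """
    probs = softmax(logits)
    log_prob = math.log(probs[action])
    return -(log_prob * advantage) - beta * entropy(probs)

# Two-armed bandit, uniform policy, arm 0 pulled with advantage 1.0
loss = policy_loss([0.0, 0.0], action=0, advantage=1.0)
```

With a uniform policy over two arms the entropy is log 2, so the loss here works out to 0.99 * log 2. In a real setup the logits would come from the LSTM and the loss would be differentiated by the framework; this sketch only shows where the entropy term enters.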

AjayTalati · May 29, 2017, 7:39pm

One last thing about the above Meta-RL implementation is that it seems to be conceptually very similar to the global optimization setup in this paper:

Learning to learn without gradient descent by gradient descent (arXiv:1611.03824v4 [stat.ML]), http://ift.tt/2g4zLK3

Which seems to be an improvement over previous Bayesian methods (e.g. Spearmint) for hyper-parameter tuning.
