Just thought I'd start a thread for folks interested in using Edward to do variational inference/optimization for MDPs, RL, and sequential decision-making in general.
Here's roughly the setup I think is a good first attempt for most tasks, taken from Shakir Mohamed's NIPS 2016 talk: https://www.youtube.com/watch?v=AggqBRdz6CQ&feature=youtu.be&t=9m53s
I'm only working on very simple stuff as a proof of principle. For example, I'm interested in getting bandits or grid worlds like Frozen-Lake working, and then taking things from there.
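To make the bandit case concrete, here's a minimal sketch in plain NumPy (not Edward, and not necessarily the exact formulation from the talk). It assumes the objective is expected reward minus a temperature-weighted KL from the policy to a uniform prior over arms, and it assumes the mean rewards are known so the gradient can be computed exactly. The names `fit_policy`, `objective`, and `temperature` are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def objective(logits, mean_rewards, temperature):
    """E_pi[r] - temperature * KL(pi || uniform prior) (assumed objective)."""
    pi = softmax(logits)
    kl = np.sum(pi * (np.log(pi) + np.log(len(pi))))
    return float(pi @ mean_rewards - temperature * kl)

def fit_policy(mean_rewards, temperature=0.5, lr=1.0, steps=2000):
    """Gradient ascent on the objective w.r.t. the softmax logits."""
    mean_rewards = np.asarray(mean_rewards, dtype=float)
    logits = np.zeros_like(mean_rewards)
    for _ in range(steps):
        pi = softmax(logits)
        # Per-arm signal: reward minus temperature * log-ratio to the uniform prior.
        f = mean_rewards - temperature * (np.log(pi) + np.log(len(pi)))
        # Exact gradient of the objective w.r.t. logits: pi_j * (f_j - E_pi[f]).
        grad = pi * (f - pi @ f)
        logits = logits + lr * grad
    return softmax(logits)

# Hypothetical 3-armed bandit with known mean rewards.
rewards = np.array([0.2, 0.5, 0.8])
policy = fit_policy(rewards, temperature=0.5)
```

Under these assumptions the optimum has a closed form, a policy proportional to `exp(mean_rewards / temperature)`, which makes a handy sanity check before swapping in sampled rewards or a proper Edward model.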
It'd be great to hear what other sequential decision tasks folks have managed to get working or are interested in applying variational inference to.