I am simply using KLqp, and since my neural network model is very simple it will use the reparameterization gradient, i.e. the ADVI algorithm from the ADVI paper.
I have found that
- using the Adam optimizer helps a lot, especially when training neural networks
- mini-batch training helps
- the number of Monte Carlo samples doesn't matter much
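To make the reparameterization-gradient idea concrete, here is a minimal self-contained sketch in plain numpy (not Edward's actual implementation): it fits a Gaussian variational posterior q(z) = Normal(mu, sigma) to a toy target p(z) = Normal(2.0, 0.5) by ascending the ELBO with hand-written Adam updates. The target, learning rate, and sample count are all illustrative assumptions.

```python
import numpy as np

# Toy ADVI-style sketch (assumed setup, not Edward's code):
# target p(z) = Normal(2.0, 0.5); variational q(z) = Normal(mu, exp(rho)).
rng = np.random.default_rng(0)

mu, rho = 0.0, 0.0                      # variational parameters (sigma = exp(rho))
m = np.zeros(2)                         # Adam first-moment estimate
v = np.zeros(2)                         # Adam second-moment estimate
lr, b1, b2, adam_eps = 0.05, 0.9, 0.999, 1e-8
n_samples = 20                          # Monte Carlo samples per gradient step

for t in range(1, 3001):
    sigma = np.exp(rho)
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps                # reparameterized sample: gradients flow to mu, rho

    # d/dz log p(z) for p = Normal(2.0, 0.5)
    dlogp_dz = -(z - 2.0) / 0.5**2
    # Pathwise ELBO gradients; the entropy of q contributes d(log sigma)/d(rho) = 1
    g_mu = dlogp_dz.mean()
    g_rho = (dlogp_dz * eps * sigma).mean() + 1.0

    # Adam update (ascending the ELBO)
    g = np.array([g_mu, g_rho])
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    mu, rho = np.array([mu, rho]) + lr * m_hat / (np.sqrt(v_hat) + adam_eps)

print(mu, np.exp(rho))                  # should approach the target's 2.0 and 0.5
```

You can play with `n_samples` here to see the point above: even a handful of samples per step gives a usable gradient estimate, since Adam's moment averaging smooths the noise across iterations.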
Please comment below if you find other things helpful!