Hi,
I have read the papers. Black box VI suffers from high-variance gradient estimates because its score-function estimator does not use the gradient of the model's log density with respect to the latent variables. The ADVI paper points this out and improves on it.
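To check my understanding of the variance point, here is a minimal sketch on a toy problem (this is my own illustration, not code from either paper or from Edward). I assume q(z) = N(mu, 1) and use f(z) = z**2 as a hypothetical stand-in for the term inside the ELBO expectation, so the true gradient with respect to mu is 2*mu. The function name `gradient_samples` is made up for this example.

```python
import random
import statistics


def gradient_samples(mu=1.0, n=100_000, seed=0):
    """Per-sample estimates of d/dmu E_{z ~ N(mu, 1)}[f(z)], with f(z) = z**2.

    f is a toy stand-in (an assumption), not a real model's log density.
    """
    rng = random.Random(seed)
    score, reparam = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        z = mu + eps  # z ~ N(mu, 1)
        # Score-function estimator: f(z) * d/dmu log q(z; mu).
        # It only evaluates f, never its gradient.
        score.append(z**2 * (z - mu))
        # Reparameterization estimator: d/dmu f(mu + eps) = f'(z).
        # It uses the gradient of f directly.
        reparam.append(2.0 * z)
    return score, reparam


score, reparam = gradient_samples()
print("score-function   mean:", statistics.fmean(score),
      "variance:", statistics.pvariance(score))
print("reparameterized  mean:", statistics.fmean(reparam),
      "variance:", statistics.pvariance(reparam))
```

Both estimators should average to roughly the same gradient, but if I understand correctly the reparameterized one should show a clearly smaller variance on this toy problem.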
However, I checked the latest Edward documentation, and this is how it describes what KLqp implements:
It does this by variational EM, maximizing…
I don’t understand what variational EM is. I know EM, but do you mean that variational EM (VEM) is the same as ADVI?