KLqp underlying implementation

I’m new to Edward. I’m planning to use Edward for research purposes, and I’m interested in the concepts behind some of the underlying implementation, especially KLpq and KLqp. Could someone please answer the following questions?

  1. Does KLqp use stochastic variational inference?
  2. What is the underlying implementation of KLqp and KLpq? Is it ADVI or black-box variational inference?
  3. I found that KLqp supports sub-sampling. If KLqp uses ADVI, what techniques can we use to extend it (to compensate for the dataset size N) for streaming ML?

I would appreciate it a lot if someone could answer these questions.

Thanks!!

Hello,

I understand that it minimizes the KL divergence between q() and p(). If you check Equation (3) in Ranganath et al.'s Black Box Variational Inference paper, that is the default gradient used for updating the parameters of q().
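For reference, the score-function (REINFORCE) gradient from that paper, and its Monte Carlo estimate (their Eq. (3)), look roughly like this:

$$\nabla_\lambda \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\lambda)}\big[\nabla_\lambda \log q(z;\lambda)\,\big(\log p(x,z) - \log q(z;\lambda)\big)\big]$$

$$\nabla_\lambda \mathcal{L}(\lambda) \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \log q(z_s;\lambda)\,\big(\log p(x,z_s) - \log q(z_s;\lambda)\big), \qquad z_s \sim q(z;\lambda)$$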

Since this approach uses Monte Carlo estimates of the gradient, and the default score-function estimator of Eq. (3) has high variance, they suggest using:

  1. If it’s possible to apply the reparameterization trick from the Kingma and Welling 2013 VAE paper (see the toy comparison after this list):
    1.1 if the KL term is analytically computable, apply the reparameterized estimator with the analytic KL
    1.2 else apply the plain reparameterized estimator
  2. else apply Rao-Blackwellization
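To make the variance difference concrete, here is a toy NumPy comparison of the two estimators on d/dmu E_{z~N(mu,1)}[z^2] (this is my own illustrative sketch, not Edward code):

    import numpy as np

    # Toy: estimate d/dmu of E_{z ~ N(mu, 1)}[z^2]; the true value is 2 * mu.
    rng = np.random.default_rng(0)
    mu, n_samples = 1.5, 100000

    # Score-function (REINFORCE) estimator:
    # grad = E[ (d/dmu log q(z; mu)) * f(z) ], with d/dmu log q = z - mu here.
    z = rng.normal(mu, 1.0, n_samples)
    score_grads = (z - mu) * z ** 2

    # Reparameterized estimator: write z = mu + eps with eps ~ N(0, 1),
    # so grad = E[ f'(z) * dz/dmu ] = E[ 2 * (mu + eps) ].
    eps = rng.normal(0.0, 1.0, n_samples)
    reparam_grads = 2.0 * (mu + eps)

    print("true gradient:  ", 2 * mu)
    print("score-function:  mean=%.3f  var=%.3f"
          % (score_grads.mean(), score_grads.var()))
    print("reparameterized: mean=%.3f  var=%.3f"
          % (reparam_grads.mean(), reparam_grads.var()))

Both estimators are unbiased, but the score-function one has much larger variance, which is exactly why the reparameterized estimators are preferred when they are available.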

There are also some other methods that are commented out in the source; I recommend reading the papers about Edward published by Dustin Tran et al.

You can go to the KLqp part of the code; if you check functions such as build_score_rb_loss_and_gradients, each one's docstring describes the paper and method being used.

You will see the following if conditions:

if is_reparameterizable:
  if is_analytic_kl:
    return build_reparam_kl_loss_and_gradients(self, var_list)
  else:
    return build_reparam_loss_and_gradients(self, var_list)
else:
  return build_score_rb_loss_and_gradients(self, var_list)
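On question 3 from the original post: KLqp's sub-sampling support works by scaling the minibatch likelihood. A minimal sketch, assuming the standard scale argument of initialize() in Edward 1.x; the toy model and the sizes N and M are just for illustration:

    import edward as ed
    import tensorflow as tf
    from edward.models import Normal

    N = 10000  # full dataset size
    M = 128    # minibatch size

    # Toy model: x_n ~ Normal(mu, 1) with prior mu ~ Normal(0, 1).
    mu = Normal(loc=0.0, scale=1.0)
    x_ph = tf.placeholder(tf.float32, [M])
    x = Normal(loc=tf.ones(M) * mu, scale=1.0)

    # Variational approximation q(mu).
    qmu = Normal(loc=tf.Variable(0.0),
                 scale=tf.nn.softplus(tf.Variable(0.0)))

    inference = ed.KLqp({mu: qmu}, data={x: x_ph})
    # Scale the minibatch log-likelihood by N / M so each update is an
    # unbiased estimate of the full-data objective.
    inference.initialize(scale={x: float(N) / M})

Each inference.update(feed_dict={x_ph: batch}) step then processes one minibatch; for streaming you would keep feeding fresh batches, though the N / M scaling assumes you know (or estimate) the total dataset size.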

Hi,

I have read the papers. Black-box VI has the problem of high variance because it does not use the gradient of the model output with respect to the model parameters. In the ADVI paper, they point that out and improve on it.

However, I checked the latest Edward documentation, and it says KLqp is implementing

It does this by variational EM, maximizing…

I don’t get what variational EM is. I know EM, but do you mean VEM = ADVI?