Dear all,

I have recently been doing some probabilistic deep learning in Edward and have been confused by a few of its implementation details. If anyone has experience with it, I would appreciate an explanation!

I will try to keep my question as clear as possible. Basically, I want to build a latent variable model that extracts latent structure from high-dimensional time series. I have modeled the latent dynamics with an LSTM and the mapping function (from latent states to the observed time series) with a feedforward neural network. I am using variational inference with KLqp in Edward, for example ed.KLqp({z: qz}, data={x: x_data}), to infer the latent states.
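For reference, here is roughly the generative process I have in mind, written as a plain NumPy sketch rather than my actual Edward code. The Gaussian noise levels, the layer sizes, the projection `Wz` from the LSTM output to the next latent state, and all function names are assumptions made up for illustration. The weight arrays in `lstm` and `decoder` are the parameters my questions below are about.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(rng, z_dim=2, h_dim=8, hidden_dim=32, x_dim=50, scale=0.3):
    """Random weights for the LSTM prior and the feedforward decoder (hypothetical sizes)."""
    def randn(*shape):
        return scale * rng.standard_normal(shape)
    lstm = {"W": randn(z_dim, 4 * h_dim), "U": randn(h_dim, 4 * h_dim),
            "b": np.zeros(4 * h_dim), "Wz": randn(h_dim, z_dim)}
    decoder = {"W1": randn(z_dim, hidden_dim), "b1": np.zeros(hidden_dim),
               "W2": randn(hidden_dim, x_dim), "b2": np.zeros(x_dim)}
    return lstm, decoder

def lstm_step(z_prev, h_prev, c_prev, p):
    # Standard LSTM cell whose input is the previous latent state z_{t-1}.
    gates = z_prev @ p["W"] + h_prev @ p["U"] + p["b"]
    i, f, o, g = np.split(gates, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def decode(z, p):
    # One-hidden-layer feedforward network: latent state -> mean of observation.
    hidden = np.tanh(z @ p["W1"] + p["b1"])
    return hidden @ p["W2"] + p["b2"]

def sample_sequence(T, lstm, decoder, rng, sigma_z=0.1, sigma_x=0.1):
    z_dim, h_dim = lstm["Wz"].shape[1], lstm["U"].shape[0]
    x_dim = decoder["W2"].shape[1]
    z, h, c = np.zeros(z_dim), np.zeros(h_dim), np.zeros(h_dim)
    zs, xs = [], []
    for _ in range(T):
        h, c = lstm_step(z, h, c, lstm)
        # Latent transition: LSTM output projected to z_t, plus Gaussian noise.
        z = h @ lstm["Wz"] + sigma_z * rng.standard_normal(z_dim)
        # Emission: feedforward decoder plus Gaussian noise.
        x = decode(z, decoder) + sigma_x * rng.standard_normal(x_dim)
        zs.append(z)
        xs.append(x)
    return np.stack(zs), np.stack(xs)

rng = np.random.default_rng(0)
lstm, decoder = init_params(rng)
z_seq, x_seq = sample_sequence(100, lstm, decoder, rng)
print(z_seq.shape, x_seq.shape)  # (100, 2) (100, 50)
```

In the Edward version, the latent states z are what qz approximates, while the LSTM and decoder weights are ordinary TensorFlow variables with no prior.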

Q1: As I understand it, ed.KLqp also performs a variational EM procedure that estimates the parameters of the feedforward neural network (am I right?). How does it estimate the parameters of the LSTM that models the latent dynamics? (Do I need to specify something like ed.MAP() to optimize those parameters?)

Q2: If I do not write the neural network myself but use slim in TensorFlow instead, will Edward still apply gradient descent to estimate its parameters?
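To make Q1 and Q2 concrete, this is what I understand by "variational EM" here, as a toy NumPy sketch (my own assumption, not Edward's code). The toy model is z_n ~ N(0, 1), x_n | z_n ~ N(w z_n, 1), with a Gaussian q(z_n) = N(m_n, s_n^2). The single weight `w` stands in for the network weights: it has no prior and no q, and it gets a point estimate by gradient ascent on the same ELBO, simultaneously with the variational parameters. The function names, learning rate, and iteration count are hypothetical.

```python
import numpy as np

def elbo(x, w, m, s):
    # ELBO for z_n ~ N(0,1), x_n | z_n ~ N(w z_n, 1), q(z_n) = N(m_n, s_n^2); exact, no sampling.
    s2 = s ** 2
    expected_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - w * m) ** 2 + w ** 2 * s2)
    neg_kl = -0.5 * (m ** 2 + s2 - 1.0 - np.log(s2))
    return np.sum(expected_loglik + neg_kl)

def fit_toy_variational_em(x, lr=0.1, n_iter=2000, w_init=0.5):
    n = x.shape[0]
    m = np.zeros(n)      # variational means
    rho = np.zeros(n)    # log standard deviations, s_n = exp(rho_n)
    w = w_init           # deterministic "network weight": point estimate only
    for _ in range(n_iter):
        s2 = np.exp(2.0 * rho)
        resid = x - w * m
        # Analytic ELBO gradients, all computed from the current values.
        grad_m = w * resid - m
        grad_rho = 1.0 - (w ** 2 + 1.0) * s2
        grad_w = np.mean(m * resid - w * s2)  # averaged over data points
        # One simultaneous ascent step on variational AND model parameters.
        m = m + lr * grad_m
        rho = rho + lr * grad_rho
        w = w + lr * grad_w
    return w, m, np.exp(rho)

rng = np.random.default_rng(0)
z_true = rng.standard_normal(1000)
x = 2.0 * z_true + rng.standard_normal(1000)
w_hat, m_hat, s_hat = fit_toy_variational_em(x)
print(w_hat, np.sqrt(np.mean(x ** 2) - 1.0))
```

Because q can represent the exact posterior in this toy model, w_hat should approximately match the maximum-likelihood value sqrt(mean(x**2) - 1), which is the EM behaviour I am asking about for the LSTM and slim weights.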

Thanks in advance for your responses!