I’m trying to use Edward’s Mixture model on top of a CNN. I observed that Edward duplicates the whole graph internally (via copy in random_variables.py). This isn’t a big deal when the graph is small, but in my case it unnecessarily duplicates the whole CNN, which limits the size of the network that I can train. Is there any way to tell Edward not to duplicate operations based on either their name or scope?
Copying is necessary because for certain algorithms, we need to change the connectivity of nodes in the model. For example, the model is written connecting the prior to the likelihood, but the ed.KLqp algorithm requires calculating the likelihood given samples from the approximate posterior. This requires copying the likelihood nodes and swapping the prior dependence with the approximate posterior.
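To make the rewiring concrete, here is a toy sketch of the idea (not Edward’s actual implementation): the model ties the likelihood to the prior, and a swap-aware copy rebuilds the likelihood node with the prior parent replaced by the approximate posterior, much like ed.copy with its dict_swap argument. The Node class and copy_with_swap function are hypothetical illustrations.

```python
class Node:
    """A minimal stand-in for a random variable in a computation graph."""
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

def copy_with_swap(node, dict_swap):
    """Rebuild `node`, replacing any ancestor found in dict_swap."""
    if node in dict_swap:
        return dict_swap[node]
    new_parents = [copy_with_swap(p, dict_swap) for p in node.parents]
    if tuple(new_parents) == node.parents:
        return node  # nothing changed below this node; reuse it as-is
    return Node(node.name, new_parents)

# Model as written by the user: prior z feeds the likelihood x.
z = Node("z")
x = Node("x", parents=[z])

# Inference-time copy: evaluate the likelihood under qz instead of z.
qz = Node("qz")
x_post = copy_with_swap(x, {z: qz})

print([p.name for p in x_post.parents])  # -> ['qz']
print(x.parents[0].name)                 # original graph untouched -> 'z'
```

Note that the copy leaves the original graph intact, which is exactly why the whole graph ends up duplicated.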
I’ve been dealing with this in my own large-scale experiments by hacking together a new inference algorithm that doesn’t use ed.copy, and which assumes the model is already written with whatever connectivities are needed. This is easy, for example, with ed.MAP, if we assume there are no latent variables and we only want to optimize tf.Variables written in the model.
There’s more work to be done on intelligently avoiding copies in special cases. Contributions are welcome.
At a higher level, I’ve been thinking about a dynamic version of Edward that does lazy evaluation and avoids graph building altogether until inference; this would make graph construction significantly faster. Even if this pans out, it will not be in Edward for at least several months.
Thanks for the detailed reply, Dustin!
An alternative to copying would be to inject a tf.cond() which could decide to use either the prior or the approximate posterior, depending on what we would like the graph to do. I’m new to Edward, so I’m not sure about the details or whether it will work out. If it sounds reasonable, I can try to test this on a simple case.
Also, I’m using ed.MAP, so your hack would be a great help. Is there a way I can get hold of the new inference algorithm?
An alternative to copying would be to inject a tf.cond() which could decide to use either prior or approximate posterior depending upon what we would like the graph to do.
Unfortunately this doesn’t work because we’d like the user to be able to write a model independent of how inference might be performed. This means the model is given to us with the prior already connected.
Also, I’m using ed.MAP, so your hack would be a great help. Is there a way I can get hold of the new inference algorithm?
Sure:
1. Copy and paste the MAP class’s source code inside map.py.
2. Remove the two copy lines, e.g., replace z_copy = copy(z, dict_swap, scope=scope) with z_copy = z.
3. Profit.
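To see why step 2 is safe in this setting, here is a toy sketch (again hypothetical, not Edward internals): when there are no latent variables to swap, the dict_swap passed to the copy is empty, so a swap-aware copy has nothing to rewire and would hand back the original node anyway; z_copy = z is then an exact shortcut that skips the duplication.

```python
class Node:
    """A minimal stand-in for a random variable in a computation graph."""
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

def copy_with_swap(node, dict_swap):
    """Rebuild `node`, replacing any ancestor found in dict_swap."""
    if node in dict_swap:
        return dict_swap[node]
    new_parents = [copy_with_swap(p, dict_swap) for p in node.parents]
    if tuple(new_parents) == node.parents:
        return node  # nothing swapped below: reuse the existing node
    return Node(node.name, new_parents)

w = Node("w")               # a point-estimated parameter (a tf.Variable in the model)
x = Node("x", parents=[w])  # likelihood node

# Empty dict_swap: the "copy" is the original node itself,
# so skipping the copy changes nothing.
z_copy = copy_with_swap(x, {})
print(z_copy is x)  # -> True
```

This is only an argument for the no-latent-variable MAP case; algorithms like ed.KLqp still need the real copy because their dict_swap is non-empty.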
Thanks!! That helps a lot!
Motivated by this discussion, I went ahead and added this special case to MAP: https://github.com/blei-lab/edward/pull/705.