Limit the operations that are duplicated in random_variables.py/copy

mkabra · June 30, 2017, 12:29pm

Hi,

I’m trying to use Edward’s Mixture Model on top of a CNN. I observed that Edward duplicates the whole graph internally (random_variables.py/copy). This isn’t a big deal when the graph is small, but in my case it unnecessarily duplicates the whole CNN, which limits the size of the network that I can train. Is there any way to tell Edward not to duplicate certain operations, based on either their name or their scope?

Thanks,
Mayank

dustin · July 1, 2017, 11:05pm

Copying is necessary because for certain algorithms, we need to change the connectivity of nodes in the model. For example, the model is written connecting the prior to the likelihood, but the ed.KLqp algorithm requires calculating the likelihood given samples from the approximate posterior. This requires copying the likelihood nodes and swapping the prior dependence with the approximate posterior.
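To make the swap concrete, here is a toy sketch in plain Python. It is not Edward's implementation: `Node`, `depends_on`, and `copy_with_swap` are hypothetical names invented for illustration, and this toy reuses nodes that are unaffected by the swap, which the thread does not claim Edward does. The point is only that rewiring one input forces everything downstream of it to be rebuilt, so a large network sitting between the prior and the likelihood ends up duplicated.

```python
class Node:
    """A toy graph node: a name plus the nodes it depends on."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)


def depends_on(node, targets):
    """True if `node` is in `targets` or has an ancestor in `targets`."""
    if node in targets:
        return True
    return any(depends_on(p, targets) for p in node.parents)


def copy_with_swap(node, swap, cache=None):
    """Return a version of `node` with every key of `swap` replaced by its value.

    Nodes that do not depend on a swapped node are reused as-is (an
    assumption of this toy); everything downstream of a swap is rebuilt.
    """
    cache = {} if cache is None else cache
    if node in swap:
        return swap[node]
    if node in cache:
        return cache[node]
    if not depends_on(node, swap):
        return node
    new_parents = [copy_with_swap(p, swap, cache) for p in node.parents]
    cache[node] = Node(node.name + "_copy", new_parents)
    return cache[node]


# The model as the user wrote it: prior -> cnn -> likelihood.
prior = Node("z_prior")
posterior = Node("qz")
features = Node("cnn", [prior])      # stands in for a large network
likelihood = Node("x", [features])

# Inference wants the likelihood conditioned on the approximate posterior.
x_copy = copy_with_swap(likelihood, {prior: posterior})
```

After the call, `x_copy` hangs off a rebuilt `cnn_copy` node whose parent is `posterior`, while the original `likelihood` still points at the original `cnn` and `prior`.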

I’ve been dealing with this in my own large-scale experiments by hacking together a new inference algorithm that doesn’t use ed.copy and that assumes the model is already written with whatever connectivity is needed. This is easy with ed.MAP, for example, if we assume there are no latent variables and we only want to optimize the tf.Variables written in the model.

There’s more work to be done on intelligently avoiding copies in special cases. Contributions are welcome.

At a higher level, I’ve been thinking about a dynamic version of Edward that does lazy evaluation and avoids any graph-building until inference altogether; this will make graph building significantly faster. If this pans out, it will not be in Edward for at least several months to come.

mkabra · July 2, 2017, 12:06pm

Thanks for the detailed reply, Dustin!

An alternative to copying would be to inject a tf.cond() that selects either the prior or the approximate posterior, depending on what we would like the graph to do. I’m new to Edward, so I’m not sure about the details or whether it would work out. If it sounds reasonable, I can try to test this on a simple case.

Also, I’m using ed.MAP, so your hack would be a great help. Is there a way I can get hold of the new inference algorithm?

dustin · July 2, 2017, 9:32pm

> An alternative to copying would be to inject a tf.cond() which could decide to use either prior or approximate posterior depending upon what we would like the graph to do.

Unfortunately this doesn’t work because we’d like the user to be able to write a model independent of how inference might be performed. This means the model is given to us with the prior already connected.

> Also, I’m using ed.MAP, so your hack would be a great help. Is there a way I can get hold of the new inference algorithm?

Sure:

1. Copy the MAP class’ source code from map.py and paste it into your own code.
2. Remove the two copy lines, e.g., replace z_copy = copy(z, dict_swap, scope=scope) with z_copy = z.
3. Profit.
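A slightly more defensive variant of step 2 is to keep the copy call but guard it with an explicit opt-out. The sketch below is a plain-Python stand-in, not Edward code: `resolve`, `fake_copy`, and the `use_copy` flag are hypothetical names, and `fake_copy` only imitates a graph copy so that the example runs without TensorFlow.

```python
calls = []


def fake_copy(node, dict_swap):
    """Stand-in for a graph copy: records the call and returns a new object."""
    calls.append(node)
    return ("copy", node)


def resolve(node, dict_swap, copy_fn, use_copy=True):
    """Return a rewired copy of `node`, or `node` itself when copying is off."""
    if not use_copy:
        # The caller promises the model is already wired the way inference needs.
        return node
    return copy_fn(node, dict_swap)


z = "z"
same = resolve(z, {}, fake_copy, use_copy=False)   # no duplicate is built
copied = resolve(z, {}, fake_copy)                 # default: copy as before
```

With the flag off, the original node is returned untouched and the copy function is never called, which is the behavior the `z_copy = z` replacement hard-codes.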

mkabra · July 3, 2017, 11:51am

Thanks!! That helps a lot!

dustin · July 4, 2017, 3:30am

Motivated by this discussion, I went ahead and added this special case to MAP: https://github.com/blei-lab/edward/pull/705.
