This may be a TensorFlow question, but I'd like to first get a better understanding of how Edward uses the multinomial.
Right now, I'm building off of the Multinomial Logistic Normal here. Specifically, I'm specifying the _sample_n
function as follows (since TensorFlow does not have batch multinomial sampling enabled yet).
import numpy as np
import tensorflow as tf
from scipy.stats import multinomial
from edward.models import Multinomial

def _sample_n(self, n=1, seed=None):
    # define a Python function which returns samples as a NumPy array
    def np_sample(p, n):
        return multinomial.rvs(p=p, n=n, random_state=seed).astype(np.float32)

    # wrap the Python function as a TensorFlow op
    val = tf.py_func(np_sample, [self.probs, n], [tf.float32])[0]
    # restore the shape, which is statically unknown after py_func
    batch_event_shape = self.batch_shape.concatenate(self.event_shape)
    shape = tf.concat(
        [tf.expand_dims(n, 0), tf.convert_to_tensor(batch_event_shape)], 0)
    val = tf.reshape(val, shape)
    return val

Multinomial._sample_n = _sample_n
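For context, one numerical pitfall I've been checking on my end (a standalone NumPy sketch, not Edward code, and `safe_multinomial_sample` is just a name I made up): with many categories, a float32 probability vector can drift away from summing to exactly 1, which the underlying multinomial samplers are sensitive to. Renormalizing in float64 before sampling seems to help:

```python
import numpy as np

def safe_multinomial_sample(n, probs, rng=None):
    """Draw multinomial counts after renormalizing probs in float64.

    Multinomial samplers can complain when float32 probs don't sum to
    exactly 1, which becomes likely with many categories.
    """
    rng = np.random.default_rng(rng)
    p = np.asarray(probs, dtype=np.float64)
    p = p / p.sum()  # renormalize in double precision
    return rng.multinomial(n, p).astype(np.float32)

# 10000 categories, as in the larger dataset mentioned above
raw = np.random.default_rng(0).random(10000).astype(np.float32)
probs = raw / raw.sum()  # normalized, but only in float32
counts = safe_multinomial_sample(100, probs)
```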
It does work well for small datasets (e.g. 100 x 2000) when using SGLD and MAP. However, numerical issues start to seep in when I try to scale this to larger datasets (100 x 10000). I also tried variational inference, but that doesn't seem to work either.
Looking at the outputs, I think the issues are arising from the Multinomial sampling: tfdbg shows that all of the nans and infs originate within the Multinomial sampling step. There have been a few issues below that have pointed to numerical problems within TensorFlow.
Here's the big question: how is Edward actually sampling from the Multinomial distribution? Are counts actually drawn from the Multinomial, or is only the multinomial likelihood being evaluated? Is the _sample_n
example above prone to underflow issues? And would implementing the Gumbel argmax trick as hinted here provide a means of addressing this issue?
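For reference, here is my understanding of the Gumbel argmax trick as a NumPy sketch (my own illustration, not code from the linked thread): each trial adds i.i.d. Gumbel(0, 1) noise to the unnormalized log-probabilities and takes the argmax, which is equivalent to a categorical draw, so everything stays in log space and no normalization of small probabilities is needed.

```python
import numpy as np

def gumbel_multinomial(n, logits, rng=None):
    """Sample multinomial counts via the Gumbel-argmax trick.

    argmax(logits + Gumbel noise) over the last axis is a categorical
    draw from softmax(logits); accumulating n such draws into a
    histogram gives a multinomial sample.
    """
    rng = np.random.default_rng(rng)
    k = logits.shape[-1]
    # Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform(0, 1)
    gumbel = -np.log(-np.log(rng.random((n, k))))
    winners = np.argmax(logits + gumbel, axis=-1)  # one category per trial
    return np.bincount(winners, minlength=k).astype(np.float32)

logits = np.log(np.array([0.2, 0.3, 0.5]))
counts = gumbel_multinomial(1000, logits, rng=0)
```

Drawing one trial at a time would presumably be slow for large n, so I'm curious whether a batched version of this would actually be practical here.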
Any insights or even links to relevant source code within Edward or Tensorflow would be really appreciated!