This may be a TensorFlow question, but I’d like to first get a better understanding of how Edward uses the multinomial.
Right now, I’m building off of the Multinomial Logistic Normal here. Specifically, I’m overriding the
`_sample_n` function as follows (since TensorFlow does not yet support batch multinomial sampling).
```python
import numpy as np
import tensorflow as tf
from scipy.stats import multinomial
from edward.models import Multinomial


def _sample_n(self, n=1, seed=None):
    # define Python function which returns samples as a Numpy array
    def np_sample(p, n):
        return multinomial.rvs(p=p, n=n, random_state=seed).astype(np.float32)

    # wrap python function as tensorflow op
    val = tf.py_func(np_sample, [self.probs, n], [tf.float32])
    # set shape from unknown shape
    batch_event_shape = self.batch_shape.concatenate(self.event_shape)
    shape = tf.concat(
        [tf.expand_dims(n, 0), tf.convert_to_tensor(batch_event_shape)], 0)
    val = tf.reshape(val, shape)
    return val


Multinomial._sample_n = _sample_n
```
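As a quick sanity check outside of TensorFlow, the draw that `np_sample` is meant to produce can be reproduced with NumPy's own multinomial sampler (this snippet is just an illustration; the probabilities and shapes here are my assumptions, not from the model):

```python
import numpy as np

# Hypothetical stand-in for np_sample, using NumPy's sampler instead of scipy's.
rng = np.random.RandomState(0)
p = np.array([0.2, 0.3, 0.5])
counts = rng.multinomial(100, p).astype(np.float32)

# The draw is a vector of per-category counts that always sums to n.
print(counts.sum())   # 100.0
print(counts.shape)   # (3,)
```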
It does work well for small datasets (e.g. 100 x 2000) when using SGLD and MAP. However, numerical issues start to seep in when I try to scale this to larger datasets (100 x 10000). I also tried variational inference, but that doesn’t seem to work either.
Looking at the outputs, I think the issues are arising from the multinomial sampling, since tfdbg shows that all of the NaNs and Infs originate within the multinomial sampling step. There have been a few issues (below) that point to numerical problems within TensorFlow.
Here’s the big question – how is Edward actually sampling from the Multinomial distribution? Are counts actually being drawn from the Multinomial, or is only the likelihood of the multinomial being evaluated? Is the
`_sample_n` example above prone to underflow issues? And would implementing the Gumbel argmax trick, as hinted here, provide a means of addressing this issue?
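For reference, my understanding of the Gumbel-max trick is that it draws a categorical sample by perturbing unnormalized log-probabilities with Gumbel noise and taking the argmax, so no explicit normalization (or exponentiation of large negative logits) is needed; repeated draws can then be aggregated into multinomial counts. A minimal NumPy sketch of that idea (the function name is mine, not Edward's or TensorFlow's):

```python
import numpy as np

def gumbel_max_multinomial(logits, n, rng=None):
    """Hypothetical sketch: multinomial counts via the Gumbel-max trick.

    Works directly on unnormalized log-probabilities, which is what
    makes it attractive when normalized probabilities might underflow.
    """
    rng = np.random.RandomState() if rng is None else rng
    k = len(logits)
    # One independent Gumbel perturbation per draw and category;
    # the argmax of (logits + noise) is one categorical sample.
    gumbels = rng.gumbel(size=(n, k))
    draws = np.argmax(logits + gumbels, axis=-1)
    # Aggregate the n categorical draws into multinomial counts.
    return np.bincount(draws, minlength=k).astype(np.float32)

counts = gumbel_max_multinomial(np.log([0.2, 0.3, 0.5]), n=1000,
                                rng=np.random.RandomState(0))
```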
Any insights or even links to relevant source code within Edward or Tensorflow would be really appreciated!