# Numerical issues with Multinomial

This may be a TensorFlow question, but I’d first like to get a better understanding of how Edward uses the Multinomial distribution.

Right now, I’m building off of the Multinomial Logistic Normal here. Specifically, I’m defining the `_sample_n` function as follows (since TensorFlow does not have batch multinomial sampling enabled yet).

```python
import numpy as np
import tensorflow as tf
from scipy.stats import multinomial
from edward.models import Multinomial

def _sample_n(self, n=1, seed=None):
    # define a Python function which returns samples as a NumPy array
    def np_sample(p, n):
        return multinomial.rvs(p=p, n=n, random_state=seed).astype(np.float32)

    # wrap the Python function as a TensorFlow op
    val = tf.py_func(np_sample, [self.probs, n], [tf.float32])
    # restore the static shape lost by py_func
    batch_event_shape = self.batch_shape.concatenate(self.event_shape)
    shape = tf.concat(
        [tf.expand_dims(n, 0), tf.convert_to_tensor(batch_event_shape)], 0)
    val = tf.reshape(val, shape)
    return val

Multinomial._sample_n = _sample_n
```

It works well for small datasets (e.g. 100 x 2000) when using SGLD and MAP. However, numerical issues start to seep in when I try to scale this to larger datasets (100 x 10000). I also tried variational inference, but that doesn’t seem to be working either.
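As a toy illustration of the sort of floating-point failure I suspect (plain NumPy, unrelated to Edward itself): exponentiating large logits directly overflows, while the shifted log-sum-exp form stays finite.

```python
import numpy as np

logits = np.array([1000.0, 999.0, 998.0])

# Naive softmax: exp(1000) overflows to inf, and inf / inf = nan.
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(logits) / np.exp(logits).sum()

# Shifted (log-sum-exp style) softmax stays finite and sums to 1.
stable = np.exp(logits - logits.max())
stable /= stable.sum()
```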

Looking at the outputs, I think the issues are arising from the Multinomial sampling, since tfdbg shows that all of the nans and infs appear within the Multinomial sampling step. A few issues (linked below) have pointed to numerical problems within TensorFlow.

Here’s the big question: how is Edward actually sampling from the Multinomial distribution? Are counts actually being drawn from the Multinomial, or is only the likelihood of the Multinomial being evaluated? Is the `_sample_n` example above prone to underflow issues? And would implementing the Gumbel argmax trick, as hinted here, provide a means of addressing this issue?
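For reference, here is a small NumPy sketch of what I understand the Gumbel argmax trick to be: each trial is drawn as a categorical sample by adding Gumbel(0, 1) noise to the logits and taking the argmax, so the unnormalized logits never have to be exponentiated into probabilities. The function names are mine, not Edward’s.

```python
import numpy as np

def gumbel_max_sample(logits, rng):
    # Add i.i.d. Gumbel(0, 1) noise and take the argmax; the argmax
    # is distributed as Categorical(softmax(logits)).
    g = rng.gumbel(size=logits.shape)
    return np.argmax(logits + g, axis=-1)

def multinomial_via_gumbel(logits, total_count, rng):
    # Draw `total_count` categorical trials and histogram them into
    # multinomial counts, working in log space throughout (no
    # normalized probabilities, so no underflow from tiny p's).
    k = logits.shape[-1]
    draws = np.array([gumbel_max_sample(logits, rng)
                      for _ in range(total_count)])
    return np.bincount(draws, minlength=k)
```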

Any insights or even links to relevant source code within Edward or Tensorflow would be really appreciated!

Just as a heads up - I got something working using the MAP estimator. It seems that once I reduce the learning rate sufficiently, I get results that look sane.

Here’s the code I’m using to do this.

```python
import tensorflow as tf
import edward as ed
from edward.models import Normal, Multinomial, PointMass

## Model
# N = number of samples
# p = number of covariates in regression model
# D = number of features within each sample
# psi = predefined orthonormal basis to ensure identifiability
#       when performing the softmax transform
G = tf.placeholder(tf.float32, [N, p])
B = Normal(loc=tf.zeros([p, D - 1]),
           scale=tf.ones([p, D - 1]))
v = tf.matmul(G, B)
eta = tf.nn.log_softmax(tf.matmul(v, psi))
Y = Multinomial(total_count=n, logits=eta,
                value=tf.zeros([N, D], dtype=tf.float32))

## Inference
qB = PointMass(params=tf.Variable(tf.zeros([p, D - 1])))
inference = ed.MAP(
    {B: qB},
    data={G: G_data, Y: y_data}
)
```
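For completeness, `psi` above is just some orthonormal basis of the hyperplane orthogonal to the all-ones vector. A minimal NumPy sketch of one way to build such a basis (a normalized Helmert-style contrast matrix; this is my construction, not necessarily the one used in the original example):

```python
import numpy as np

def ilr_basis(d):
    # Hypothetical helper: build a (d-1, d) orthonormal contrast
    # matrix whose rows each sum to zero, one possible choice of
    # `psi` for identifiability under the softmax transform.
    H = np.zeros((d - 1, d))
    for i in range(d - 1):
        H[i, : i + 1] = 1.0          # ones in the first i+1 slots
        H[i, i + 1] = -(i + 1)       # balancing entry so the row sums to 0
        H[i] /= np.linalg.norm(H[i]) # normalize to unit length
    return H
```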