Numerical issues with Multinomial

This may be a TensorFlow question, but I'd first like to get a better understanding of how Edward uses the Multinomial.

Right now, I'm building off of the Multinomial Logistic Normal here. Specifically, I'm overriding the _sample_n function as follows (since TensorFlow does not yet support batched multinomial sampling).

import numpy as np
import tensorflow as tf
from scipy.stats import multinomial
from edward.models import Multinomial

def _sample_n(self, n=1, seed=None):
    # define a Python function which returns samples as a NumPy array
    def np_sample(p, n):
        return multinomial.rvs(p=p, n=n, random_state=seed).astype(np.float32)

    # wrap the Python function as a TensorFlow op
    val = tf.py_func(np_sample, [self.probs, n], [tf.float32])[0]
    # restore the static shape information lost by py_func
    batch_event_shape = self.batch_shape.concatenate(self.event_shape)
    shape = tf.concat(
        [tf.expand_dims(n, 0), tf.convert_to_tensor(batch_event_shape)], 0)
    val = tf.reshape(val, shape)
    return val

Multinomial._sample_n = _sample_n
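For reference, here is a minimal NumPy sketch of the batched sampling that np_sample is standing in for: one multinomial draw per row of a batch of probability vectors. The function name and the use of np.random.multinomial are my own choices for illustration, not part of the patch above:

```python
import numpy as np

def batch_multinomial_sample(probs, total_count, rng=None):
    """Draw one multinomial count vector per row of `probs`.

    probs: (batch, k) array; each row sums to 1.
    total_count: number of trials per draw.
    """
    rng = np.random.RandomState(rng)
    # np.random.multinomial accepts only a single 1-D pvals vector,
    # so we loop over the batch, mirroring the py_func workaround.
    return np.stack(
        [rng.multinomial(total_count, row) for row in probs]
    ).astype(np.float32)

probs = np.array([[0.2, 0.3, 0.5],
                  [0.7, 0.2, 0.1]])
counts = batch_multinomial_sample(probs, total_count=10, rng=0)
# each row of `counts` sums to total_count
```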

It works well for small datasets (i.e. 100 x 2000) when using SGLD and MAP. However, numerical issues start to seep in when I try to scale this to larger datasets (100 x 10000). I also tried variational inference, but that doesn't seem to be working either.

Looking at the outputs, I think the issues are arising from the Multinomial sampling, since tfdbg shows that all of the nans and infs originate within the Multinomial sampling step. There have been a few issues below that have pointed to numerical problems within TensorFlow.

Here's the big question: how is Edward actually sampling from the Multinomial distribution? Are counts actually being drawn from the Multinomial, or is only the likelihood of the Multinomial being evaluated? Is the _sample_n example above prone to underflow issues? And would implementing the Gumbel argmax trick, as hinted here, provide a means of addressing this issue?
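For concreteness, here is a NumPy sketch of the Gumbel argmax trick I mean: each of the n trials adds Gumbel(0, 1) noise to the log-probabilities and takes the argmax, which is a categorical sample; summing the resulting one-hot wins gives multinomial counts. Everything stays in log space, so tiny probabilities are never exponentiated. This is my own illustration, not how Edward actually samples:

```python
import numpy as np

def gumbel_multinomial(logits, total_count, rng=None):
    """Sample multinomial counts via the Gumbel argmax trick.

    logits: (k,) unnormalized log-probabilities.
    total_count: number of categorical trials to accumulate.
    """
    rng = np.random.RandomState(rng)
    k = logits.shape[0]
    # Gumbel(0, 1) noise: -log(-log(U)), one noise vector per trial
    u = rng.uniform(low=1e-12, high=1.0, size=(total_count, k))
    g = -np.log(-np.log(u))
    # argmax of (logits + Gumbel noise) is a categorical draw
    winners = np.argmax(logits + g, axis=1)
    # accumulate the one-hot wins into a count vector
    return np.bincount(winners, minlength=k)

logits = np.log(np.array([0.1, 0.6, 0.3]))
counts = gumbel_multinomial(logits, total_count=1000, rng=0)
# counts sums to total_count
```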

Any insights or even links to relevant source code within Edward or Tensorflow would be really appreciated!

Just as a heads up: I got something working using the MAP estimator. It seems that once I reduce the learning rate sufficiently, I can get something that looks sane.

Here’s the code I’m using to do this.

## Model
# N = number of samples
# p = number of covariates in regression model
# D = number of features within each sample
# psi = predefined orthonormal basis to ensure identifiability when performing the softmax transform
G = tf.placeholder(tf.float32, [N, p])
B = Normal(loc=tf.zeros([p, D-1]), 
           scale=tf.ones([p, D-1]))
v = tf.matmul(G, B) 
eta = tf.nn.log_softmax(tf.matmul(v, psi))
Y = Multinomial(total_count=n, logits=eta, 
                value=tf.zeros([N, D], dtype=tf.float32))

## Inference
qB = PointMass(params=tf.Variable(tf.zeros([p, D-1])))
inference = ed.MAP(
    {B: qB},
    data={G: G_data, Y: y_data})
optimizer = tf.train.AdamOptimizer(5e-4, beta2=0.999)
inference.run(optimizer=optimizer)
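One detail that seems to help in the model above is using tf.nn.log_softmax rather than composing a log with a softmax: the fused version works via log-sum-exp and stays finite even for extreme logits. A quick NumPy/SciPy illustration of the difference (my own sketch, separate from the Edward code):

```python
import numpy as np
from scipy.special import logsumexp

x = np.array([1000.0, 0.0, -1000.0])

# naive: the softmax overflows/underflows first, then the log
# turns the damaged values into nan and -inf
with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
    naive_softmax = np.exp(x) / np.exp(x).sum()
    naive = np.log(naive_softmax)

# stable: log_softmax(x) = x - logsumexp(x), computed in log space
stable = x - logsumexp(x)
# `stable` is finite everywhere; `naive` is not
```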

I'm still having a bit of trouble getting the other inference techniques to work. It could be due to difficulties handling the Multinomial, but I'm not entirely sure.
