Numerical issues with Multinomial

mortonjt · December 13, 2017, 9:41pm

This may be a tensorflow question, but I’d like to first get a better understand how Edward uses the multinomial.

Right now, I’m building off of the Multinomial Logistic Normal here. Specifically, I’m specifying the _sample_n function as follows (since Tensorflow does not have batch multinomial sampling enabled yet).

def _sample_n(self, n=1, seed=None):
    # define Python function which returns samples as a Numpy array
    def np_sample(p, n):
        return multinomial.rvs(p=p, n=n, random_state=seed).astype(np.float32)

    # wrap python function as tensorflow op
    val = tf.py_func(np_sample, [self.probs, n], [tf.float32])[0]
    # set shape from unknown shape
    batch_event_shape = self.batch_shape.concatenate(self.event_shape)
    shape = tf.concat(
        [tf.expand_dims(n, 0), tf.convert_to_tensor(batch_event_shape)], 0)
    val = tf.reshape(val, shape)
    return val
Multinomial._sample_n = _sample_n

It does work well for small datasets (i.e. 100 x 2000) when using SGLD and MAP. However, numerical issues start to seep in when I try to scale this to larger datasets (100 x 10000). I also tried Variational Inference, but that doesn’t be working either.

Looking at the outputs, I think the issues are arising from the Multinomial sampling - since the tfdbg is showing that all of the nans and infs are arising within the Multinomial sampling step. There have been a few issues below that have pointed to numerical issues within Tensorflow

Here’s the big question – how is Edward actually sampling from the Multinomial distribution? Are counts actually being drawn from the Multinomial? Or is only the likelihood of the multinomial being evaluated? Is the _sample_n example above prone to underflow issues? And would implementing the gumbel arg max trick as hinted here provide a means of addressing this issue?

Any insights or even links to relevant source code within Edward or Tensorflow would be really appreciated!

mortonjt · January 3, 2018, 5:47pm

Just as a heads up - I got something working using the MAP estimator. It seems that once I reduce the learning rate down sufficiently, I can get something that looks sane.

Here’s the code I’m using to do this.

## Model
# N = number of samples
# p = number of covariates in regression model
# D = number of features within each sample
# psi = predefined orthonormal basis to ensure identifiability when performing the softmax transform
G = tf.placeholder(tf.float32, [N, p])
B = Normal(loc=tf.zeros([p, D-1]), 
           scale=tf.ones([p, D-1]))
v = tf.matmul(G, B) 
eta = tf.nn.log_softmax(tf.matmul(v, psi))
Y = Multinomial(total_count=n, logits=eta, 
                value=tf.zeros([N, D], dtype=tf.float32))

## Inference
qB = PointMass(params=tf.Variable(tf.zeros([p, D-1])))
inference = ed.MAP(
    {B: qB},     
    data={G: G_data, Y: y_data}
)
optimizer = tf.train.AdamOptimizer(5e-4, 
                                   beta1=0.9,
                                   beta2=0.999)
inference.run(n_iter=5000, optimizer=optimizer)

Still having a bit of trouble getting the other inference techniques to work. Could be due to difficulties handling the Multinomial. But I’m not entirely sure.

Topic		Replies	Views
Multinomial on convolutionnal model returns nans (equivalent code with Pyro)	1	939	June 6, 2018
A little example on Dirichlet/Multinomial inference	1	945	June 6, 2018
Building a multinomial regression model	2	1672	July 3, 2017
Multinomial-Logistic Normal Model, Probably My Fault	2	3127	January 3, 2018
Beta binomial example	5	1837	August 14, 2017

Numerical issues with Multinomial

Related topics