Multinomial-Logistic Normal Model, Probably My Fault

jsilve24 · November 10, 2017, 10:11pm

First off: Edward looks awesome. Excited to be trying it out. Thanks for all the good work!

I am trying Edward because I have a model that I want to try to speed up / make use of parallelization compared to Stan. The model(s) are all based on Multinomial-Logistic Normal likelihoods in various forms (as in Correlated Topic-Models). In particular I like to use a slightly different transform than the softmax. I use something called the inverse ILR transform which can be written as:

def clo(x):
    return x/np.sum(x, axis=1, keepdims=True)
def ILRInv(x, V):
    return clo(np.dot(x,np.transpose(V)))

I have tried to write a simple Latent linear regression model on the ILR transformed multinomial parameters but I am finding that HMC results in a total zero for the acceptance rate. I am guessing that my issue either results from the way I have tried to code this inverse ILR transform or what is probably a very foolish attempt at dealing with the problem initializing multinomial nodes en-batch (based on what I read: https://github.com/blei-lab/edward/issues/748 and Building a multinomial regression model). My observations are multinomial so I am just using them in the likelihood of my model so I tried to use the trick from the edward api instructions by setting the “value” parameter of the random variable.

I have included my code below. I am pretty sure this is just my fault but I would really appreciate some advice.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import edward as ed
from edward.models import WishartCholesky
from edward.models import MultivariateNormalDiag, MultivariateNormalTriL
from edward.models import Multinomial, Empirical 

sess = tf.Session()

# Simulate some data and parameters
N = 80 # Number of samples
D = 4 # Number of Multinomial Categories
P = 1 # Number of parameters for regression

# Simulate data coming from two different parts of the simplex. 
# No logistic-normal noise yet. 
Y_sim = np.concatenate((
    np.random.multinomial(1000, [.1, .2, .3, .4], size = int(N/2)), 
    np.random.multinomial(1000, [.1, .4, .3, .2], size = int(N/2))), 
    axis=0)
X_sim = np.array([0 for i in range(int(N/2))] + [1 for i in range(int(N/2))])
X_sim = X_sim.reshape((N, P))
n_sim = Y_sim.sum(axis=1)

# Create a default contrast matrix for the Inverse ILR transform. 
V_sim = np.array([[-0.7071068,  0.4082483,  0.2886751],
                  [0.0000000, -0.8164966,  0.2886751], 
                  [0.0000000,  0.0000000, -0.8660254], 
                  [0.7071068,  0.4082483,  0.2886751]])

# Observations/Input stuff....
V = tf.placeholder(tf.float32, [D, D-1])
X = tf.placeholder(tf.float32, [N, P])
n = tf.placeholder(tf.float32, [N])

# Priors
beta = MultivariateNormalDiag(loc=tf.zeros([P, D-1], tf.float32), 
                                        scale_diag=tf.ones([D-1], tf.float32))
sigma = WishartCholesky(df = D+3, scale = tf.diag(tf.ones([D-1], tf.float32)), 
                       cholesky_input_output_matrices=True)

# Likelihood Model
eta = MultivariateNormalTriL(loc = tf.matmul(X, beta, transpose_b=False), scale_tril = sigma)
pi1 = tf.matmul(eta, V, transpose_b = True)
pi2 = tf.multiply(pi1, tf.divide(1, tf.reshape(tf.reduce_sum(pi1, 1), [N, 1])))
Y = Multinomial(total_count = n, probs=pi2, value = tf.zeros_like(pi2))

# Inference
T = 100
qbeta = Empirical(params=tf.Variable(tf.zeros([T, P, D-1])))
qsigma = Empirical(params=tf.Variable(tf.zeros([T, D-1, D-1])))
# Commented out next line because I acctually want to marginalize over eta
# qeta = Empirical(params = tf.Variable(tf.zeros([T, N, D-1])))

inference = ed.HMC({beta: qbeta, sigma: qsigma}, 
                         {Y: Y_sim, n: n_sim, X: X_sim, V: V_sim})
inference.run(step_size = 0.003, n_steps=5)

100/100 [100%] ██████████████████████████████ Elapsed: 9s | Acceptance Rate: 0.000

sess = ed.get_session()
sess.run(qbeta.params)

As you can probably expect, this returns a 3D array of just zeroes…

jsilve24 · November 11, 2017, 10:39pm

So the issue seemed to be how I was dealing with batch sampling from the multinomial.

The following code snippet seemed to fix the problem.

def _sample_n(self, n=1, seed=None):
    # define Python function which returns samples as a Numpy array
    def np_sample(p, n):
        return multinomial.rvs(p=p, n=n, random_state=seed).astype(np.float32)

    # wrap python function as tensorflow op
    val = tf.py_func(np_sample, [self.probs, n], [tf.float32])[0]
    # set shape from unknown shape
    batch_event_shape = self.batch_shape.concatenate(self.event_shape)
    shape = tf.concat(
        [tf.expand_dims(n, 0), tf.convert_to_tensor(batch_event_shape)], 0)
    val = tf.reshape(val, shape)
    return val
Multinomial._sample_n = _sample_n

mortonjt · January 3, 2018, 5:51pm

Just as a heads up, this is not ideal, since you don’t actually want to be explicitly sampling from the Multinomial distribution. It better to only be evaluating the log-likelihood.

The way to do this is to do what you were doing before.
Also note that passing in raw probabilities will incur underflow, you’ll want to pass in logits instead.
I posted a simplified example of this without the normal regression here

Topic		Replies	Views
Building a multinomial regression model	2	1672	July 3, 2017
Multinomial on convolutionnal model returns nans (equivalent code with Pyro)	1	939	June 6, 2018
Bayesian regression	4	1199	June 26, 2017
Deep Generative Models in Edward	1	924	July 1, 2017
0% Acceptance Rate when using Categorical Distribution with HMC	7	1786	April 28, 2017

Multinomial-Logistic Normal Model, Probably My Fault

Related topics