Multinomial-Logistic Normal Model, Probably My Fault


#1

First off: Edward looks awesome. Excited to be trying it out. Thanks for all the good work!

I am trying Edward because I have a model that I want to try to speed up / make use of parallelization compared to Stan. The model(s) are all based on Multinomial-Logistic Normal likelihoods in various forms (as in Correlated Topic-Models). In particular I like to use a slightly different transform than the softmax. I use something called the inverse ILR transform which can be written as:

def clo(x):
    return x/np.sum(x, axis=1, keepdims=True)
def ILRInv(x, V):
    return clo(np.dot(x,np.transpose(V)))

I have tried to write a simple Latent linear regression model on the ILR transformed multinomial parameters but I am finding that HMC results in a total zero for the acceptance rate. I am guessing that my issue either results from the way I have tried to code this inverse ILR transform or what is probably a very foolish attempt at dealing with the problem initializing multinomial nodes en-batch (based on what I read: https://github.com/blei-lab/edward/issues/748 and Building a multinomial regression model). My observations are multinomial so I am just using them in the likelihood of my model so I tried to use the trick from the edward api instructions by setting the β€œvalue” parameter of the random variable.

I have included my code below. I am pretty sure this is just my fault but I would really appreciate some advice.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import edward as ed
from edward.models import WishartCholesky
from edward.models import MultivariateNormalDiag, MultivariateNormalTriL
from edward.models import Multinomial, Empirical 

sess = tf.Session()

# Simulate some data and parameters
N = 80 # Number of samples
D = 4 # Number of Multinomial Categories
P = 1 # Number of parameters for regression

# Simulate data coming from two different parts of the simplex. 
# No logistic-normal noise yet. 
Y_sim = np.concatenate((
    np.random.multinomial(1000, [.1, .2, .3, .4], size = int(N/2)), 
    np.random.multinomial(1000, [.1, .4, .3, .2], size = int(N/2))), 
    axis=0)
X_sim = np.array([0 for i in range(int(N/2))] + [1 for i in range(int(N/2))])
X_sim = X_sim.reshape((N, P))
n_sim = Y_sim.sum(axis=1)

# Create a default contrast matrix for the Inverse ILR transform. 
V_sim = np.array([[-0.7071068,  0.4082483,  0.2886751],
                  [0.0000000, -0.8164966,  0.2886751], 
                  [0.0000000,  0.0000000, -0.8660254], 
                  [0.7071068,  0.4082483,  0.2886751]])

# Observations/Input stuff....
V = tf.placeholder(tf.float32, [D, D-1])
X = tf.placeholder(tf.float32, [N, P])
n = tf.placeholder(tf.float32, [N])

# Priors
beta = MultivariateNormalDiag(loc=tf.zeros([P, D-1], tf.float32), 
                                        scale_diag=tf.ones([D-1], tf.float32))
sigma = WishartCholesky(df = D+3, scale = tf.diag(tf.ones([D-1], tf.float32)), 
                       cholesky_input_output_matrices=True)

# Likelihood Model
eta = MultivariateNormalTriL(loc = tf.matmul(X, beta, transpose_b=False), scale_tril = sigma)
pi1 = tf.matmul(eta, V, transpose_b = True)
pi2 = tf.multiply(pi1, tf.divide(1, tf.reshape(tf.reduce_sum(pi1, 1), [N, 1])))
Y = Multinomial(total_count = n, probs=pi2, value = tf.zeros_like(pi2))

# Inference
T = 100
qbeta = Empirical(params=tf.Variable(tf.zeros([T, P, D-1])))
qsigma = Empirical(params=tf.Variable(tf.zeros([T, D-1, D-1])))
# Commented out next line because I acctually want to marginalize over eta
# qeta = Empirical(params = tf.Variable(tf.zeros([T, N, D-1])))

inference = ed.HMC({beta: qbeta, sigma: qsigma}, 
                         {Y: Y_sim, n: n_sim, X: X_sim, V: V_sim})
inference.run(step_size = 0.003, n_steps=5)

100/100 [100%] β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ Elapsed: 9s | Acceptance Rate: 0.000

sess = ed.get_session()
sess.run(qbeta.params)

As you can probably expect, this returns a 3D array of just zeroes… :frowning:


#2

So the issue seemed to be how I was dealing with batch sampling from the multinomial.

The following code snippet seemed to fix the problem.

def _sample_n(self, n=1, seed=None):
    # define Python function which returns samples as a Numpy array
    def np_sample(p, n):
        return multinomial.rvs(p=p, n=n, random_state=seed).astype(np.float32)

    # wrap python function as tensorflow op
    val = tf.py_func(np_sample, [self.probs, n], [tf.float32])[0]
    # set shape from unknown shape
    batch_event_shape = self.batch_shape.concatenate(self.event_shape)
    shape = tf.concat(
        [tf.expand_dims(n, 0), tf.convert_to_tensor(batch_event_shape)], 0)
    val = tf.reshape(val, shape)
    return val
Multinomial._sample_n = _sample_n

Numerical issues with Multinomial
#3

Just as a heads up, this is not ideal, since you don’t actually want to be explicitly sampling from the Multinomial distribution. It better to only be evaluating the log-likelihood.

The way to do this is to do what you were doing before.
Also note that passing in raw probabilities will incur underflow, you’ll want to pass in logits instead.
I posted a simplified example of this without the normal regression here