# Building a multinomial regression model

@demodw

Hi,

I am trying to create a multinomial regression model using Edward. The equivalent Stan code can be seen at https://gist.github.com/demodw/20d95f0fdd54624d14aa2983e14d71f5.

In my very naive attempt, I just copied the logistic example, and tried re-formatting it a bit. The not so successful attempt can be seen at https://gist.github.com/demodw/a5acafb4025d53f4d3f80f559dcce775. Naively, I tried to just create a vector, theta, which has the linear sum of coefficients (except for the reference group, which I set to index 0). However, this gives me the error,

ValueError: Shape (3,) must have rank 2

Any help or pointers would be much appreciated!

Cheers.

Here are some tips:

1. `Multinomial` is a class. Similar to SciPy, I recommend calling `multinomial` which is an instantiated object of that class, e.g., `multinomial.logpmf()`. (It doesn’t actually matter in this example though.) It also requires specifying the number of trials. So you should do `multinomial.logpmf(y, n=1, p=...)`.

2. You use `D=5` to generate the data but set `D=1` for the rest of the code.

3. You use `tf.nn.softmax` to convert from a real-vector to a probability vector. `tf.nn.softmax` requires a matrix as input, where the first dimension represents the batch size. For your example, you calculate the probability vectors one data point at a time. Therefore you should set `theta = tf.expand_dims(theta, 0)`.

Alternatively, you can perform the vectorized operation of `theta = tf.matmul(x, w) + b`, and then call

``````theta = tf.matmul(x, w) + b
log_lik = tf.reduce_sum(multinomial.logpmf(y, n=1, p=tf.nn.softmax(theta)))
``````
1. For your variational distribution, you specify distributions over `C-1` dimensions rather than `C` dimensions. That makes sense because it’s a simplex, so you only need `C-1` dimensions to get the last one. However, that doesn’t play well with `tf.nn.softmax` which assumes the input is not only a matrix of `[batch_size, probabilities]`, but that each row sums to 1. Therefore you have to input the final column representing `1 - tf.reduce_sum(...)` before passing it into `tf.nn.softmax`.

Alternatively, you can use the `ed.to_simplex()` utility function. It will convert a real-valued matrix of `[N, K-1]` to a matrix of probabilities `[N, K]`, where each row sums to 1.

Here’s what the final script looks like:

``````class HierarchicalSoftmax:
...
def log_prob(self, xs, zs):
...
theta = tf.matmul(x, w) + b
log_lik = tf.reduce_sum(multinomial.logpmf(y, n=1, p=ed.to_simplex(theta)))
return log_lik + log_prior

...

C = 3
N = 40  # num data points
D = 5  # num features

...

qw_mu = tf.Variable(tf.random_normal([D, C-1]))
qw_sigma = tf.nn.softplus(tf.Variable(tf.random_normal([D, C-1])))
qb_mu = tf.Variable(tf.random_normal([C-1]))
qb_sigma = tf.nn.softplus(tf.Variable(tf.random_normal([])))

...

inference.run(n_print=5)
``````
1 Like

@demodw

I see. That actually works! My only issue now is that I still have to construct theta data-point by data-point. In my data I have about 200.000 groups (what I call D, combinations, in the Stan model), 1,3 million data points, and a few million parameters, so I do not think it is feasible to use a design matrix. Any tips?