moved from Building a multinomial regression model · Issue #265 · blei-lab/edward · GitHub.
@demodw
Hi,
I am trying to create a multinomial regression model using Edward. The equivalent Stan code can be seen at gist:20d95f0fdd54624d14aa2983e14d71f5 · GitHub.
In my very naive attempt, I just copied the logistic example, and tried re-formatting it a bit. The not so successful attempt can be seen at gist:a5acafb4025d53f4d3f80f559dcce775 · GitHub. Naively, I tried to just create a vector, theta, which has the linear sum of coefficients (except for the reference group, which I set to index 0). However, this gives me the error,
ValueError: Shape (3,) must have rank 2
Any help or pointers would be much appreciated!
Cheers.
Here are some tips:
-
Multinomial
is a class. Similar to SciPy, I recommend calling multinomial
which is an instantiated object of that class, e.g., multinomial.logpmf()
. (It doesn’t actually matter in this example though.) It also requires specifying the number of trials. So you should do multinomial.logpmf(y, n=1, p=...)
.
-
You use D=5
to generate the data but set D=1
for the rest of the code.
-
You use tf.nn.softmax
to convert from a real-vector to a probability vector. tf.nn.softmax
requires a matrix as input, where the first dimension represents the batch size. For your example, you calculate the probability vectors one data point at a time. Therefore you should set theta = tf.expand_dims(theta, 0)
.
Alternatively, you can perform the vectorized operation of theta = tf.matmul(x, w) + b
, and then call
theta = tf.matmul(x, w) + b
log_lik = tf.reduce_sum(multinomial.logpmf(y, n=1, p=tf.nn.softmax(theta)))
-
For your variational distribution, you specify distributions over C-1
dimensions rather than C
dimensions. That makes sense because it’s a simplex, so you only need C-1
dimensions to get the last one. However, that doesn’t play well with tf.nn.softmax
which assumes the input is not only a matrix of [batch_size, probabilities]
, but that each row sums to 1. Therefore you have to input the final column representing 1 - tf.reduce_sum(...)
before passing it into tf.nn.softmax
.
Alternatively, you can use the ed.to_simplex()
utility function. It will convert a real-valued matrix of [N, K-1]
to a matrix of probabilities [N, K]
, where each row sums to 1.
Here’s what the final script looks like:
class HierarchicalSoftmax:
...
def log_prob(self, xs, zs):
...
theta = tf.matmul(x, w) + b
log_lik = tf.reduce_sum(multinomial.logpmf(y, n=1, p=ed.to_simplex(theta)))
return log_lik + log_prior
...
C = 3
N = 40 # num data points
D = 5 # num features
...
qw_mu = tf.Variable(tf.random_normal([D, C-1]))
qw_sigma = tf.nn.softplus(tf.Variable(tf.random_normal([D, C-1])))
qb_mu = tf.Variable(tf.random_normal([C-1]))
qb_sigma = tf.nn.softplus(tf.Variable(tf.random_normal([])))
...
inference.run(n_print=5)
1 Like