Here are some tips:

`Multinomial` is a class. Similar to SciPy, I recommend calling `multinomial`, which is an instantiated object of that class, e.g., `multinomial.logpmf()`. (It doesn't actually matter in this example though.) It also requires specifying the number of trials, so you should call `multinomial.logpmf(y, n=1, p=...)`.
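For intuition, SciPy's analogous API works the same way: `scipy.stats.multinomial` is an already-instantiated object that you call directly, with `n` required. A minimal sketch (using SciPy, not Edward, purely as an analogy):

```python
from scipy.stats import multinomial as sp_multinomial

# One trial (n=1) over three categories; the count vector is one-hot.
log_p = sp_multinomial.logpmf([1, 0, 0], n=1, p=[0.2, 0.3, 0.5])
# log_p equals log(0.2), since category 0 was drawn on the single trial.
```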

You use `D=5` to generate the data but set `D=1` for the rest of the code.

You use `tf.nn.softmax` to convert a real-valued vector to a probability vector. `tf.nn.softmax` requires a matrix as input, where the first dimension represents the batch size. In your example you calculate the probability vectors one data point at a time, so you should set `theta = tf.expand_dims(theta, 0)`. Alternatively, you can perform the vectorized operation:

```python
theta = tf.matmul(x, w) + b
log_lik = tf.reduce_sum(multinomial.logpmf(y, n=1, p=tf.nn.softmax(theta)))
```
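To see the shape semantics without running TensorFlow, here is a NumPy sketch of the same idea: softmax applied along the last axis of a `[batch_size, C]` matrix, with the single-data-point case handled by adding a leading batch dimension (the NumPy analogue of `tf.expand_dims(theta, 0)`):

```python
import numpy as np

def softmax(z):
    # Row-wise softmax along the last axis; subtract the max for stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

theta = np.array([1.0, 2.0, 3.0])     # logits for a single data point
theta_batch = theta[np.newaxis, :]    # shape (1, 3), like tf.expand_dims(theta, 0)
probs = softmax(theta_batch)          # shape (1, 3); the row sums to 1
```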

For your variational distribution, you specify distributions over `C - 1` dimensions rather than `C` dimensions. That makes sense: it's a simplex, so you only need `C - 1` dimensions to determine the last one. However, that doesn't play well with `tf.nn.softmax`, which assumes the input is not only a matrix of `[batch_size, probabilities]` but that each row sums to 1. Therefore you have to append the final column, representing `1 - tf.reduce_sum(...)`, before passing it into `tf.nn.softmax`.

Alternatively, you can use the `ed.to_simplex()` utility function. It converts a real-valued matrix of `[N, K - 1]` to a matrix of probabilities `[N, K]`, where each row sums to 1.
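One simple realization of such a `[N, K - 1]` → `[N, K]` transform, sketched in NumPy, is to append a zero column and apply a row-wise softmax. This illustrates the interface only; it is not necessarily the exact bijection `ed.to_simplex` uses internally:

```python
import numpy as np

def to_simplex_sketch(x):
    # Map a real-valued matrix [N, K-1] to probabilities [N, K]:
    # append a column of zeros, then apply a row-wise softmax.
    # (Interface illustration only; Edward's ed.to_simplex may
    # use a different transform internally.)
    x = np.concatenate([x, np.zeros((x.shape[0], 1))], axis=1)
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

theta = np.random.randn(4, 2)   # N=4, K-1=2
p = to_simplex_sketch(theta)    # shape (4, 3); each row sums to 1
```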
Here’s what the final script looks like:
```python
class HierarchicalSoftmax:
    ...
    def log_prob(self, xs, zs):
        ...
        theta = tf.matmul(x, w) + b
        log_lik = tf.reduce_sum(multinomial.logpmf(y, n=1, p=ed.to_simplex(theta)))
        return log_lik + log_prior

...
C = 3
N = 40  # number of data points
D = 5   # number of features
...
qw_mu = tf.Variable(tf.random_normal([D, C - 1]))
qw_sigma = tf.nn.softplus(tf.Variable(tf.random_normal([D, C - 1])))
qb_mu = tf.Variable(tf.random_normal([C - 1]))
qb_sigma = tf.nn.softplus(tf.Variable(tf.random_normal([C - 1])))  # shape matches qb_mu
...
inference.run(n_print=5)
```