Variational Inference for Dirichlet Process Mixtures

Hi, I am very new to Edward and am teaching myself Dirichlet processes. Is there any way to perform VI directly on the DirichletProcess class in Edward?

Something as simple as:
inference = ed.KLqp({dp: qdp}, data={y: y_train})

I know this sounds completely wrong, but I couldn’t find a way of assigning variational distributions to the tensors responsible for stick-breaking.
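
In case it helps to show what I mean, this is roughly the construction I've been writing out by hand instead: a truncated stick-breaking mixture where every piece is an explicit random variable that I could attach a variational factor to (just a sketch, with sizes made up for illustration):

import tensorflow as tf
from edward.models import Beta, Categorical, Normal

K = 5    # truncation level (illustrative)
N = 100  # number of observations (illustrative)

# Truncated stick-breaking weights: pi_k = beta_k * prod_{j<k} (1 - beta_j)
beta = Beta(tf.ones([K]), tf.ones([K]))
pi = beta * tf.concat([[1.0], tf.cumprod(1.0 - beta)[:-1]], 0)

z = Categorical(probs=tf.ones([N, 1]) * pi)           # cluster assignments
mu = Normal(tf.zeros([K]), tf.ones([K]))              # cluster locations
y = Normal(loc=tf.gather(mu, z), scale=tf.ones([N]))  # observations

What I would prefer is to skip this boilerplate and work with DirichletProcess directly.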

Thanks,
Hugo

So I have implemented my Dirichlet process mixture model as follows, but it only works for K (the number of sticks) <= 4. For anything larger, I get an error saying that my indexing for tf.gather is out of range; and even for K = 4 the evaluation gives an infinite MSE and a log-likelihood of -33, so it looks like my gradients are exploding. What is the standard practice for preventing exploding gradients in Edward? (The only mitigation I've tried so far is sketched after the code below.)

Traceback (most recent call last):
File "Edward_DP_Baseline.py", line 82, in <module>
fit_truncated_DPM()
File "Edward_DP_Baseline.py", line 72, in fit_truncated_DPM
inference.run(n_samples=5, n_iter=3000)
File "/usr/local/lib/python3.5/dist-packages/edward/inferences/inference.py", line 144, in run
info_dict = self.update()
File "/usr/local/lib/python3.5/dist-packages/edward/inferences/variational_inference.py", line 164, in update
_, t, loss = sess.run([self.train, self.increment_t, self.loss], feed_dict)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 5 which is outside the valid range of [0, 5). Label values: 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
[[Node: inference_139978766253640/0/Categorical_3/log_prob/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](inference_139978766253640/0/Categorical/inference_139978766253640/0/Categorical/logits/Log, StopGradient_3)]]
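
Reading the traceback more closely, the out-of-range values are labels fed into a Categorical log-probability (every printed label equals K), rather than the tf.gather call itself. The only debugging step I've tried is drawing the assignments directly and checking their range (I'm assuming ed.get_session() is the right way to grab Edward's session for this):

sess = ed.get_session()
sample = sess.run(alloc)           # one draw of the cluster assignments
print(sample.min(), sample.max())  # should stay inside [0, K)

The full implementation is below.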

import edward as ed
import tensorflow as tf
from edward.models import (
    Beta, Categorical, Deterministic, Gamma, InverseGamma, Normal,
    GammaWithSoftplusConcentrationRate,
    InverseGammaWithSoftplusConcentrationRate, NormalWithSoftplusScale)


def fit_truncated_DPM():
    # N = tf.placeholder(tf.int32)
    N = 500

    # Truncated stick-breaking representation of the Dirichlet process.
    alpha = Gamma(1.0, 1.0)
    # Maximum number of sticks to break (truncation level).
    K = 4

    beta = Beta(tf.ones([K]), alpha * tf.ones([K]))
    # Mixture weights: pi_k = beta_k * prod_{j<k} (1 - beta_j).
    probs = Deterministic(
        beta * tf.concat([[1.0], tf.cumprod(1.0 - beta)[:-1]], 0))

    # Per-component parameters and per-observation cluster assignments.
    sigma = InverseGamma(concentration=tf.ones([K, 1]), rate=tf.ones([K, 1]))
    mu = Normal(loc=tf.zeros([K, 1]), scale=tf.ones([K, 1]))
    alloc = Categorical(probs=tf.expand_dims(probs, 0) * tf.ones([N, K]))

    # Look up each observation's component parameters.
    mixture_mu = tf.squeeze(tf.gather(mu, alloc))
    mixture_sigma = tf.squeeze(tf.gather(sigma, alloc))

    y = Normal(loc=mixture_mu, scale=mixture_sigma)

    # Variational approximations (one factor per latent variable).
    qalpha = GammaWithSoftplusConcentrationRate(tf.Variable(1.0), tf.Variable(1.0))
    qbeta = Beta(tf.ones([K]), tf.nn.softplus(tf.Variable(tf.ones([K]))))
    qmu = NormalWithSoftplusScale(loc=tf.Variable(tf.zeros([K, 1])),
                                  scale=tf.Variable(tf.zeros([K, 1])))
    qsigma = InverseGammaWithSoftplusConcentrationRate(
        concentration=tf.Variable(tf.zeros([K, 1])),
        rate=tf.Variable(tf.zeros([K, 1])))
    qalloc = Categorical(probs=tf.nn.softmax(tf.Variable(tf.zeros([N, K]))))

    inference = ed.KLqp(
        {alpha: qalpha, beta: qbeta, sigma: qsigma, mu: qmu, alloc: qalloc},
        data={y: y_train[:N]})
    inference.run(n_samples=5, n_iter=3000)

    # Posterior predictive: swap the priors for their fitted approximations.
    y_post = ed.copy(y, {alpha: qalpha, beta: qbeta, sigma: qsigma,
                         mu: qmu, alloc: qalloc})

    print('Train MSE:', ed.evaluate('mean_squared_error', data={y_post: y_train[:N]}))
    print('Train Log-Likelihood:', ed.evaluate('log_likelihood', data={y_post: y_train[:N]}))

    print('Test MSE:', ed.evaluate('mean_squared_error', data={y_post: y_test[:N]}))
    print('Test Log-Likelihood:', ed.evaluate('log_likelihood', data={y_post: y_test[:N]}))

Cheers,
Hugo