Implementing mixed membership models


#1

Hi All,

I’m trying to implement an LDA-like mixed membership model in edward. I’ve seen several people attempt to run Gibbs sampling and KLqp on the version of LDA described in Dustin’s ICLR 2017 paper, and some have tried the same with their own versions of LDA. All examples use toy data, and no one claims to have obtained good results. Earlier this year it was suggested that ParamMixture should be used to implement LDA, so maybe this is the way to go. However, I’m stuck on how to embed ParamMixture within the structure of a mixed membership model. It seems straightforward at first (naively, just put the call to ParamMixture inside a loop over the corpus), but things break down when you start calling edward’s inference methods (e.g. ed.Gibbs()). Below is my attempt at setting up the model.

import tensorflow as tf
from edward.models import Categorical, Dirichlet, ParamMixture

alpha = tf.ones(K) * 0.1   # document-topic prior
eta = tf.ones(V) * 0.01    # topic-word prior
theta = Dirichlet(concentration=alpha,
                  sample_shape=D)  # [D, K] topic proportions
beta = Dirichlet(concentration=eta,
                 sample_shape=K)   # [K, V] topic-word distributions
W = [None] * D
for d in range(D):
  W[d] = ParamMixture(mixing_weights=tf.gather(theta, d),
                      component_params={'probs': beta},
                      component_dist=Categorical,
                      sample_shape=N[d],
                      validate_args=True)
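
To make the intended shapes concrete, here is the same generative process written in plain numpy (a sketch only; the toy sizes D, K, V, N are my own assumptions, and numpy sampling stands in for the edward random variables above):

```python
import numpy as np

# Toy sizes (assumptions, not from the model above): documents, topics, vocab.
D, K, V = 4, 3, 10
N = [50, 40, 60, 55]        # words per document
rng = np.random.RandomState(0)

alpha = np.full(K, 0.1)     # document-topic prior
eta = np.full(V, 0.01)      # topic-word prior

theta = rng.dirichlet(alpha, size=D)   # [D, K] topic proportions
beta = rng.dirichlet(eta, size=K)      # [K, V] topic-word distributions

W = []
for d in range(D):
    z = rng.choice(K, size=N[d], p=theta[d])             # topic per token
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # word per token
    W.append(w)
```

Each W[d] is a length-N[d] vector of word indices, which is exactly what the ParamMixture per document is meant to represent: mixing weights theta[d] over K topics, component distributions given by the rows of beta.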

I’ve also seen claims that mixed membership models scalable to real data would have to wait until conjugacy had been fully integrated into edward via ed.complete_conditional(). Correct me if I’m wrong, but it seems like this function has already been folded into Gibbs sampling. Otherwise, I’m not sure how I would leverage this functionality to get my models off the ground.

Any help or guidance on getting LDA up and running would be much appreciated.


#2

The LDA version in the ICLR paper is written for instructional purposes. A real-world version would vectorize.
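
To illustrate what "vectorize" means here, the per-document loop above can be collapsed into array operations when documents share a common length. This is a numpy sketch only (the inverse-CDF categorical sampling and the equal-length assumption are mine, not from the paper):

```python
import numpy as np

# Assumption: equal-length documents so everything fits in one [D, N] array.
D, K, V, N = 4, 3, 10, 50
rng = np.random.RandomState(0)

theta = rng.dirichlet(np.full(K, 0.1), size=D)   # [D, K]
beta = rng.dirichlet(np.full(V, 0.01), size=K)   # [K, V]

# Vectorized categorical sampling via the inverse CDF: count how many
# cumulative-probability bins each uniform draw exceeds.
u = rng.rand(D, N, 1)
z = np.minimum((u > np.cumsum(theta, axis=1)[:, None, :]).sum(axis=2), K - 1)  # [D, N] topics
u2 = rng.rand(D, N, 1)
w = np.minimum((u2 > np.cumsum(beta[z], axis=2)).sum(axis=2), V - 1)           # [D, N] words
```

The np.minimum clamp only guards against floating-point round-off in the final cumulative bin.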

This is correct. The symbolic algebra is primitive and will not work. Hand-writing the proposals to ed.Gibbs will work. Alternatively, if you write your own proposal distributions, you may not even want to pass them to ed.Gibbs but instead handle your own scheduling, like the for loop in examples/mixture_gaussian_gibbs.py.
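
For what hand-rolled scheduling can look like, here is a minimal collapsed Gibbs sampler for LDA in plain numpy, in the spirit of the for loop in examples/mixture_gaussian_gibbs.py. It is a sketch under my own assumptions (the function name, default hyperparameters, and count-table layout are illustrative, not from edward):

```python
import numpy as np

def collapsed_gibbs_lda(W, K, V, alpha=0.1, eta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs for LDA: theta and beta are integrated out analytically,
    and only the per-token topic assignments z are sampled."""
    rng = np.random.RandomState(seed)
    D = len(W)
    ndk = np.zeros((D, K))   # topic counts per document
    nkv = np.zeros((K, V))   # word counts per topic
    nk = np.zeros(K)         # total counts per topic
    # Random initial assignments, then fill the count tables.
    z = [rng.randint(K, size=len(w)) for w in W]
    for d, w in enumerate(W):
        for n, v in enumerate(w):
            k = z[d][n]
            ndk[d, k] += 1; nkv[k, v] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, w in enumerate(W):
            for n, v in enumerate(w):
                # Remove this token's current assignment from the counts.
                k = z[d][n]
                ndk[d, k] -= 1; nkv[k, v] -= 1; nk[k] -= 1
                # p(z = k | rest) ∝ (ndk + alpha) * (nkv + eta) / (nk + V*eta)
                p = (ndk[d] + alpha) * (nkv[:, v] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                ndk[d, k] += 1; nkv[k, v] += 1; nk[k] += 1
    return z, ndk, nkv
```

Because the conditionals are written out by hand, you control the sweep order yourself, which is exactly the kind of scheduling that sidesteps the symbolic-algebra limitation.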