Very slow model creation

I’m trying to define a Latent Dirichlet Allocation (LDA) model with the following code, but it is very slow: running it with only 4 documents takes about a minute to finish (no data, no learning, no inference, just defining the model). Is that normal? And what should I do if I want to use it on a real dataset (e.g., 300,000 documents, which at a few hundred words per document would mean on the order of 10^8 per-word Categorical variables) and then learn it? The bottleneck seems to be the Categorical() constructor.

import tensorflow as tf
import numpy as np
from edward.models import Categorical, Dirichlet

D = 4  # number of documents
N = [300, 213, 300, 300]  # words per doc
K = 10  # number of topics
V = 1000  # vocabulary size

theta = Dirichlet(tf.zeros([D, K]) + 0.1)
phi = Dirichlet(tf.zeros([K, V]) + 0.05)
# One per-word topic assignment z[d][n] and one per-word token w[d][n],
# each created as its own Categorical random variable.
z = np.empty(len(N), dtype=object)
w = np.empty(len(N), dtype=object)
for i in range(len(N)):
    z[i] = np.zeros(N[i], dtype=object)
    w[i] = np.zeros(N[i], dtype=object)
for d in range(D):
    for n in range(N[d]):
        z[d][n] = Categorical(theta[d, :])
        w[d][n] = Categorical(phi[z[d][n], :])
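
For what it’s worth, a quick profile of the construction loop also points at the Categorical constructor. I checked it with something like the rough sketch below (build_model is just a name I made up to wrap the per-word loops above):

import cProfile

def build_model():
    # Rebuild only the per-word variables; theta and phi already exist.
    for d in range(D):
        for n in range(N[d]):
            z[d][n] = Categorical(theta[d, :])
            w[d][n] = Categorical(phi[z[d][n], :])

# Sort by cumulative time to see where the construction time goes.
cProfile.run('build_model()', sort='cumtime')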

I have stripped the example down to its essentials. The following simple code alone takes about 45 seconds! What’s the problem?

from edward.models import Categorical

for n in range(2000):
    Categorical([.2, .2, .6])
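
The timings quoted here are crude wall-clock measurements along these lines (a sketch; the exact numbers of course depend on the machine):

import time
from edward.models import Categorical

start = time.time()
for n in range(2000):
    Categorical([.2, .2, .6])
print('elapsed: %.1f s' % (time.time() - start))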

To make the problem clearer, the following code takes less than a millisecond:

for n in range(2000):
    i = n

and the following takes about 17 seconds (less than 45, but still a lot just to define some variables):

from tensorflow.python.ops.distributions.categorical import Categorical

for n in range(2000):
    Categorical([.2, .2, .6])
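
I suspect the cost is per-node graph construction, so one idea might be a single batched Categorical instead of 2000 separate ones, along the lines of the sketch below (my assumption being that a probs argument of shape [2000, 3] gives a batch of 2000 distributions, as in tf.distributions.Categorical). But I don’t see how to carry that over to the per-word z and w in the LDA model above, where each w[d][n] depends on the sampled z[d][n].

import tensorflow as tf
from edward.models import Categorical

# One Categorical random variable with batch shape [2000],
# instead of 2000 separate graph nodes.
probs = tf.tile(tf.constant([[.2, .2, .6]]), [2000, 1])
batched = Categorical(probs=probs)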