Very slow model creation

I’m trying to define a Latent Dirichlet Allocation (LDA) model with the following code, but it is very slow: running it with only 4 documents takes about a minute to finish (no data, no learning, no inference, just defining the model). Is that normal? And what should I do if I want to use it on a real dataset (e.g., 300,000 documents, which at a few hundred words per document would mean on the order of 10^8 per-word Categorical variables) and then learn it? The bottleneck seems to be the Categorical() constructor.

import tensorflow as tf
import numpy as np
from edward.models import Categorical, Dirichlet

D = 4  # number of documents
N = [300, 213, 300, 300]  # words per doc
K = 10  # number of topics
V = 1000  # vocabulary size

theta = Dirichlet(tf.zeros([D, K]) + 0.1)
phi = Dirichlet(tf.zeros([K, V]) + 0.05)
# One per-word topic assignment z[d][n] and one per-word token w[d][n],
# each created as its own Categorical random variable.
z = np.empty(len(N), dtype=object)
w = np.empty(len(N), dtype=object)
for i in range(len(N)):
    z[i] = np.zeros(N[i], dtype=object)
    w[i] = np.zeros(N[i], dtype=object)
for d in range(D):
    for n in range(N[d]):
        z[d][n] = Categorical(theta[d, :])
        w[d][n] = Categorical(phi[z[d][n], :])
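
For what it’s worth, a quick profile of the construction loop also points at the Categorical constructor. I checked it with something like the rough sketch below (build_model is just a name I made up to wrap the per-word loops above):

import cProfile

def build_model():
    # Rebuild only the per-word variables; theta and phi already exist.
    for d in range(D):
        for n in range(N[d]):
            z[d][n] = Categorical(theta[d, :])
            w[d][n] = Categorical(phi[z[d][n], :])

# Sort by cumulative time to see where the construction time goes.
cProfile.run('build_model()', sort='cumtime')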

I have stripped the example down to its essentials. The following simple code alone takes about 45 seconds! What’s the problem?

from edward.models import Categorical

for n in range(2000):
    Categorical([.2, .2, .6])
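
The timings quoted here are crude wall-clock measurements along these lines (a sketch; the exact numbers of course depend on the machine):

import time
from edward.models import Categorical

start = time.time()
for n in range(2000):
    Categorical([.2, .2, .6])
print('elapsed: %.1f s' % (time.time() - start))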

To make the problem clearer, the following code takes less than a millisecond:

for n in range(2000):
    i = n

and the following takes about 17 seconds (less than 45, but still a lot just to define some variables):

from tensorflow.python.ops.distributions.categorical import Categorical

for n in range(2000):
    Categorical([.2, .2, .6])
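
I suspect the cost is per-node graph construction, so one idea might be a single batched Categorical instead of 2000 separate ones, along the lines of the sketch below (my assumption being that a probs argument of shape [2000, 3] gives a batch of 2000 distributions, as in tf.distributions.Categorical). But I don’t see how to carry that over to the per-word z and w in the LDA model above, where each w[d][n] depends on the sampled z[d][n].

import tensorflow as tf
from edward.models import Categorical

# One Categorical random variable with batch shape [2000],
# instead of 2000 separate graph nodes.
probs = tf.tile(tf.constant([[.2, .2, .6]]), [2000, 1])
batched = Categorical(probs=probs)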