I'm trying to define a Latent Dirichlet Allocation (LDA) model in Edward with the following code, but it is very slow. With only 4 documents it takes about a minute to finish, and that is with no data, no learning, and no inference: just defining the model. Is that normal? If so, what should I do if I want to fit it on a real dataset (e.g., 300,000 documents)? The bottleneck seems to be the Categorical() calls.

```
import tensorflow as tf
import numpy as np
from edward.models import Categorical, Dirichlet
D = 4 # number of documents
N = [300, 213, 300, 300] # words per doc
K = 10 # number of topics
V = 1000 # vocabulary size
theta = Dirichlet(tf.zeros([D, K]) + 0.1)
phi = Dirichlet(tf.zeros([K, V]) + 0.05)
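# One object array per document, holding a separate random variable per token
z = np.empty(len(N), dtype=object)
w = np.empty(len(N), dtype=object)
for i in range(len(N)):
    z[i] = np.zeros(N[i], dtype=object)
    w[i] = np.zeros(N[i], dtype=object)

# Each iteration creates two new Categorical random variables
for d in range(D):
    for n in range(N[d]):
        z[d][n] = Categorical(theta[d, :])      # topic of token n in doc d
        w[d][n] = Categorical(phi[z[d][n], :])  # word drawn from that topic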
```
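For reference, here is the generative process I am trying to express, written as a plain NumPy sketch. It only samples from the model; it does not use Edward, build a graph, or do any inference, so it is not a replacement for the code above. The names `rng`, `z_d`, `w_d`, `mask`, and `num_nodes` are just illustrative, and the seed is arbitrary. The last line counts how many separate `Categorical` objects the nested loop above creates (one `z` and one `w` per token).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

D = 4                      # number of documents
N = [300, 213, 300, 300]   # words per doc
K = 10                     # number of topics
V = 1000                   # vocabulary size

# Same symmetric Dirichlet priors as in the Edward code
theta = rng.dirichlet(np.full(K, 0.1), size=D)   # doc-topic proportions, (D, K)
phi = rng.dirichlet(np.full(V, 0.05), size=K)    # topic-word distributions, (K, V)

z, w = [], []
for d in range(D):
    # All topic assignments for document d in a single call
    z_d = rng.choice(K, size=N[d], p=theta[d])
    w_d = np.empty(N[d], dtype=np.int64)
    # Draw the words topic by topic, so each call handles many tokens at once
    for k in np.unique(z_d):
        mask = z_d == k
        w_d[mask] = rng.choice(V, size=mask.sum(), p=phi[k])
    z.append(z_d)
    w.append(w_d)

# Number of separate Categorical objects the nested loop above creates
num_nodes = 2 * sum(N)
print(num_nodes)  # 2226
```

So even for this toy setup the model definition builds a couple of thousand individual random variables, which is why I suspect the per-token `Categorical()` calls.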