I’m trying to define a Latent Dirichlet Allocation model with the following code. However, it’s very slow: running it with only 4 documents takes about a minute to finish (no data, no learning, no inference, just defining the model). Is that normal? And what should I do if I want to use it on a real dataset (e.g., 300,000 documents) and fit it? The bottleneck seems to be the Categorical() calls.
import tensorflow as tf
import numpy as np
from edward.models import Categorical, Dirichlet

D = 4                     # number of documents
N = [300, 213, 300, 300]  # words per document
K = 10                    # number of topics
V = 1000                  # vocabulary size

theta = Dirichlet(tf.zeros([D, K]) + 0.1)  # per-document topic proportions
phi = Dirichlet(tf.zeros([K, V]) + 0.05)   # per-topic word distributions

z = np.empty(D, dtype=object)
w = np.empty(D, dtype=object)
for d in range(D):
    z[d] = np.zeros(N[d], dtype=object)
    w[d] = np.zeros(N[d], dtype=object)

for d in range(D):
    for n in range(N[d]):
        # theta and phi hold probabilities, so pass them as probs=,
        # not as the positional logits argument.
        z[d][n] = Categorical(probs=theta[d, :])
        w[d][n] = Categorical(probs=phi[z[d][n], :])
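For reference, here is the generative process I have in mind, written as a vectorized NumPy sketch rather than as an Edward graph (the `rng` sampling calls and the inverse-CDF trick are my own illustration, not Edward API): it draws all topic assignments for a document in one call instead of creating one random-variable node per token.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K, V = 4, 10, 1000
N = [300, 213, 300, 300]

# Sample topic mixtures and topic-word distributions once.
theta = rng.dirichlet(np.full(K, 0.1), size=D)  # shape (D, K)
phi = rng.dirichlet(np.full(V, 0.05), size=K)   # shape (K, V)

docs = []
for d in range(D):
    # All N[d] topic assignments for document d in one vectorized call.
    z_d = rng.choice(K, size=N[d], p=theta[d])
    # Draw each word from its topic's word distribution by inverting
    # the per-token CDFs with uniform variates (vectorized over tokens).
    cdf = np.cumsum(phi[z_d], axis=1)            # shape (N[d], V)
    u = rng.random((N[d], 1))
    w_d = (u > cdf).sum(axis=1)
    w_d = np.minimum(w_d, V - 1)                 # guard against rounding in the last CDF entry
    docs.append(w_d)
```

This runs in a fraction of a second for these sizes, which is what makes me suspect that the per-token `Categorical()` construction, rather than the model itself, is what is slow.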