Hi, thanks for creating Edward. I came across Edward while trying to see whether I could speed up fitting many mixed effects models compared with statsmodels or rpy2->lme4. I am fairly new to TensorFlow, but proficient in Python.
What I am doing is fitting tens of thousands of models, where the "design matrix" of fixed and random effects stays the same, and the only thing that differs each time is the dependent variable. My problem is essentially the mixed effects tutorial, but I am looking for guidance on how to correctly re-use my model/variables when I plug in a new dependent variable, if that is possible.
What I have been doing is:
1. Initializing my model variables as in the tutorial
2. For each column in my matrix of dependent variables, Y:
   - set `data[y] = np.array(Y[:, j])`
   - initialize a new KLqp with the data and latent variables and run inference
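To make the loop in steps 1–2 concrete, here is a minimal sketch of its structure. The Edward fitting call is stubbed out with a placeholder function, since only the data-swapping pattern is at issue; `fit_one` and the string key `"y"` are stand-ins for the real `ed.KLqp(latent_vars, data)` construction and the dict keyed by the model's output tensor:

```python
import numpy as np

def fit_one(data):
    # Hypothetical stand-in: in the real loop, a fresh ed.KLqp(...) would be
    # built from `data` here and inference.run() called. A column mean
    # substitutes for the fitted result so the sketch is self-contained.
    return float(data["y"].mean())

rng = np.random.default_rng(0)
n, m = 100, 4
Y = rng.normal(size=(n, m))   # each column of Y is one dependent variable

data = {}
results = []
for j in range(Y.shape[1]):
    # swap in the j-th dependent variable, as in step 2
    data["y"] = np.array(Y[:, j])
    results.append(fit_one(data))
```

The point of the question is whether everything inside the loop body other than the `data["y"]` assignment can be hoisted out and reused.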
My questions are:
Will the variables be "contaminated" by previous loop iterations if I initialize a new Inference object each time?
Is there a way to re-use the KLqp object by initializing it once and just swapping in my new dependent variable? I see there is a feed_dict argument to Inference.update, but if I understand it correctly, it is meant for iteratively fitting a single model on a data stream, not for "resetting" the data.
On that note, I see there is a session.run(inference.reset), but when I try it I get uninitialized-variable errors, even if I pass my whole data dict to inference.update. I am also not entirely clear on what "initializing" a variable even means in this context (zeroing it? loading it onto the GPU, if one is being used?).
If anyone could point me in the right direction, it would be greatly appreciated. The reason I am concerned about this is that more time seems to be spent (re)initializing the model each time than actually fitting it, and because we are getting in a batch of GPUs it seems wasteful to shuttle the same design-matrix data back and forth to the GPU thousands of times.