Saving ancestor variables in ancestral sampling


#1

Hi,

I’m trying to write a utility that can “freeze” a subset of latent variables during inference, to debug problems when learning from data sampled from a model. The idea is to sample a given variable z, save it, pass it on to the rest of the model to sample the observed variable x, and then perform inference with z treated as another observed variable. The rationale is that if inference works with a particular variable observed rather than latent, then the original problem stems from that variable.

In addition, I want to do inference on mini-batches. For example, in a Gaussian mixture model, I want to sample a batch of cluster assignments z, keeping the means, variances and mixing proportions constant, and sample a batch of observed variables x using my saved z, like so:

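import numpy as np
import tensorflow as tf
import edward as ed
import matplotlib.pyplot as plt
from edward.models import Dirichlet, Multinomial

M = 100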
K = 3
D = 2

mean_precision_shape,mean_precision_rate,obs_precision_shape,obs_precision_rate = 4.,200.,6.,10.

sess = tf.Session()
with sess.as_default():
    
    # p model
    alpha = 1
    pi = Dirichlet(np.atleast_1d(alpha*np.ones(K)).astype(np.float32))
    z = Multinomial(total_count=1.,probs=tf.reshape(tf.tile(pi,[M]),[M,K]))
    sigma2_mu_k = ed.models.InverseGamma([[mean_precision_shape]],[[mean_precision_rate]])
    sigma2_mu_d = ed.models.InverseGamma(mean_precision_shape*tf.ones([D]),mean_precision_rate*tf.ones([D]))
    sigma2_mu = tf.tile(sigma2_mu_k, [K,1])*sigma2_mu_d
    mu = ed.models.MultivariateNormalDiag(tf.zeros([K,D]), tf.sqrt(sigma2_mu))
    sigma2_obs_n = ed.models.InverseGamma([[obs_precision_shape]],[[obs_precision_rate]])
    sigma2_obs_d = ed.models.InverseGamma(obs_precision_shape*tf.ones([D]),obs_precision_rate*tf.ones([D]))
    sigma2_obs = tf.tile(sigma2_obs_n, [M,1])*sigma2_obs_d
    x = ed.models.MultivariateNormalDiag(tf.matmul(z, mu), tf.sqrt(sigma2_obs))
    
    init = tf.global_variables_initializer()
    init.run()
    
    # identify global and local latent variables
    latent_variables = x.get_ancestors()
    local_latent_variables = [lv for lv in latent_variables if lv.shape[0] == M]
    local_parent = [lv for lv in local_latent_variables[0].get_ancestors()]
    global_latent_variables = [lv for lv in latent_variables if lv not in local_latent_variables and lv not in local_parent]
        
    # sample global variables
    true_global_latent_variables = [lv.sample().eval() for lv in global_latent_variables]
    # sample M local variables M times
    true_local_latent_variables = ed.copy(z,{local_parent[0]:local_parent[0].eval()}).sample((M,)).eval()

    X_sample = np.zeros((M,M,D))
    latent_variables = dict(zip(global_latent_variables, true_global_latent_variables))
    for i in range(2):
        latent_variables.update({z:true_local_latent_variables[i]})
        X_sample[i] = ed.copy(x,latent_variables).eval() # this works only in the first iteration
        #tf.reshape(,(-1,K))
        plt.figure()
        plt.scatter(*X_sample[i].T,color=true_local_latent_variables[i])
        plt.axis('equal');

The output is two scatter plots, one per loop iteration (images not reproduced here).

It seems that ed.copy uses the z values I put in the dict only in the first iteration. In the second iteration it ignores them and samples its own z, while still using the other variables I pass in.

  1. Does it make sense to build something like this?
  2. Is there a better way to do it?
  3. How do I get ed.copy to use my saved z’s in every iteration?

#2

Found the solution here: https://github.com/blei-lab/edward/issues/427 and here: Basics of Graphs / Flow Control
Calling .value() on every variable and fetching them all in a single sess.run gives one sample from the joint, so the saved ancestor values are the ones their children were conditioned on:

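import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Dirichlet, Multinomial

M = 10000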
K = 3
D = 2

mean_precision_shape,mean_precision_rate,obs_precision_shape,obs_precision_rate = 4.,200.,6.,10.

sess = tf.Session()

# p model
alpha = 1
pi = Dirichlet(np.atleast_1d(alpha*np.ones(K)).astype(np.float32))
z = Multinomial(total_count=1.,probs=tf.reshape(tf.tile(pi,[M]),[M,K]))
sigma2_mu_k = ed.models.InverseGamma([[mean_precision_shape]],[[mean_precision_rate]])
sigma2_mu_d = ed.models.InverseGamma(mean_precision_shape*tf.ones([D]),mean_precision_rate*tf.ones([D]))
sigma2_mu = tf.tile(sigma2_mu_k, [K,1])*sigma2_mu_d
mu = ed.models.MultivariateNormalDiag(tf.zeros([K,D]), tf.sqrt(sigma2_mu))
sigma2_obs_n = ed.models.InverseGamma([[obs_precision_shape]],[[obs_precision_rate]])
sigma2_obs_d = ed.models.InverseGamma(obs_precision_shape*tf.ones([D]),obs_precision_rate*tf.ones([D]))
sigma2_obs = tf.tile(sigma2_obs_n, [M,1])*sigma2_obs_d
x = ed.models.MultivariateNormalDiag(tf.matmul(z, mu), tf.sqrt(sigma2_obs))

latent_variables = x.get_ancestors()
model = [x, *latent_variables]
model_sample = dict(zip(model,sess.run([v.value() for v in model])))

The saved joint sample can then be divided into mini-batches afterwards.
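For the mini-batch step, here is a rough sketch of how the split might look. It uses plain NumPy rather than Edward, and it assumes the joint sample has been stored as a dict of arrays keyed by name (the snippet above keys it by the random variable objects instead). The helper name `split_into_minibatches`, the string keys, and the stand-in mixture with unit observation noise are all made up for illustration. Local variables are detected by a leading dimension equal to the number of data points, mirroring the `lv.shape[0] == M` check from the first post.

```python
import numpy as np


def split_into_minibatches(joint_sample, num_points, batch_size):
    """Yield mini-batch dicts from one saved joint sample.

    Arrays whose leading dimension equals num_points are treated as local
    (per-data-point) variables and sliced. Everything else is treated as
    global and passed through unchanged.
    """
    for start in range(0, num_points, batch_size):
        stop = min(start + batch_size, num_points)
        batch = {}
        for name, value in joint_sample.items():
            value = np.asarray(value)
            is_local = value.ndim > 0 and value.shape[0] == num_points
            batch[name] = value[start:stop] if is_local else value
        yield batch


# Stand-in joint sample with the same shapes as in the thread (M reduced,
# unit observation noise assumed instead of the inverse-gamma variances).
M, K, D = 10, 3, 2
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(K))                 # mixing proportions, shape (K,)
z = rng.multinomial(1, pi, size=M).astype(np.float32)  # one-hot, shape (M, K)
mu = rng.normal(size=(K, D))                   # cluster means, shape (K, D)
x = z @ mu + rng.normal(size=(M, D))           # observations, shape (M, D)

joint_sample = {"pi": pi, "z": z, "mu": mu, "x": x}
batches = list(split_into_minibatches(joint_sample, num_points=M, batch_size=4))
```

Each batch keeps the same global values (pi, mu) and a matching slice of z and x, so the saved z rows stay aligned with the x rows they generated. The shape-based check would misfire if a global variable happened to have leading dimension M (for example when K equals M), so an explicit list of local variable names may be safer in practice.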