Variational EM for Independent Factor Analysis


#1

Hi all,
I am trying to implement variational inference for linear combinations of mixtures of Gaussians in the flavor of

Attias, H. “Independent Factor Analysis.” Neural Computation 11.4 (1999): 803-851.

In the following code I am trying to implement a variational EM algorithm with an E step on the categorical latent variables and an M step on the remaining parameters (linear mixing matrix coefficients, mixture parameters, additive noise parameters) using deterministic posteriors.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import edward as ed
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import six
import tensorflow as tf


from edward.models import Categorical, InverseGamma, Mixture, \
    MultivariateNormalDiag, Normal
from edward.models import PointMass
from edward.models import Dirichlet
from tensorflow.contrib import slim

plt.style.use('ggplot')


def build_toy_dataset(N):
  pi = [[.4,.6],[0.4, 0.3,.3]]
  mus = [[1., -1.], [-1., 1.,0.]]
  stds = [[1., 1.], [1., 1.,2.]]
  stds = [[x*.01 for x in stds[k]] for k in range(2)]
  M= np.array([[1,2],[2, 1]])
  x = np.zeros((N, 2), dtype=np.float32)
  for n in range(N):
      for kcomp in range(2):
          k = np.argmax(np.random.multinomial(1, pi[kcomp]))
          x[n, kcomp] = np.random.normal(mus[kcomp][k], stds[kcomp][k])
  y= np.matmul(x,M.transpose())+ .5*np.random.randn(N,2)
  return x, y


N = 500  # number of data points
D = 2  # dimensionality of data
numMod = [2,3]

# DATA
x_train, y_train = build_toy_dataset(N)

plt.hist(x_train[:,1])
plt.show()
plt.hist(x_train[:,0])
plt.show()

# Build model
mu = [(Normal(mu=tf.zeros([numMod[kcomp], 1]), sigma=tf.ones([numMod[kcomp], 1])))
    for kcomp in range(D)]
sigma = [(InverseGamma(alpha=tf.ones([numMod[kcomp], 1]), beta=tf.ones([numMod[kcomp], 1])))
    for kcomp in range(D)]
betaPar = [(Dirichlet(alpha=tf.ones([1,numMod[kcomp]])))
    for kcomp in range(D)]
catVar = [Categorical(p=tf.ones([N, 1])*betaPar[kcomp])
    for kcomp in range(D)]
x = [Normal(mu=tf.gather(mu[kcomp],catVar[kcomp]),
                           sigma=tf.gather(sigma[kcomp],catVar[kcomp]))
            for kcomp in range(D)]
catx = tf.concat(x,1)
alpha = Normal(mu=tf.zeros([1]),sigma=tf.ones([1]))
beta = Normal(mu=tf.zeros([1]),sigma=tf.ones([1]))    
M = tf.stack((tf.concat((tf.ones([1]), beta),axis=0), tf.concat((alpha,tf.ones([1])),axis=0)),axis=1)
signoise = InverseGamma(alpha=tf.ones([1,2]), beta=tf.ones([1,2]))
y = Normal(mu=tf.matmul(catx,M,transpose_b=True),sigma=tf.ones([N,1])*signoise)

# Define posterior approximations
qz0 = Categorical(logits=tf.Variable(tf.zeros([N, 1])))
qz1 = Categorical(logits=tf.Variable(tf.zeros([N, 1])))
qbeta1 = PointMass(params=tf.nn.softplus(tf.Variable(tf.ones([1,numMod[1]]))))
qbeta0 = PointMass(params=tf.nn.softplus(tf.Variable(tf.ones([1,numMod[0]]))))
qalpha = PointMass(params=tf.Variable(.5*tf.ones([1])))
qbeta = PointMass(params=tf.Variable(.5*tf.ones([1])))
qsignoise = PointMass(params=tf.nn.softplus(tf.Variable(tf.ones([1, 2]))))
qmu = [PointMass(params=tf.Variable(tf.zeros([numMod[kcomp], 1])))
    for kcomp in range(D)]
qsigma = [PointMass(params=tf.nn.softplus(tf.Variable(tf.ones([numMod[kcomp], 1]))))
    for kcomp in range(D)]

# Define Inference
inference_e = ed.KLqp({catVar[0]: qz0, catVar[1]: qz1}, data={y: y_train, betaPar[1]: qbeta1, betaPar[0]: qbeta0, alpha: qalpha, beta: qbeta, signoise: qsignoise, mu[0]: qmu[0], sigma[0]: qsigma[0], mu[1]: qmu[1], sigma[1]: qsigma[1]})
inference_m = ed.MAP({ betaPar[1]: qbeta1, betaPar[0]: qbeta0, alpha: qalpha, beta: qbeta, signoise: qsignoise, mu[0]: qmu[0], sigma[0]: qsigma[0], mu[1]: qmu[1], sigma[1]: qsigma[1]}, data={y: y_train, catVar[0]: qz0, catVar[1]: qz1})


init = tf.global_variables_initializer()
inference_e.initialize()
inference_m.initialize()

sess = ed.get_session()

init.run()
for _ in range(10):
    inference_e.update()
    inference_m.update()

The inferences initialize without error, but then the both inference update return an error message I have difficulties to interpret (I don’t know which variable in my code is involved in the first place).

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value Variable_187
	 [[Node: Variable_187/read = Identity[T=DT_INT32, _class=["loc:@Variable_187"], _device="/job:localhost/replica:0/task:0/cpu:0"](Variable_187)]]

Any indication how to interpret or solve this would be much appreciated.

Thanks,
Michel


#2

In your code, you wrote

init = tf.global_variables_initializer()
inference_e.initialize()
inference_m.initialize()
...
init.run()

The first line defines an op to initialize all TensorFlow variables in the graph from that point in time. However, inference_e.initialize() and initialize_m.initialize() also internally define TensorFlow variables. This means you should call tf.global_variables_initializer() after inference initializations. More generally, it should be called after building all computation in TensorFlow:

inference_e.initialize()
inference_m.initialize()
...
init = tf.global_variables_initializer()
init.run()

Two more tips:

  1. You should vectorize your model rather than carry around large lists. This won’t make much a difference if D is small but it will if you aim to scale it up.
  2. Inference over discrete variables such as Categorical can be difficult, especially using generic algorithms such as ed.KLqp. Your mileage may vary depending on good the E-step is. Due to the stochasticity of the E-step, you can consider running inference_e.update() many times before calling inference_m.update().

To properly do Attias (1999) I think you would have to code up the coordinate-wise updates as in the paper.


#4

Many thanks for the quick and very useful feedback Dustin! The inferences now update without throwing any error. I will try to vectorize everything.

Regarding your comment

To properly do Attias (1999) I think you would have to code up the coordinate-wise updates as in the paper.

Yes, for now I do not want to replicate the exact same procedure as in Attias (1999), just try a few generic inference methods on the same model and see how they perform. I keep posted if it gets interesting.


#5

Hi Dustin,
Here is an updated version of the code. In order to do variational inference in the flavor of Attias (1999), I approximated the posterior using Mixture of Gaussians distributions with parameters that are dependent on the observed data. Inference runs without error, whether it performs well is still to be evaluated…

I have one small question: you advised to vectorize the code as much as possible. I did for the Mixture parameters, but I am not sure I can do more than that. Also, in order to be able to define the inference irrespective of the dimension of the model, I would like to replace the series of terms “x[0]: qx[0]…” by “catx: qcatx”, but the inference does not accept it (because it is a Tensor). Do I have to define myself a new distribution, that would be the concatenation of several mixtures?
Here is the code:

#!/usr/bin/env python
""" Independent Factor Analysis using Mixture of Gaussians.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import edward as ed
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import six
import tensorflow as tf


from edward.models import Categorical, InverseGamma, Mixture, \
    MultivariateNormalDiag, Normal
from edward.models import PointMass
from edward.models import Dirichlet
from tensorflow.contrib import slim

plt.style.use('ggplot')


def build_toy_dataset(N):
  pi = [[.4,.6],[0.4, 0.3,.3]]
  mus = [[1., -1.], [-1., 1.,0.]]
  stds = [[1., 1.], [1., 1.,2.]]
  stds = [[x*.01 for x in stds[k]] for k in range(2)]
  M= np.array([[1,2],[2, 1]])
  x = np.zeros((N, 2), dtype=np.float32)
  for n in range(N):
      for kcomp in range(2):
          k = np.argmax(np.random.multinomial(1, pi[kcomp]))
          x[n, kcomp] = np.random.normal(mus[kcomp][k], stds[kcomp][k])
  y= np.matmul(x,M.transpose())+ .5*np.random.randn(N,2)
  return x, y


N = 2000  # number of data points
D = 2  # dimensionality of data
numMod = [2,3]

# DATA
x_train, y_train = build_toy_dataset(N)

plt.hist(x_train[:,1])
plt.show()
plt.hist(x_train[:,0])
plt.show()


# Define start index of each mixture for the vectorized parameters (keep integer type)
npNumMod=np.array(numMod)
startIdx = npNumMod.cumsum()
startIdx = np.concatenate((np.zeros([1],dtype=np.int32),startIdx[:-1]))

# Build model based on vectorized parameters    
muCat = tf.Variable(tf.random_normal([npNumMod.sum(),1]))
sigmaCat = tf.Variable(tf.nn.softplus(tf.random_normal([npNumMod.sum(),1])))

# Define categorical latent variables
betaPar = [(tf.Variable(tf.zeros([1,numMod[kcomp]])))
    for kcomp in range(D)]
catVar = [Categorical(logits=tf.ones([N, 1])*betaPar[kcomp])
    for kcomp in range(D)]

# Define Mixture based on vecotrized parameters
x = [Normal(mu=tf.gather(muCat,startIdx[kcomp]+catVar[kcomp]),sigma=tf.gather(sigmaCat,startIdx[kcomp]+catVar[kcomp]))
            for kcomp in range(D)]
catx = tf.concat(x,1)
alpha = Normal(mu=tf.zeros([1]),sigma=tf.ones([1]))
beta = Normal(mu=tf.zeros([1]),sigma=tf.ones([1]))    
M = tf.stack((tf.concat((tf.ones([1]), beta),axis=0), tf.concat((alpha,tf.ones([1])),axis=0)),axis=1)
signoise = InverseGamma(alpha=tf.ones([1,2]), beta=.1*tf.ones([1,2]))
y = Normal(mu=tf.matmul(catx,M,transpose_b=True),sigma=tf.ones([N,1])*signoise)

# Define posterior approximations
qmuCat = PointMass(tf.random_normal([npNumMod.sum(),1]))
qsigmaCat = PointMass(tf.nn.softplus(tf.random_normal([npNumMod.sum(),1])))

# Define categorical latent variable posterior distributions
qbetaPar = [tf.Variable(tf.ones([N,numMod[kcomp]])) for kcomp in range(D)]
qcatVar = [Categorical(logits=qbetaPar[kcomp])
    for kcomp in range(D)]

# Define posterior Mixture based on vecotrized parameters
qx = [Normal(mu=tf.gather(qmuCat,startIdx[kcomp]+catVar[kcomp]),sigma=tf.gather(qsigmaCat,startIdx[kcomp]+catVar[kcomp]))
            for kcomp in range(D)]
qcatx = tf.concat(qx,1)
# Define point mass distribution for MAP parameter estimation
qalpha = PointMass(params=tf.Variable(.5*tf.ones([1])))
qbeta = PointMass(params=tf.Variable(.5*tf.ones([1])))
qsignoise = PointMass(params=tf.nn.softplus(tf.Variable(.1*tf.ones([1, 2]))))
    
# Define variational Inference
inference_e = ed.KLqp({x[0]: qx[0], x[1]: qx[1]}, data={y: y_train, alpha: qalpha, beta: qbeta, signoise: qsignoise})
inference_m = ed.MAP({  alpha: qalpha, beta: qbeta, signoise: qsignoise}, data={y: y_train, x[0]: qx[0], x[1]: qx[1]})

inference_e.initialize()
inference_m.initialize()

sess = ed.get_session()

init = tf.global_variables_initializer()
init.run()
for k in range(1000):
    for k in range(10):
        inference_e.update()
    inference_m.update()

Thanks,
Michel


#6

in order to be able to define the inference irrespective of the dimension of the model, I would like to replace the series of terms “x[0]: qx[0]…” by “catx: qcatx”, but the inference does not accept it (because it is a Tensor). Do I have to define myself a new distribution, that would be the concatenation of several mixtures?

Good question. To vectorize, you need to vectorize the parameters that go into the distribution. Concretely, you currently have

x = [Normal(mu=tf.gather(muCat,startIdx[kcomp]+catVar[kcomp]),sigma=tf.gather(sigmaCat,startIdx[kcomp]+catVar[kcomp]))
            for kcomp in range(D)]

The simplest vectorization would stack the individual parameters

x = Normal(
  mu=[tf.gather(muCat, startIdx[kcomp] + catVar[kcomp]) for kcomp in range(D)],
  sigma=[tf.gather(sigmaCat, startIdx[kcomp] + catVar[kcomp]) for kcomp in range(D)])

This has little computational benefit as there’s still a loop, but it demonstrates the idea. The next step would be to vectorize it further with something like

x = Normal(
  mu=tf.gather(muCat, tf.gather(startIdx, tf.range(D)) + tf.gather(catVar, tf.range(D))),
  sigma=tf.gather(sigmaCat, tf.gather(startIdx, tf.range(D) + tf.gather(catVar, tf.range(D))))

Also, I would double check the model specification; is it correct? For a mixture model, you would parameterize your random variable by indexing according to the mixture assignments. I’m not familiar with IFA to know if this is the correct generative process unfortunately.


#7

Many thanks Dustin, I’ll try that.
May I ask you a clarification:

For a mixture model, you would parametrize your random variable by indexing according to the mixture assignments.

I think this is what I tried to do bellow (catVar is supposed to be the mixture assignment, startIdx is an offset due to vectorization).

catVar = [Categorical(logits=tf.ones([N, 1])*betaPar[kcomp])
    for kcomp in range(D)]
x = [Normal(mu=tf.gather(muCat,startIdx[kcomp]+catVar[kcomp]),sigma=tf.gather(sigmaCat,startIdx[kcomp]+catVar[kcomp]))
            for kcomp in range(D)]

Or does your comment refer to the variational approximation of the posterior?

Thanks,
Michel


#8

Ah, got it. That makes sense.


#9

Thanks Dustin. Not sure it is the more compact/efficient way, but for the sake of completeness I just paste here the vectorized expression that runs without errors (fixing dimension issues etc…).

catx = Normal(
  mu=tf.reshape(tf.gather(muCat, tf.ones([N,1],dtype=np.int32)*tf.expand_dims( tf.gather(startIdx, tf.range(D)),0) + tf.stack(catVar,axis=1)),[N,D]),
  sigma=tf.reshape(tf.gather(sigmaCat, tf.ones([N,1],dtype=np.int32)*tf.expand_dims( tf.gather(startIdx, tf.range(D)),0) + tf.stack(catVar,axis=1)),[N,D]))