How to handle missing values in Gaussian Matrix Factorization


Hi Dustin,

I’m trying to implement Gaussian Matrix Factorization (a.k.a. Probabilistic Matrix Factorization, PMF) [1] in Edward.
My question is how to treat missing values as actually missing.

PMF uses an indicator function in Eq. (1) of [1] to mark whether each rating y_ij is observed. My current implementation (see the code below) instead treats missing values as zeros rather than as missing, so the predicted ratings are heavily biased towards zero.
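
For reference, here is the likelihood from Eq. (1) of [1], rewritten in this post's notation. The mapping is an assumption on my part: y_ij stands for the paper's rating, u_i and v_j are the i-th and j-th rows of U and V, and I_ij is 1 if y_ij is observed and 0 otherwise.

```latex
p(Y \mid U, V, \sigma^2)
  = \prod_{i=1}^{M} \prod_{j=1}^{N}
    \left[ \mathcal{N}\!\left( y_{ij} \mid u_i^\top v_j,\ \sigma^2 \right) \right]^{I_{ij}},
\qquad
\log p(Y \mid U, V, \sigma^2)
  = \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij}\,
    \log \mathcal{N}\!\left( y_{ij} \mid u_i^\top v_j,\ \sigma^2 \right)
```

A missing entry has I_ij = 0, so it contributes a factor of 1 to the likelihood (a term of 0 to the log-likelihood) and does not pull the prediction towards any value.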

Is there a good practice for this?
Or do I need to implement a PMF-specific variational method?


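import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal

N = 10
M = 10
K = 5  # latent dimension

# Ratings matrix; cast to float32 to match the dtype of Y.
y_train = np.random.randint(low=0, high=5, size=(M, N)).astype(np.float32)

# Model: Gaussian priors on the factors, Gaussian likelihood on the ratings.
U = Normal(mu=tf.zeros([M, K]), sigma=tf.ones([M, K]))
V = Normal(mu=tf.zeros([N, K]), sigma=tf.ones([N, K]))
Y = Normal(mu=tf.matmul(U, V, transpose_b=True), sigma=tf.ones([M, N]))

# Variational approximations; applying softplus outside the Variable
# keeps the scale positive throughout optimization.
qU = Normal(
    mu=tf.Variable(tf.random_uniform([M, K])),
    sigma=tf.nn.softplus(tf.Variable(tf.random_uniform([M, K]))))

qV = Normal(
    mu=tf.Variable(tf.random_uniform([N, K])),
    sigma=tf.nn.softplus(tf.Variable(tf.random_uniform([N, K]))))

inference = ed.KLqp({U: qU, V: qV}, data={Y: y_train})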

[1] Salakhutdinov, R., & Mnih, A. (2007). Probabilistic Matrix Factorization. In Proc. of NIPS.


In the literature, this is known as implicit feedback. From my (limited) understanding, there are a few ways to handle the zeros:

  1. Treat the zeros as part of the data, as you mention. For Gaussian MF, you then have to downweight the zeros during inference, e.g. via large penalizations. Poisson MF handles this naturally by defining a sparse generative process.
  2. Treat the zeros as missing values (latent variables) and marginalize them out. This can be tough in most cases. To do this in Edward, include a tf.placeholder for the indicators. Here’s an example.
I = tf.placeholder(tf.int32)
mu = tf.matmul(U, V, transpose_b=True)
sigma = tf.ones([M, N])
Y_obs = Normal(mu=tf.gather(mu, I), sigma=tf.gather(sigma, I))
Y_mis = Normal(mu=tf.gather(mu, 1 - I), sigma=tf.gather(sigma, 1 - I))

qY_mis = Normal(

inference = ed.KLqp({U: qU, V: qV, Y_mis: qY_mis}, data={Y_obs: y_train, I: I_train})

Your mileage will vary depending on how you structure the indicators and nonzero values in the matrix.
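
To make the role of the indicators concrete outside Edward, here is a minimal NumPy sketch in which only observed entries contribute to the Gaussian likelihood. It is not Edward code and it does not do variational inference: it swaps in plain gradient ascent on the log posterior (MAP estimation) to keep the example short. The names `masked_gaussian_loglik` and `fit_pmf_map`, the step size, and the number of steps are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_gaussian_loglik(y, mask, U, V, sigma=1.0):
    """Sum of log N(y_ij | u_i . v_j, sigma^2) over observed entries only.

    mask is 1.0 where y_ij is observed and 0.0 where it is missing.
    """
    mean = U @ V.T
    ll = -0.5 * np.log(2.0 * np.pi * sigma ** 2) - 0.5 * ((y - mean) / sigma) ** 2
    return float(np.sum(mask * ll))


def fit_pmf_map(y, mask, K=5, sigma=1.0, prior_sigma=1.0,
                lr=0.01, n_steps=500, seed=0):
    """MAP estimate of U, V by gradient ascent (illustrative settings)."""
    rng = np.random.default_rng(seed)
    M, N = y.shape
    U = 0.1 * rng.standard_normal((M, K))
    V = 0.1 * rng.standard_normal((N, K))
    for _ in range(n_steps):
        # Residuals are zeroed at missing entries, so those entries
        # have no influence on the gradients.
        resid = mask * (y - U @ V.T)
        grad_U = resid @ V / sigma ** 2 - U / prior_sigma ** 2
        grad_V = resid.T @ U / sigma ** 2 - V / prior_sigma ** 2
        U = U + lr * grad_U
        V = V + lr * grad_V
    return U, V


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.integers(1, 6, size=(10, 10)).astype(float)   # ratings in 1..5
    mask = (rng.random((10, 10)) < 0.7).astype(float)     # ~70% observed
    U, V = fit_pmf_map(y * mask, mask)
    print("masked log-likelihood:", masked_gaussian_loglik(y, mask, U, V))
```

Because the mask multiplies both the log-likelihood terms and the residuals, whatever value is stored at a missing position (zero or otherwise) never affects the fit, which is the behavior the indicator function provides in PMF.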


Thanks for your reply!
Both answers are helpful to me :)