How to handle missing values in Gaussian Matrix Factorization

isohyt · April 22, 2017, 9:27am

Hi, dustin.

I’m trying to implement Gaussian Matrix Factorization (aka Probabilistic Matrix Factorization, PMF)[1] in Edward.
My question is how to treat missing values as actually missing.

PMF uses an indicator function in Eq. (1) of [1] to mark whether each rating y_ij is observed, but my current implementation (see the bottom of this post) treats missing values as zeros rather than as missing, so the predicted ratings are heavily biased towards zero.
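
For reference, the indicator-weighted likelihood of Eq. (1) in [1] can be written roughly as follows in this post's notation. This assumes u_i and v_j denote the i-th row of U and the j-th row of V, \sigma^2 is the observation variance, and I_ij is 1 when y_ij is observed and 0 otherwise (these symbol choices are adapted for this post, not copied from the paper):

```latex
p(Y \mid U, V, \sigma^2)
  = \prod_{i=1}^{M} \prod_{j=1}^{N}
    \left[ \mathcal{N}\!\left( y_{ij} \mid u_i^{\top} v_j,\ \sigma^2 \right) \right]^{I_{ij}}
```

Entries with I_ij = 0 contribute a factor of 1, so they drop out of the likelihood instead of being fit as zeros.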

Is there a good practice for this?
Or do I need to implement a PMF-specific variational method?

Thanks.

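import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal

N = 10  # number of columns of the rating matrix
M = 10  # number of rows of the rating matrix
K = 5  # latent dimension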

y_train = np.random.randint(low=0, high=5, size=(M, N))


U = Normal(mu=tf.zeros([M, K]), sigma=tf.ones([M, K]))
V = Normal(mu=tf.zeros([N, K]), sigma=tf.ones([N, K]))
Y = Normal(mu=tf.matmul(U, V, transpose_b=True), sigma=tf.ones([M, N]))

# Apply softplus outside tf.Variable so sigma stays positive during optimization.
qU = Normal(
    mu=tf.Variable(tf.random_uniform([M, K])),
    sigma=tf.nn.softplus(tf.Variable(tf.random_uniform([M, K])))
)

qV = Normal(
    mu=tf.Variable(tf.random_uniform([N, K])),
    sigma=tf.nn.softplus(tf.Variable(tf.random_uniform([N, K])))
)

inference = ed.KLqp({U: qU, V: qV}, data={Y: y_train})
inference.run(n_iter=1000)

[1] Salakhutdinov, R., & Mnih, A. (2007). Probabilistic Matrix Factorization. In Proc. of NIPS.

dustin · April 22, 2017, 4:25pm

In the literature, this is known as implicit feedback. From my (limited) understanding, there are a few ways to handle the zeros:

1. Treat the zeros as part of the data, as you mention. For Gaussian MF, you then have to downweight the zeros somehow during inference via large penalizations. Poisson MF naturally solves this by defining a sparse generative process.
2. Treat the zeros as missing values (latent variables), and marginalize them out. This can be tough in most cases. To do this in Edward, include a tf.placeholder for the indicators. Here’s an example.

I = tf.placeholder(tf.int32)
mu = tf.matmul(U, V, transpose_b=True)
sigma = tf.ones([M, N])
Y_obs = Normal(mu=tf.gather(mu, I), sigma=tf.gather(sigma, I))
Y_mis = Normal(mu=tf.gather(mu, 1 - I), sigma=tf.gather(sigma, 1 - I))

# Apply softplus outside tf.Variable so sigma stays positive during optimization.
qY_mis = Normal(
    mu=tf.Variable(tf.random_uniform(Y_mis.shape)),
    sigma=tf.nn.softplus(tf.Variable(tf.random_uniform(Y_mis.shape)))
)

inference = ed.KLqp({U: qU, V: qV, Y_mis: qY_mis}, data={Y_obs: y_train, I: I_train})

Your mileage will vary depending on how you structure the indicators and nonzero values in the matrix.
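
To make the effect of the indicators concrete, here is a minimal pure-Python sketch, independent of Edward, of a Gaussian log-likelihood that sums only over observed entries. The function names, the use of None to mark a missing rating, and the toy factor values are all assumptions for illustration. It is not the Edward model above, just the masking idea in isolation.

```python
import math


def predict(U, V):
    """Return the mean matrix U V^T for factors given as lists of rows."""
    return [[sum(u_k * v_k for u_k, v_k in zip(u, v)) for v in V] for u in U]


def masked_gaussian_log_lik(y, mean, observed, sigma=1.0):
    """Sum Gaussian log-densities over entries whose indicator is true."""
    log_norm = -math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for y_row, m_row, o_row in zip(y, mean, observed):
        for y_ij, m_ij, o_ij in zip(y_row, m_row, o_row):
            if o_ij:  # skip entries with indicator 0
                z = (y_ij - m_ij) / sigma
                total += -0.5 * z * z + log_norm
    return total


# Toy 2 x 2 rating matrix; None marks a missing rating (hypothetical encoding).
ratings = [[4.0, None], [None, 2.0]]
observed = [[r is not None for r in row] for row in ratings]
# Fill missing entries with 0.0 only so the array is numeric; they are masked out.
y = [[0.0 if r is None else r for r in row] for row in ratings]

# Hypothetical rank-1 factors that reproduce the observed ratings exactly.
U = [[2.0], [1.0]]
V = [[2.0], [2.0]]
mean = predict(U, V)

masked = masked_gaussian_log_lik(y, mean, observed)
all_true = [[True, True], [True, True]]
unmasked = masked_gaussian_log_lik(y, mean, all_true)

print("masked log-likelihood:  ", masked)
print("unmasked log-likelihood:", unmasked)
```

With these factors the masked version is not penalized for the filled-in zeros, while the unmasked version is, which is the pull towards zero described in the question.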

isohyt · April 24, 2017, 3:00am

Thanks for your reply!
Both approaches are helpful to me.
