Nonnegative Matrix Factorization

I am trying to use Edward for non-negative matrix factorization (NMF). I've written two toy examples: one without non-negativity constraints, which works, and the NMF version, which yields all NaN values for the decomposed matrices.

The first method is mostly borrowed from one of your examples on GitHub.

import edward as ed
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from edward.models import Normal, Exponential, Gamma
from numpy.random import normal, exponential

def build_toy_dataset(U, V, N, M, noise_std=0.1):
  R = np.dot(np.transpose(U), V) + normal(0, noise_std, size=(N, M))
  return R

N = 50  # number of users
M = 60  # number of movies
D = 3  # number of latent factors

# true latent factors
U_true = normal(size=(D, N))
V_true = normal(size=(D, M))

# DATA
R_true = build_toy_dataset(U_true, V_true, N, M)

# MODEL
U = Normal(loc=tf.zeros([D, N]), scale=tf.ones([D, N]))
V = Normal(loc=tf.zeros([D, M]), scale=tf.ones([D, M]))
R = Normal(loc=tf.matmul(tf.transpose(U), V), scale=tf.ones([N, M]))

# INFERENCE
qU = Normal(loc=tf.Variable(tf.random_normal([D, N])),
            scale=tf.nn.softplus(tf.Variable(tf.random_normal([D, N]))))
qV = Normal(loc=tf.Variable(tf.random_normal([D, M])),
            scale=tf.nn.softplus(tf.Variable(tf.random_normal([D, M]))))

inference = ed.KLqp({U: qU, V: qV}, data={R: R_true})
optimizer = tf.train.AdagradOptimizer(learning_rate=0.01)
inference.run(n_iter=1000, optimizer=optimizer)

The model fits the data with no problems. However, if I simulate U and V from exponential distributions and use exponential priors and exponential variational distributions, the decomposed matrices consist of NaNs.

def build_toy_dataset(U, V, N, M, noise_std=0.1):
  R = np.dot(np.transpose(U), V) + normal(0, noise_std, size=(N, M))
  return R

N = 50  # number of users
M = 60  # number of movies
D = 3  # number of latent factors

# true latent factors
U_true = exponential(size=(D, N))
V_true = exponential(size=(D, M))

# DATA
R_true = build_toy_dataset(U_true, V_true, N, M)

# MODEL
U = Exponential(rate=tf.ones([D, N]))
V = Exponential(rate=tf.ones([D, M]))
R = Normal(loc=tf.matmul(tf.transpose(U), V), scale=tf.ones([N, M]))

# INFERENCE
qU = Exponential(rate=tf.nn.softplus(tf.Variable(tf.random_normal([D, N]))))
qV = Exponential(rate=tf.nn.softplus(tf.Variable(tf.random_normal([D, M]))))

inference = ed.KLqp({U: qU, V: qV}, data={R: R_true})
optimizer = tf.train.AdagradOptimizer(learning_rate=0.01)
inference.run(n_iter=1000, optimizer=optimizer)

Is there something wrong with the inference method chosen by KLqp? Or is there a problem with using exponentials as variational densities? (Certainly something like a Gamma would be better, but I wanted to start simple.)

Thanks!
Jacob

Hey @jacobcvt12,

This post is relevant to you …

Maybe you could also try using lognormal variational distributions for U and V?
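
For example, here is a minimal sketch of how such a lognormal q could be constructed (assuming Edward 1.x on TensorFlow 1.x; the lognormal_q helper name is mine, not part of Edward):

import tensorflow as tf
from edward.models import Normal, TransformedDistribution

def lognormal_q(shape):
  # exp of a Normal random variable is lognormal, so transform a Normal
  # with learnable loc/scale through an Exp bijector
  loc = tf.Variable(tf.random_normal(shape))
  scale = tf.nn.softplus(tf.Variable(tf.random_normal(shape)))
  return TransformedDistribution(
      distribution=Normal(loc=loc, scale=scale),
      bijector=tf.contrib.distributions.bijectors.Exp())

qU = lognormal_q([D, N])
qV = lognormal_q([D, M])

These qU and qV would then replace the Exponential ones in the ed.KLqp call above.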

+1 to @jhmarcus’ answer. For lognormal variational approximations against Gamma priors, I also recommend looking at examples/deep_exponential_family.py.
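
To make that concrete, here is a rough sketch of how the Gamma-prior / lognormal-q pairing might look for this model (an assumption on my part, not the exact code from the referenced example; it reuses the lognormal_q helper sketched above):

# MODEL: Gamma priors keep U and V non-negative
U = Gamma(concentration=tf.ones([D, N]), rate=tf.ones([D, N]))
V = Gamma(concentration=tf.ones([D, M]), rate=tf.ones([D, M]))
R = Normal(loc=tf.matmul(tf.transpose(U), V), scale=tf.ones([N, M]))

# INFERENCE: lognormal variational approximations for the positive factors
qU = lognormal_q([D, N])
qV = lognormal_q([D, M])

inference = ed.KLqp({U: qU, V: qV}, data={R: R_true})
inference.run(n_iter=1000)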
