Variational Inference with Composition of Variables

Hi everyone!
I’m trying to estimate the demand rate of a product subject to inventory stock-outs. I started by simulating the data and then tried to estimate the latent parameter lambda using variational inference. The estimated posterior q_lambda, however, doesn’t seem to capture the true lambda value of 1.5.

I’ve gotten variational inference to work for the case where inventory is always available (i.e. simply estimating the lambda of Poisson-distributed data), but I’m wondering why it breaks down when using a composition of variables.

The code I used is shown below. Thanks!

import tensorflow as tf
from edward.models import Normal, Poisson, Uniform
import edward as ed
import numpy as np
import matplotlib.pyplot as plt

# DATA SIMULATION
# I_train simulates inventory availability (1 if product is available)
I_train = np.array([np.random.choice((1,0), p=(.8,.2)) for i in range(3000)]).reshape(-1,1)
# T is the number of time intervals
T = I_train.shape[0]
# S_train simulates number of purchases of a product per time interval 
S_train = np.multiply(np.random.poisson(lam=1.5, size=(T,1)),I_train)

#MODEL
# this is the latent variable we wish to estimate
var_lambda = Uniform(tf.zeros([1, 1]), 10.0 * tf.ones([1, 1]))

# observed inventory indicator (1 if in stock)
I = tf.placeholder(tf.float32, [T, 1])
# observed sales: Poisson counts zeroed out when the product is out of stock
S = tf.multiply(Poisson(tf.tile(tf.transpose(var_lambda), [T, 1])), I)

#INFERENCE
# using variational inference

# variational approximation: a Normal with trainable location and scale
q_lambda_loc = tf.Variable(tf.random_normal([1, 1]))
q_lambda_scale = tf.nn.softplus(tf.Variable(tf.random_normal([1, 1])))
q_lambda = Normal(loc=q_lambda_loc, scale=q_lambda_scale)

inference = ed.KLqp({var_lambda: q_lambda}, data={I: I_train, S: S_train})
inference.run(n_samples=1, n_iter=5000)

print(q_lambda.mean().eval())

x_range = tf.range(-10, 30, .1)
sess = ed.get_session()
plt.plot(*sess.run([x_range, tf.transpose(var_lambda.prob(x_range))]), color="green")
plt.plot(*sess.run([x_range, tf.transpose(q_lambda.prob(x_range))]), color="blue")
plt.axvline(x = 1.5, color = "red")
plt.show()

A Normal variational approximation can’t be fit against a Uniform prior on [0, 10]; the prior and the approximation have to share the same support. (Note also that Edward’s automated transformations don’t work for parameter-defined supports like a Uniform’s.)

Maybe try a non-negative continuous prior and variational approximation, like a log-normal. See examples/deep_exponential_family.py for an example.
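
For example, a log-normal prior and approximation could be a drop-in replacement for var_lambda and q_lambda above. This is a sketch, not tested; it assumes a TF 1.x setup where Edward wraps tf.contrib.distributions, so TransformedDistribution and the Exp bijector are available:

from edward.models import Normal, TransformedDistribution

# lambda ~ LogNormal(0, 1): exp of a standard Normal, so support is (0, inf)
var_lambda = TransformedDistribution(
    distribution=Normal(loc=tf.zeros([1, 1]), scale=tf.ones([1, 1])),
    bijector=tf.contrib.distributions.bijectors.Exp())

# log-normal variational approximation with trainable parameters
q_lambda = TransformedDistribution(
    distribution=Normal(
        loc=tf.Variable(tf.random_normal([1, 1])),
        scale=tf.nn.softplus(tf.Variable(tf.random_normal([1, 1])))),
    bijector=tf.contrib.distributions.bijectors.Exp())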

Thank you @dustin for the suggestion. I’ll be trying it out.

I’ve also noticed that whether the inference works depends on how the inventory availability variable I (an indicator variable) is used. It works after making two changes:

  1. Instead of multiplying I with the Poisson random variable

    S = tf.multiply(Poisson(tf.tile(tf.transpose(var_lambda),[T,1])),I)

    it’s multiplied into the Poisson’s rate:

    S = Poisson(tf.multiply(tf.tile(tf.transpose(var_lambda),[T,1]),I))

  2. Since I is now part of the Poisson rate, and the rate can’t be zero, a further adjustment was needed. Instead of having 0’s and 1’s as the values of the inventory availability data I_train

    I_train = np.array([np.random.choice((1,0), p=(.8,.2)) for i in range(3000)]).reshape(-1,1)

    the zeros are replaced with a small epsilon value:

    epsilon = 1e-18
    I_train = np.array([np.random.choice((1,epsilon), p=(.8,.2)) for i in range(3000)]).reshape(-1,1)

I’m not sure why variational inference works with this setup, but it does. For reference, the changes are collected together in the sketch below.
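
Here are the changes in one place (a sketch; note that the simulated sales still need a hard 0/1 mask, since multiplying the Poisson draws by epsilon would otherwise produce non-integer observations):

epsilon = 1e-18
# inventory indicator with zeros replaced by epsilon so the rate stays positive
I_train = np.array([np.random.choice((1, epsilon), p=(.8, .2))
                    for i in range(3000)]).reshape(-1, 1)
T = I_train.shape[0]
# simulate sales with a hard 0/1 mask so the observed counts stay integers
S_train = np.multiply(np.random.poisson(lam=1.5, size=(T, 1)),
                      (I_train == 1).astype(int))

I = tf.placeholder(tf.float32, [T, 1])
# the inventory indicator now scales the rate rather than the sampled counts
S = Poisson(tf.multiply(tf.tile(tf.transpose(var_lambda), [T, 1]), I))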

The model you’re trying to fit looks like

y | z = 1 ~ Poisson(lambda)
y | z = 0 = 0

Written like this, you would need to implement S as a mixture of a Poisson and a PointMass.

If instead you write:

y ~ Poisson(lambda)
lambda | z = 1 ~ g
lambda | z = 0 = epsilon

for some prior g, then your implementation matches the model.

But the zero observations can’t tell you anything about lambda, and you’re assuming the latent indicator is observed.
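
A quick way to see this numerically (plain numpy, just for illustration): the Poisson pmf at k = 0 is exp(-rate), so with rate = epsilon * lambda the zero rows are explained almost perfectly no matter what lambda is.

import numpy as np

epsilon = 1e-18
# P(X = 0) = exp(-rate); with rate = epsilon * lambda this is ~1 for any
# plausible lambda, so these terms are essentially flat in lambda
print(np.exp(-epsilon * 1.5))  # ~1.0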

So it seems you should just model the non-zero part of the data and ignore the zeros.
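
Concretely, something like this (a sketch; var_lambda and q_lambda are assumed to be a non-negative prior and approximation like the ones discussed earlier in the thread):

import tensorflow as tf
import edward as ed
from edward.models import Poisson

# keep only the intervals where the product was actually in stock
S_obs = S_train[I_train == 1].reshape(-1, 1)
T_obs = S_obs.shape[0]

# plain Poisson likelihood over the in-stock intervals only
S = Poisson(tf.tile(tf.transpose(var_lambda), [T_obs, 1]))

inference = ed.KLqp({var_lambda: q_lambda}, data={S: S_obs})
inference.run(n_samples=5, n_iter=5000)  # more samples per step reduces gradient variance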

Thank you @aksarkar! I’m still just learning about Bayesian networks and this is a great help.

I have a follow-up question. When you mentioned I would need to implement S as a mixture of a Poisson and a PointMass, does that mean expressing S as a product of a Poisson and a PointMass (i.e., the line below)?

S = tf.multiply(Poisson(tf.tile(tf.transpose(var_lambda),[T,1])),PointMass(params = I))

When I tried variational inference with this, however, it didn’t work any better.

Did you suggest to just model the non-zero part of the data because implementing a mixture of Poisson and PointMass for inference isn’t feasible?

AFAIK you would have to implement a custom RandomVariable for a mixture of a PointMass and Poisson.

I would suggest modeling the non-zero part of the data because you don’t need to learn anything about the zeros in the data. You’ve observed the inventory, so you know whether 0 purchases are explained by the item not being in stock versus not being in demand.

If instead you wanted to learn about the inventory given the observed rate of sales, then this model isn’t powerful enough, because over a single time interval you can’t distinguish the two cases.