# Inf and nan loss in Stochastic Volatility Models

Hi,

Sorry for posting in the wrong place earlier; I didn't see this forum.
I’m working on a Stochastic Volatility model, with code

```python
import tensorflow as tf
import edward as ed
import numpy as np
from edward.models import Normal, InverseGamma, Uniform

# Model parameters
T = 50
sigmaTrue = 0.5
gammaTrue = 0.2
phiTrue = 0.8

# Simulate data
hTrue = [0.0] * T
yTrue = [0.0] * T
for t in range(T):
    if t == 0:
        # Initialize from the stationary distribution of the AR(1) process
        hTrue[t] = np.random.normal(loc=gammaTrue / (1 - phiTrue),
                                    scale=sigmaTrue / np.sqrt(1 - phiTrue ** 2))
    else:
        hTrue[t] = np.random.normal(loc=gammaTrue + phiTrue * hTrue[t - 1],
                                    scale=sigmaTrue)
    yTrue[t] = np.random.normal(loc=0, scale=np.exp(hTrue[t] / 2))

# Prior distributions
gamma = Normal(mu=0.0, sigma=5.0)
phi = Uniform(-1.0, 1.0)
sigma = InverseGamma(alpha=1.0, beta=1.0)

# Create h and y as lists of random variables
h = [None] * T
y = [None] * T
for t in range(T):
    if t == 0:
        h[t] = Normal(mu=0.0, sigma=2.0)
    else:
        h[t] = Normal(mu=gamma + phi * h[t - 1], sigma=sigma)
    y[t] = Normal(mu=0.0, sigma=tf.exp(h[t] / 2))

# Create approximating distributions
qGamma = Normal(mu=tf.Variable(tf.random_normal([])),
                sigma=tf.nn.softplus(tf.Variable(tf.random_normal([]))))
qPhi = Normal(mu=tf.Variable(tf.random_normal([])),
              sigma=tf.nn.softplus(tf.Variable(tf.random_normal([]))))
qSigma = InverseGamma(
    alpha=tf.nn.softplus(tf.Variable(tf.random_normal([]))),
    beta=tf.nn.softplus(tf.Variable(tf.random_normal([]))))
qH = [None] * T
for t in range(T):
    qH[t] = Normal(mu=tf.Variable(tf.random_normal([])),
                   sigma=tf.nn.softplus(tf.Variable(tf.random_normal([]))))

# Inference
parameters = {
    gamma: qGamma,
    phi: qPhi,
    sigma: qSigma,
}
parameters.update({ht: qht for ht, qht in zip(h, qH)})
data = {yt: yt_obs for yt, yt_obs in zip(y, yTrue)}
inference = ed.KLqp(parameters, data=data)
inference.run()
```

With the result

```
Iteration    1 [  0%]: Loss = inf
Iteration  100 [ 10%]: Loss = nan
Iteration  200 [ 20%]: Loss = nan
Iteration  300 [ 30%]: Loss = nan
Iteration  400 [ 40%]: Loss = nan
Iteration  500 [ 50%]: Loss = nan
Iteration  600 [ 60%]: Loss = nan
Iteration  700 [ 70%]: Loss = nan
Iteration  800 [ 80%]: Loss = nan
Iteration  900 [ 90%]: Loss = nan
Iteration 1000 [100%]: Loss = nan
```

I assume there's some issue with the setup; the lists for `y` and `h` also feel very awkward. I was trying to follow the approach in https://github.com/blei-lab/edward/pull/480.
Thanks for any help.

Two tips:

1. Gammas and inverse Gammas are difficult to work with under the generic `ed.KLqp` algorithm, so in general I recommend avoiding them unless you use another inference method. If you just want a distribution on the positive reals, look into LogNormal distributions (see the sketch after this list). Removing the Gammas makes the algorithm significantly faster and more stable.
2. `phi` and `qPhi` have different supports. This is why the algorithm returns `Inf` and then `NaN`: `ed.KLqp` evaluates `phi`'s density at samples from `qPhi`, and since `qPhi` is a Normal it can place samples outside [-1, 1], where the Uniform prior's density is zero and its log-density is undefined.
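
For the first tip, one way to get a positive scale without an InverseGamma is to put a Normal prior on the log of the scale, so the scale itself is LogNormal-distributed. A minimal sketch, reusing the `mu`/`sigma` keyword style from the code above (the name `logSigma` is my own):

```python
import tensorflow as tf
from edward.models import Normal

# Normal prior on log(sigma); exp(logSigma) is then LogNormal-distributed,
# i.e. a distribution on the positive reals.
logSigma = Normal(mu=0.0, sigma=1.0)
sigma = tf.exp(logSigma)  # use this wherever a positive scale is needed

# The variational factor is an unconstrained Normal, so ed.KLqp never
# evaluates the prior's density outside its support.
qLogSigma = Normal(mu=tf.Variable(tf.random_normal([])),
                   sigma=tf.nn.softplus(tf.Variable(tf.random_normal([]))))

# Then bind {logSigma: qLogSigma} in the latent-variable dict instead of
# {sigma: qSigma}.
```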

In your code I fixed `sigma` and `phi` to constants, then commented out `qSigma` and `qPhi` as well as the `sigma: qSigma` and `phi: qPhi` entries in the `parameters` dict. This ran successfully.
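
Roughly, the edit looks like this (just a sketch; I reused the true values from your simulation for the constants, but any sensible fixed values work):

```python
# Fix the problematic parameters to constants...
phi = 0.8     # was: Uniform(-1.0, 1.0)
sigma = 0.5   # was: InverseGamma(alpha=1.0, beta=1.0)

# ...and drop their variational factors, leaving only gamma and the states.
parameters = {
    gamma: qGamma,
    # phi: qPhi,      # support mismatch: qPhi can sample outside [-1, 1]
    # sigma: qSigma,  # InverseGamma is unstable under ed.KLqp
}
```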

Hope that helps.

I'm having a similar problem after replacing the Normal weights in a Bayesian NN example with Uniform ones. Any idea what could be causing it?

Here are the relevant weight and bias sections:

```python
weights = []
biases = []
bound = 1

# Uniform priors on all weights and biases
weights.append(Uniform(low=-bound * tf.ones([D, width]), high=bound * tf.ones([D, width])))
biases.append(Uniform(low=-bound * tf.ones(width), high=bound * tf.ones(width)))
for _ in xrange(depth):
    weights.append(Uniform(low=-bound * tf.ones([width, width]), high=bound * tf.ones([width, width])))
    biases.append(Uniform(low=-bound * tf.ones(width), high=bound * tf.ones(width)))
weights.append(Uniform(low=-bound * tf.ones([width, O]), high=bound * tf.ones([width, O])))
biases.append(Uniform(low=-bound * tf.ones(O), high=bound * tf.ones(O)))

x = x_train
y = Normal(loc=neural_network(x, weights, biases, dropout),
           scale=0.1 * tf.ones(N_tests * L_tests))

# Variational factors: Uniform with a trainable low and a high that is
# kept positive via softplus
qweights = []
qbiases = []

qweights.append(Uniform(low=tf.Variable(tf.random_uniform([D, width], minval=-bound, maxval=bound)),
                        high=tf.nn.softplus(tf.Variable(tf.random_uniform([D, width], minval=-bound, maxval=bound)))))
qbiases.append(Uniform(low=tf.Variable(tf.random_uniform([width], minval=-bound, maxval=bound)),
                       high=tf.nn.softplus(tf.Variable(tf.random_uniform([width], minval=-bound, maxval=bound)))))
for _ in xrange(depth):
    qweights.append(Uniform(low=tf.Variable(tf.random_uniform([width, width], minval=-bound, maxval=bound)),
                            high=tf.nn.softplus(tf.Variable(tf.random_uniform([width, width], minval=-bound, maxval=bound)))))
    qbiases.append(Uniform(low=tf.Variable(tf.random_uniform([width], minval=-bound, maxval=bound)),
                           high=tf.nn.softplus(tf.Variable(tf.random_uniform([width], minval=-bound, maxval=bound)))))
qweights.append(Uniform(low=tf.Variable(tf.random_uniform([width, O], minval=-bound, maxval=bound)),
                        high=tf.nn.softplus(tf.Variable(tf.random_uniform([width, O], minval=-bound, maxval=bound)))))
qbiases.append(Uniform(low=tf.Variable(tf.random_uniform([O], minval=-bound, maxval=bound)),
                       high=tf.nn.softplus(tf.Variable(tf.random_uniform([O], minval=-bound, maxval=bound)))))
```