How to use posterior predictive check in Edward?

edisoncruise · June 29, 2017, 2:07pm

I want to evaluate the result of a linear regression with a posterior predictive check (PPC). Below is my code, based on the linear regression tutorial and Edward's Criticism API. My questions are in the comments.

from edward.models import Normal
from edward.criticisms import ppc_density_plot
import numpy as np
import edward as ed
import tensorflow as tf
import matplotlib.pyplot as plt

def build_toy_dataset(N, w, noise_std=0.1):
    D = len(w)
    x = np.random.randn(N, D)
    y = np.dot(x, w) + np.random.normal(0, noise_std, size=N)
    return x, y

with tf.device('/cpu:0'):
    N = 40  # number of data points
    D = 10  # number of features

    w_true = np.random.randn(D)
    X_train, y_train = build_toy_dataset(N, w_true)
    X_test, y_test = build_toy_dataset(N, w_true)

    # Model: Bayesian linear regression with standard normal priors
    X = tf.placeholder(tf.float32, [N, D])
    w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
    b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
    y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N))

    # Variational approximation: fully factorized normals
    qw = Normal(loc=tf.Variable(tf.random_normal([D])),
                scale=tf.nn.softplus(tf.Variable(tf.random_normal([D]))))
    qb = Normal(loc=tf.Variable(tf.random_normal([1])),
                scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))

    inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
    inference.run(n_samples=5, n_iter=250)

y_post = ed.copy(y, {w: qw, b: qb})

# Q1. What is the meaning of xs[]?
myfun = lambda xs, zs: tf.reduce_mean(xs[y_post])
y_rep = ed.ppc(myfun, data={X: X_train, y_post: y_train},
               latent_vars={w: qw, b: qb})
# Q2. Now AA is a 100x2 array. My understanding is that AA[:, 0] is the
# distribution of means of the replicated y. But what is AA[:, 1]? The values
# seem close to np.mean(y_train).
AA = np.array(y_rep).T
# Q3. What I expect is a histogram of the means of the replicated y with a line
# indicating np.mean(y_train), as shown in the attached figure.
ppc_density_plot(y_train, AA)
plt.show()

dustin · July 1, 2017, 11:20pm

Q1 and Q2 are answered on ed.ppc's API docs (http://edwardlib.org/api/ed/ppc). For Q3, see the ppc_density_plot API. A toy demo is available here.
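
To make Q1 a bit more concrete, here is a minimal sketch of the calling convention the ed.ppc docs describe: the discrepancy function receives two dictionaries, xs for data and zs for latent variables. In Edward the keys are the random variables themselves, which is why the code in the question indexes xs[y_post]. The string keys and the name mean_discrepancy below are stand-ins made up for illustration so the snippet runs without Edward or TensorFlow.

```python
import numpy as np

# Hypothetical stand-in for a discrepancy function T(xs, zs).
# In Edward the dictionary keys would be random variables such as y_post;
# plain strings are used here only so the sketch runs on its own.
def mean_discrepancy(xs, zs):
    """Return the mean of the values stored under the 'y_post' key."""
    return np.mean(xs["y_post"])

y_obs = np.array([1.0, 2.0, 3.0, 6.0])

xs = {"X": np.zeros((4, 2)), "y_post": y_obs}  # data dictionary
zs = {"w": np.zeros(2), "b": np.zeros(1)}      # latent-variable dictionary

# This discrepancy ignores zs entirely, so its value depends only on the data.
t_value = mean_discrepancy(xs, zs)
print(t_value)
```

So xs[y_post] just looks up the values bound to y_post: a simulated data set when the replicated statistic is computed, and the observed y_train when the realized statistic is computed.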

Given your inputs, the PPC simulates a data set from the model, takes its empirical mean, and repeats this over many simulations. The second element in the returned tuple is the empirical mean of the observed data; it is the same across all simulations because your discrepancy function does not depend on the latent variables.
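
The procedure can be sketched in plain NumPy. This is not Edward's implementation, only an illustration under stated assumptions: the fitted posterior is replaced by a Gaussian centred on a least-squares fit (post_scale is an arbitrary made-up value), the bias term is dropped for brevity, and the likelihood scale is fixed at 1 as in the model from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, n_samples = 40, 3, 100

# Toy data; w_true and the noise level are made up for this sketch.
w_true = rng.normal(size=D)
X = rng.normal(size=(N, D))
y_obs = X @ w_true + rng.normal(scale=0.1, size=N)

# Assumption: stand-in for the fitted posterior q(w), a Gaussian centred
# on the least-squares estimate with an arbitrary scale.
w_hat, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
post_scale = 0.05

def T(y):
    """Discrepancy: the empirical mean of a data set."""
    return np.mean(y)

t_rep = np.empty(n_samples)  # T on each replicated data set
t_obs = np.empty(n_samples)  # T on the observed data, once per draw
for s in range(n_samples):
    w_s = rng.normal(loc=w_hat, scale=post_scale)      # posterior draw
    y_rep = X @ w_s + rng.normal(scale=1.0, size=N)    # simulated data set
    t_rep[s] = T(y_rep)
    t_obs[s] = T(y_obs)  # T ignores the latent draw, so this never changes

# Same layout as AA in the question: column 0 is the reference distribution,
# column 1 is the realized statistic repeated n_samples times.
result = np.stack([t_rep, t_obs], axis=1)

# Fraction of replicated means at least as large as the observed mean.
p_value = np.mean(t_rep >= t_obs)
print(result.shape, p_value)
```

A histogram of result[:, 0] with a vertical line at result[0, 1] gives the kind of figure described in Q3.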

edisoncruise · July 3, 2017, 5:12am

Thank you Dustin. The demo really helps.
