Saving Model Parameters

Moved from Saving Model Parameters · Issue #535 · blei-lab/edward · GitHub.

@GhassanMakhoul

I am working on a variational inference problem and was wondering how I can snapshot the weights after training. I want to be able to restore and then query q, my approximating distribution, and draw samples from it. I am familiar with saving variables with tf.train.Saver. Is there a similarly easy way to save the parameters of my inference model?

@dustinvtran

Hi @GhassanMakhoul. tf.train.Saver applies to variational approximations too. In your code, you should be handling your own tf.Variables to parameterize your inference model. After (or during) inference, you can call, e.g.,

saver = tf.train.Saver()
sess = ed.get_session()
save_path = saver.save(sess, "/tmp/posterior.ckpt")
print("Inference model saved in file: %s" % save_path)

This is the [same way you would save model parameters in TensorFlow](https://www.tensorflow.org/versions/master/how_tos/variables/). (All extensions apply, such as saving only a subset of the inference model parameters.)
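For example, here is a minimal sketch of saving only a subset, assuming (hypothetically) that your variational parameters were created under a variable scope named "qmodel":

# Build a Saver restricted to the variables under the (hypothetical)
# "qmodel" scope, and save only those.
qvars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="qmodel")
saver = tf.train.Saver(var_list=qvars)
save_path = saver.save(ed.get_session(), "/tmp/posterior_subset.ckpt")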

Hi, I’m trying to add model saving to the example vae_convolutional.py, and I’m having some trouble. Here is my code, with my additions marked with "# ADDED CODE":

#!/usr/bin/env python
"""Convolutional variational auto-encoder for binarized MNIST.

The neural networks are written with TensorFlow Slim.

References
----------
http://edwardlib.org/tutorials/decoder
http://edwardlib.org/tutorials/inference-networks
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import edward as ed
import numpy as np
import os
import tensorflow as tf

from edward.models import Bernoulli, Normal
from edward.util import Progbar
from scipy.misc import imsave
from tensorflow.contrib import slim
from tensorflow.examples.tutorials.mnist import input_data


def generative_network(z):
  """Generative network to parameterize generative model. It takes
  latent variables as input and outputs the likelihood parameters.

  logits = neural_network(z)
  """
  with slim.arg_scope([slim.conv2d_transpose],
                      activation_fn=tf.nn.elu,
                      normalizer_fn=slim.batch_norm,
                      normalizer_params={'scale': True}):
    net = tf.reshape(z, [M, 1, 1, d])
    net = slim.conv2d_transpose(net, 128, 3, padding='VALID')
    net = slim.conv2d_transpose(net, 64, 5, padding='VALID')
    net = slim.conv2d_transpose(net, 32, 5, stride=2)
    net = slim.conv2d_transpose(net, 1, 5, stride=2, activation_fn=None)
    net = slim.flatten(net)
    return net


def inference_network(x):
  """Inference network to parameterize variational model. It takes
  data as input and outputs the variational parameters.

  loc, scale = neural_network(x)
  """
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.elu,
                      normalizer_fn=slim.batch_norm,
                      normalizer_params={'scale': True}):
    net = tf.reshape(x, [M, 28, 28, 1])
    net = slim.conv2d(net, 32, 5, stride=2)
    net = slim.conv2d(net, 64, 5, stride=2)
    net = slim.conv2d(net, 128, 5, padding='VALID')
    net = slim.dropout(net, 0.9)
    net = slim.flatten(net)
    params = slim.fully_connected(net, d * 2, activation_fn=None)

  loc = params[:, :d]
  scale = tf.nn.softplus(params[:, d:])
  return loc, scale


ed.set_seed(42)

M = 128  # batch size during training
d = 10  # latent dimension
DATA_DIR = "data/mnist"
IMG_DIR = "img"

if not os.path.exists(DATA_DIR):
  os.makedirs(DATA_DIR)
if not os.path.exists(IMG_DIR):
  os.makedirs(IMG_DIR)

# DATA. MNIST batches are fed at training time.
mnist = input_data.read_data_sets(DATA_DIR)

# MODEL
z = Normal(loc=tf.zeros([M, d]), scale=tf.ones([M, d]))
logits = generative_network(z)
x = Bernoulli(logits=logits)

# INFERENCE
x_ph = tf.placeholder(tf.int32, [M, 28 * 28])
loc, scale = inference_network(tf.cast(x_ph, tf.float32))
qz = Normal(loc=loc, scale=scale)

# Bind p(x, z) and q(z | x) to the same placeholder for x.
data = {x: x_ph}
inference = ed.KLqp({z: qz}, data)
optimizer = tf.train.AdamOptimizer(0.01, epsilon=1.0)
inference.initialize(optimizer=optimizer)

hidden_rep = tf.sigmoid(logits)

tf.global_variables_initializer().run()

# ADDED CODE
load_saved_model = True
model_path = r"/tmp/model_vae_edward.ckpt"
sess = ed.get_session()
saver = tf.train.Saver()
if load_saved_model:
  saver.restore(sess, model_path)
  print("Model restored.")
# END ADDED CODE

n_epoch = 1
n_iter_per_epoch = 10
for epoch in range(n_epoch):
  avg_loss = 0.0

  pbar = Progbar(n_iter_per_epoch)
  for t in range(1, n_iter_per_epoch + 1):
    pbar.update(t)
    x_train, _ = mnist.train.next_batch(M)
    x_train = np.random.binomial(1, x_train)
    info_dict = inference.update(feed_dict={x_ph: x_train})
    avg_loss += info_dict['loss']

  # Print a lower bound to the average marginal likelihood for an
  # image.
  avg_loss = avg_loss / n_iter_per_epoch
  avg_loss = avg_loss / M
  print("log p(x) >= {:0.3f}".format(avg_loss))

  # Visualize hidden representations.
  imgs = hidden_rep.eval()
  for m in range(M):
    imsave(os.path.join(IMG_DIR, '%d.png') % m, imgs[m].reshape(28, 28))

  # ADDED CODE
  save_path = saver.save(sess, model_path)
  print("Model saved in file: %s" % save_path)
  # END ADDED CODE

The model saves ok, but when I try to load the model (set load_saved_model = True), I get the following error:
NotFoundError (see above for traceback): Key optimizer_274937448/fully_connected/weights/Adam_1 not found in checkpoint

What am I doing wrong?

I am also having the same problem when trying to load a trained model using tf.train.import_meta_graph(). I get: KeyError: "The name 'Normal' refers to an Operation not in the graph." Are we meant to save only the tf.Variables and not the operations?

Here is what I’m doing:

    with tf.Session() as sess:
        # Restore variables from disk.
        saver = tf.train.import_meta_graph('./models/posterior.ckpt.meta')
        saver.restore(sess, tf.train.latest_checkpoint('./models/'))

The full error is:

Traceback (most recent call last):
  File "/home/rmason/Github/alpha-i/edward-mock-time-series-test/edward_feedforward_network.py", line 87, in <module>
    neural_net.run_testing(TRAINING_SERIES)
  File "/home/rmason/Github/alpha-i/edward-mock-time-series-test/edward_feedforward_network.py", line 71, in run_testing
    saver = tf.train.import_meta_graph('./models/posterior.ckpt.meta')
  File "/home/rmason/anaconda3/envs/time-series-env/lib/python3.4/site-packages/tensorflow/python/training/saver.py", line 1686, in import_meta_graph
    **kwargs)
  File "/home/rmason/anaconda3/envs/time-series-env/lib/python3.4/site-packages/tensorflow/python/framework/meta_graph.py", line 536, in import_scoped_meta_graph
    ops.prepend_name_scope(value, scope_to_prepend_to_names))
  File "/home/rmason/anaconda3/envs/time-series-env/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 2584, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/rmason/anaconda3/envs/time-series-env/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 2644, in _as_graph_element_locked
    "graph." % repr(name))
KeyError: "The name 'Normal' refers to an Operation not in the graph."

@paul That looks like a bug. It tries to restore the Adam optimizer parameters, which are stored under a unique inference name (optimizer_274937448). However, this unique name is different the next time the script is run. Issue raised at https://github.com/blei-lab/edward/issues/696.

@rpmason I personally don’t have much experience with saving/restoring the graph itself. This is certainly worth more investigation.

@paul @rpmason Both bugs (restoring optimizer tf.Variables and importing the metagraph) are fixed in https://github.com/blei-lab/edward/pull/697.

@dustin brilliant! Thanks a lot :).

Hi dustin, I have updated Edward to version 1.3.3, but I still hit this bug. Will a recent release fix it? Or do I need to modify the files manually following https://github.com/blei-lab/edward/pull/697/files?

The bug fix is in Edward’s development version and not in 1.3.3. To install that, see http://edwardlib.org/getting-started.
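For example, a minimal sketch of installing it directly from GitHub, assuming pip with git support:

pip install -e "git+https://github.com/blei-lab/edward.git#egg=edward"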

Thank you dustin. I have installed the development version, but in the example below I still cannot load the posterior qw, though I can load the placeholder X. The error message is: "The name 'qw:0' refers to a Tensor which does not exist. The operation, 'qw', does not exist in the graph." Is there anything wrong in my code?

import edward as ed
import numpy as np
import tensorflow as tf

from edward.models import Normal


def build_toy_dataset(N, w, noise_std=0.1):
  D = len(w)
  x = np.random.randn(N, D)
  y = np.dot(x, w) + np.random.normal(0, noise_std, size=N)
  return x, y

# Save all the variables.
fname = r'E:\PythonCode\TestEdward\TestEdward\VariablesFiles_ED\MyVariables'

N = 40  # number of data points
D = 10  # number of features

w_true = np.random.randn(D)
X_train, y_train = build_toy_dataset(N, w_true)
X_test, y_test = build_toy_dataset(N, w_true)

X = tf.placeholder(tf.float32, [N, D], name='X')
w = Normal(loc=tf.zeros(D), scale=tf.ones(D), name='w')
b = Normal(loc=tf.zeros(1), scale=tf.ones(1), name='b')
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N), name='y')

qw = Normal(loc=tf.Variable(tf.random_normal([D])),
            scale=tf.nn.softplus(tf.Variable(tf.random_normal([D]))), name='qw')
qb = Normal(loc=tf.Variable(tf.random_normal([1])),
            scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))), name='qb')

inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=250)

saver = tf.train.Saver()
sess = ed.get_session()
# (Do not re-run tf.global_variables_initializer() here; that would
# overwrite the trained parameters before saving.)
save_path = saver.save(sess, fname)

# Run in another part to load the variables.
sess = tf.Session()

loader = tf.train.import_meta_graph(fname + '.meta')
loader.restore(sess, fname)
graph = tf.get_default_graph()
X_load = graph.get_tensor_by_name("X:0")
print('X:', X_load)
qw_load = graph.get_tensor_by_name("qw:0")
print('qw:', qw_load)

Unfortunately, random variables aren’t explicitly stored in TensorFlow’s graph. We decided against it, as it would require a new data format, similar to the ones used to store tf.Tensors and tf.Variables.

This implies you need to import the tensors associated with qw and then rebuild qw:

qw_sample = graph.get_tensor_by_name("qw/sample/Reshape:0")
# this wraps the sample to include RV methods
qw = Normal(loc=graph.get_tensor_by_name("qw/loc:0"),
            scale=graph.get_tensor_by_name("qw/scale:0"),
            value=qw_sample)
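Once rebuilt, qw behaves like any other Edward random variable. For example, a minimal usage sketch, assuming the session and graph restored above:

samples = sess.run(qw.sample(100))  # draw 100 posterior samples
print(samples.mean(axis=0))         # rough check against the restored loc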

Maybe there’s an easier approach? Contributions/pull requests welcome.

Thank you dustin. Though not perfect, it solves my model-saving problem.

Hi dustin. Now I can save and restore model parameters, but I have some difficulty reconstructing the inference method. In the code below, when I try to run a reconstructed inference, there is an error:
ValueError: cannot add op with name optimizer/Variable_1/Adam as that name is already used

It seems that when I save the model, some inference objects are saved as well, but how can I make a new inference and run it?

import tensorflow as tf
import edward as ed
import numpy as np
from edward.models import Normal


def build_toy_dataset(N, w, noise_std=0.1):
  D = len(w)
  x = np.random.randn(N, D)
  y = np.dot(x, w) + np.random.normal(0, noise_std, size=N)
  return x, y

# Code to save a linear regression model.

fname = r'E:\PythonCode\TestEdward\TestEdward\InferencesFiles_ED\MyInferences'

w_true = np.array([0.4, 0.3, -0.1])  # np.random.randn(D)
N = 40  # number of data points
D = 3  # number of features

X_train, y_train = build_toy_dataset(N, w_true)
X_test, y_test = build_toy_dataset(N, w_true)

X = tf.placeholder(tf.float32, [N, D], name='X')
w = Normal(loc=tf.zeros(D), scale=tf.ones(D), name='w')
b = Normal(loc=tf.zeros(1), scale=tf.ones(1), name='b')
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N), name='y')

qw = Normal(loc=tf.Variable(tf.random_normal([D])),
            scale=tf.nn.softplus(tf.Variable(tf.random_normal([D]))), name='qw')
qb = Normal(loc=tf.Variable(tf.random_normal([1])),
            scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))), name='qb')

inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=250)

saver = tf.train.Saver()
sess = ed.get_session()
save_path = saver.save(sess, fname)

# Code to reload the model and run a new inference with new data.

sess = tf.Session()
loader = tf.train.import_meta_graph(fname + '.meta')
loader.restore(sess, fname)
graph = tf.get_default_graph()
X = graph.get_tensor_by_name("X:0")

w = Normal(loc=graph.get_tensor_by_name("w/loc:0"),
           scale=graph.get_tensor_by_name("w/scale:0"),
           value=graph.get_tensor_by_name("w/sample/Reshape:0"))

b = Normal(loc=graph.get_tensor_by_name("b/loc:0"),
           scale=graph.get_tensor_by_name("b/scale:0"),
           value=graph.get_tensor_by_name("b/sample/Reshape:0"))

qw = Normal(loc=graph.get_tensor_by_name("qw/loc:0"),
            scale=graph.get_tensor_by_name("qw/scale:0"),
            value=graph.get_tensor_by_name("qw/sample/Reshape:0"))

qb = Normal(loc=graph.get_tensor_by_name("qb/loc:0"),
            scale=graph.get_tensor_by_name("qb/scale:0"),
            value=graph.get_tensor_by_name("qb/sample/Reshape:0"))

print('qw', qw.mean().eval(session=sess))

y = Normal(loc=graph.get_tensor_by_name("y/loc:0"),
           scale=graph.get_tensor_by_name("y/scale:0"),
           value=graph.get_tensor_by_name("y/sample/Reshape:0"))

N = 40  # number of data points
D = 3  # number of features
X_test, y_test = build_toy_dataset(N, w_true)

inference = ed.KLqp({w: qw, b: qb}, data={X: X_test, y: y_test})
inference.run(n_samples=5, n_iter=50)  # The ValueError occurs here.

Because you import the graph, you can’t use inference.run(): it tries to build the training computation again on a graph that already contains it. There’s more bookkeeping involved, as you need to write your own training loop.

Thank you dustin. But I just want to restore the saved model parameters and then restart the inference process with new samples. Is there a convenient way to achieve that?

I recommend restoring only the tf.Variables and not the metagraph itself. You can do so by calling inference.initialize(), then restoring with your parameter saver, then manually running inference in a loop (http://edwardlib.org/api/inference). Basically, you’re replacing the tf.global_variables_initializer().run() line with the restore.
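A minimal sketch of that pattern, reusing the linear regression setup from above (fname, the model, and qw/qb as defined there), and assuming the checkpoint’s variable names match the current graph:

inference = ed.KLqp({w: qw, b: qb}, data={X: X_test, y: y_test})
inference.initialize(n_samples=5, n_iter=50)

sess = ed.get_session()
tf.global_variables_initializer().run()  # create/initialize all variables first
saver = tf.train.Saver()
saver.restore(sess, fname)  # then overwrite them with the saved parameters

# Manual training loop in place of inference.run().
for _ in range(inference.n_iter):
  info_dict = inference.update()
  inference.print_progress(info_dict)

inference.finalize()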

Thank you dustin, but how do I restore tf.Variables without restoring the metagraph? I have tried the following code, but it cannot find "qw/loc:0". Also, is it possible to create qw_loc without specifying its shape? Otherwise this raises a dimension mismatch error in TensorFlow.

sess = tf.Session()
qw_loc = tf.Variable([])
qw_scale = tf.Variable([])
qw_value = tf.Variable([])
mylist = {"qw/loc:0": qw_loc, "qw/scale:0": qw_scale, "qw/sample/Reshape:0": qw_value}
loader = tf.train.Saver(mylist)
loader.restore(sess, fname)
