Iterative estimators ("Bayes filters") in Edward?

To handle streaming data, the easiest place to get started might be with the Bayesian linear regression example. Instead of calling inference.run(), you can manually write the training loop:

# Assumes the model (w, b, X, y) and variational approximations (qw, qb)
# are defined as in the Bayesian linear regression example.
import edward as ed
import tensorflow as tf

# placeholders for the data, where we will pass in values during training
X_ph = tf.placeholder(tf.float32, [None, D])
y_ph = tf.placeholder(tf.float32, [None])

inference = ed.KLqp({w: qw, b: qb}, data={X: X_ph, y: y_ph})
inference.initialize()
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
  X_batch, y_batch = next_batch()
  info_dict = inference.update({X_ph: X_batch, y_ph: y_batch})
  inference.print_progress(info_dict)

Here, next_batch() produces the next batch of data available to us. For example:

import numpy as np

def next_batch():
  M = 10  # batch size
  w_true = np.ones(D) * 5  # true parameters of the data-generating process
  x = np.random.randn(M, D)
  y = np.dot(x, w_true) + np.random.normal(0, 0.1, size=M)
  return x, y

This performs one update of inference for each new batch of data. You might also run update() multiple times for each new batch, as sketched below.
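For example, a minimal variation of the training loop above; the count of five inner updates per batch is an arbitrary choice:

for _ in range(inference.n_iter):
  X_batch, y_batch = next_batch()
  for _ in range(5):  # take several gradient steps on the same batch
    info_dict = inference.update({X_ph: X_batch, y_ph: y_batch})
  inference.print_progress(info_dict)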

Note that some mathematical consideration needs to be made depending on the choice of inference; not all inference algorithms enable online estimation. For example, with ed.KLqp you need to properly scale the data’s computation, or otherwise the prior will overwhelm each minibatch’s likelihood. To do this, add the scale argument to inference,

inference.initialize(scale={y: 100.0})

This scales all computation on the minibatch by 100. If we were subsampling a fixed data set of size N with minibatches of size M, the proper scale factor would be N / M. For infinite data, the scale factor is more arbitrary and relates to the population posterior (McInerney et al., 2015).
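As a concrete sketch, assuming a fixed data set of size N = 10000 and minibatches of size M = 100 (both numbers are purely for illustration):

N = 10000  # total data set size
M = 100    # minibatch size
inference.initialize(scale={y: float(N) / M})  # each batch stands in for N / M = 100 copies of itself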

An alternative for the most vanilla Bayes filter is to do recursive posterior inference: first perform inference on one data set to get a posterior, qw; then build a new model with qw as the prior and perform inference on another data set to get a new posterior, qw2; repeat. One disadvantage is that the graph grows very large if you aim to repeat this process many times. But if you know in advance how many times you’ll repeat it, the graph is finite and you won’t need dynamic things like tf.scan. A sketch of one recursive round follows.
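This is only a rough sketch, assuming qw and qb are Normal approximations (so their learned loc/scale parameters can be read off and reused as the next prior); X1_data, y1_data, X2_data, y2_data, and M2 are hypothetical stand-ins for the two data sets:

from edward.models import Normal

# Round 1: posterior from the first data set, with the model as above.
inference1 = ed.KLqp({w: qw, b: qb}, data={X: X1_data, y: y1_data})
inference1.run()

# Round 2: the learned posteriors become the new priors.
w2 = Normal(loc=qw.loc, scale=qw.scale)
b2 = Normal(loc=qb.loc, scale=qb.scale)
X2 = tf.placeholder(tf.float32, [M2, D])
y2 = Normal(loc=ed.dot(X2, w2) + b2, scale=0.1 * tf.ones(M2))

# Fresh variational approximations for the new posterior.
qw2 = Normal(loc=tf.Variable(tf.zeros(D)),
             scale=tf.nn.softplus(tf.Variable(tf.zeros(D))))
qb2 = Normal(loc=tf.Variable(tf.zeros(1)),
             scale=tf.nn.softplus(tf.Variable(tf.zeros(1))))

inference2 = ed.KLqp({w2: qw2, b2: qb2}, data={X2: X2_data, y2: y2_data})
inference2.run()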
