Iterative estimators ("Bayes filters") in Edward?

Hello all,

I’m a post-doc researcher working on self-driving vehicles. I’ve recently learned of probabilistic programming languages, and I am looking into how they could be used to solve problems in autonomous driving. I’m particularly hooked on Edward and how it lets you mix deep learning networks with probabilistic techniques.

I plan to give a presentation on Edward to my lab next Thursday. I would like to present a simple demo showing how it could be used to do online estimation of a robot’s position and speed from an input stream of noisy position sensor readings. The problem is that I can’t wrap my mind around how a predict-update loop would be implemented in Edward. I’ve seen a couple of examples using tf.scan, but I’m not sure that’s what I need.

Could anyone give me a hand?

To handle streaming data, the easiest place to get started might be with the Bayesian linear regression example. Instead of calling inference.run(), you can manually write the training loop:

# placeholders on data, where we will pass in values during training
X_ph = tf.placeholder(tf.float32, [None, D])
y_ph = tf.placeholder(tf.float32, [None])

inference = ed.KLqp({w: qw, b: qb}, data={X: X_ph, y: y_ph})
inference.initialize()
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
  X_batch, y_batch = next_batch()
  info_dict = inference.update({X_ph: X_batch, y_ph: y_batch})
  inference.print_progress(info_dict)

Here, next_batch() produces the next batch of data available to us. For example:

def next_batch():
  M = 10  # batch size
  w = np.ones(D) * 5  # true parameters
  x = np.random.randn(M, D)
  y = np.dot(x, w) + np.random.normal(0, 0.1, size=M)
  return x, y

This performs one update of inference for each new batch of data. You might also run update() multiple times for each new batch.

Note that some mathematical consideration needs to be made depending on the choice of inference. Not all inference algorithms enable online estimation. For example, with ed.KLqp you need to properly scale the computation on the data, or otherwise the prior will overwhelm each minibatch’s likelihood. To do this, pass the scale argument when initializing inference, in place of the plain inference.initialize() call:

inference.initialize(scale={y: 100.0})

This scales all computation on the minibatch by 100. If we were subsampling a fixed data set of size N and always drawing minibatches of size M, then the proper scale factor would be N / M. For infinite data, the scale factor is a little more arbitrary and relates to the population posterior (McInerney et al., 2015).
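
To make the N / M factor concrete, here is a minimal sketch in plain Python (no Edward) under assumed values: a made-up data set of N = 1000 points, minibatches of M = 10, and a Gaussian likelihood computed by a hypothetical helper, gaussian_log_lik. It checks that the scaled minibatch log-likelihood, averaged over the disjoint minibatches of one pass, matches the full-data log-likelihood, which is what keeps the prior from dominating.

```python
import math
import random


def gaussian_log_lik(ys, mean, std):
    """Sum of Normal(mean, std) log densities over the points in ys."""
    const = -0.5 * math.log(2.0 * math.pi * std ** 2)
    return sum(const - (y - mean) ** 2 / (2.0 * std ** 2) for y in ys)


random.seed(0)
N, M = 1000, 10  # assumed data set size and minibatch size
data = [random.gauss(5.0, 1.0) for _ in range(N)]
scale = N / M  # the value one would pass as scale={y: N / M}

# Log-likelihood of the whole data set at some candidate parameter value.
full = gaussian_log_lik(data, mean=4.0, std=1.0)

# Split one pass over the data into N / M disjoint minibatches.
batches = [data[i:i + M] for i in range(0, N, M)]
scaled = [scale * gaussian_log_lik(b, mean=4.0, std=1.0) for b in batches]
unscaled = [gaussian_log_lik(b, mean=4.0, std=1.0) for b in batches]

# On average a scaled minibatch stands in for the full data set,
# while an unscaled one carries only M / N of its weight against the prior.
avg_scaled = sum(scaled) / len(scaled)
avg_unscaled = sum(unscaled) / len(unscaled)
print(full, avg_scaled, avg_unscaled)
```

The prior's log density is the same size in both cases, so without the scale factor it is weighed against a likelihood term that is N / M times too small.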

An alternative for the most vanilla Bayes filter is to do recursive posterior inferences: first perform inference on one data set to get a posterior, qw; then build a new model with qw as the prior and perform inference on another data set to get a new posterior, qw2; repeat. One disadvantage is that the graph is very large if you aim to repeat this process many times. But if you know how many times you’ll do this process, then the graph is finite and you won’t need dynamic things like tf.scan.
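
The recursive scheme can be sketched without Edward for a case where the posterior is available in closed form. The snippet below assumes a scalar unknown (say, a fixed position), a Normal prior, and Normal sensor noise with known variance; normal_update and all numeric values are made up for illustration. Each batch's posterior becomes the next batch's prior, and the final result agrees with a single update on all the data.

```python
import random


def normal_update(prior_mean, prior_var, ys, noise_var):
    """Exact posterior over an unknown mean: Normal prior, Normal noise."""
    post_prec = 1.0 / prior_var + len(ys) / noise_var
    post_mean = (prior_mean / prior_var + sum(ys) / noise_var) / post_prec
    return post_mean, 1.0 / post_prec


random.seed(0)
noise_var = 0.25  # assumed known sensor noise variance
true_value = 5.0  # assumed true quantity being measured

# Five batches of ten noisy readings each, arriving one batch at a time.
batches = [
    [random.gauss(true_value, noise_var ** 0.5) for _ in range(10)]
    for _ in range(5)
]

# Recursive inference: the posterior from one batch is the next batch's prior.
mean, var = 0.0, 1.0  # initial prior
for batch in batches:
    mean, var = normal_update(mean, var, batch, noise_var)

# Reference: one update on all readings at once, from the same initial prior.
all_data = [y for batch in batches for y in batch]
ref_mean, ref_var = normal_update(0.0, 1.0, all_data, noise_var)
print(mean, var, ref_mean, ref_var)
```

In Edward the closed-form update would be replaced by one inference per batch, with the fitted approximation standing in for the exact posterior. A full Bayes filter would also apply a predict step (the motion model) between updates.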

Hi Dustin,

thanks for the feeding example. Quick question: in the following line, what objects would X and y be and what shape would they have?

inference = ed.KLqp({w: qw, b: qb}, data={X: X_ph, y: y_ph})

The assumption is that X and y are random variables with shapes compatible with the values they’re bound to. For linear models, I guess you don’t actually need X to be a random variable. More concretely, it would be

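import edward as ed
import tensorflow as tf
from edward.models import Normal

# X is fed in directly, so it is a placeholder rather than a random variable
X = tf.placeholder(tf.float32, [None, D])
y_ph = tf.placeholder(tf.float32, [None])

w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=1.0)

inference = ed.KLqp({w: qw, b: qb}, data={y: y_ph})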

If you know the batch size, you can replace the Nones in the placeholder shapes with that fixed value.

Then you would feed in

inference.update({X: X_batch, y_ph: y_batch})

Cool, thank you! This might be worth adding to the notebook collection as a general data feed example.

Sorry for hijacking the thread!

Re: notebook. That makes sense, and contributions are welcome. Note that this is actually a special case of the data subsampling guide, one without local variables. But it’s significantly easier to work with because there are no local inferences, so it’s worth writing up as a standalone example.

(The topic is relevant; no hijacking crime was committed.)
