Streamed batch training with manual update - scaling considerations


#1

Hi there

related:


My starting point is http://edwardlib.org/tutorials/batch-training, where the tutorial notes setting a scale factor of N/M.

Like the two related posts at the top, I’m interested in running VI in a streaming context. I have a few points of confusion about initialising ed.KLqp:

  1. Is the scale parameter only relevant if one first uses inference.run with N > M training data points?

  2. Is n_samples only relevant if one uses an inference.run call, i.e. if you start with a large amount of data to train on and the model is later exposed to completely new data in a streaming situation? If not, where does it fit in?

  3. In the case of having no data initially, and then acquiring data that accumulates into buffers of size M, is it enough to simply set N = M, giving a scale of 1?

cheers!


#2

To some degree, yes. More generally, scale is used to scale any computation with respect to the random variables. For example, you might use it for masking.

n_samples is an algorithm hyperparameter in ed.KLqp, representing the number of samples used to estimate the gradient of the loss function. It is relevant whenever the algorithm runs, not only under inference.run.
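To make the role of n_samples concrete, here is a minimal sketch in plain Python. It is not Edward's implementation: the toy objective E[z^2] with z ~ N(mu, 1), the helper name `grad_estimate`, and the repetition count are all assumptions chosen for illustration. It only shows that averaging more samples per step gives a less noisy gradient estimate.

```python
import random
import statistics


def grad_estimate(mu, n_samples, rng):
    """Monte Carlo estimate of d/dmu E[z^2] with z = mu + eps, eps ~ N(0, 1).

    Hypothetical toy objective; the exact gradient is 2 * mu.
    """
    # d/dmu (mu + eps)^2 = 2 * (mu + eps); average over n_samples draws.
    total = sum(2.0 * (mu + rng.gauss(0.0, 1.0)) for _ in range(n_samples))
    return total / n_samples


rng = random.Random(0)
mu = 1.5  # exact gradient is 3.0

# Spread of the estimator across repeated runs, for several n_samples values.
spread = {}
for n_samples in (1, 5, 50):
    estimates = [grad_estimate(mu, n_samples, rng) for _ in range(2000)]
    spread[n_samples] = statistics.pstdev(estimates)

for n_samples, sd in spread.items():
    print(n_samples, round(sd, 3))
```

The spread shrinks roughly like 1/sqrt(n_samples), which is why the setting matters on every update step, streamed or not.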

Setting N = M would only work if, after each streaming batch, you reset the prior distribution to be the posterior inferred from that batch. Otherwise, imagine you had a billion streaming points with a batch size of 1: without scaling the likelihood by 1 billion, the prior overwhelms the likelihood, so the posterior will not differ much from the prior.
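The trade-off can be checked on a model where the posterior is available in closed form. The sketch below is an assumption-laden illustration, not Edward code: it uses a Normal prior on the mean of Normal data with known noise variance, and the helper `normal_posterior` and all constants are hypothetical. It compares a single batch with scale 1, a single batch with scale N/M, and streaming with scale 1 where each posterior becomes the next prior.

```python
import random


def normal_posterior(prior_mean, prior_var, batch, noise_var, scale=1.0):
    """Conjugate update for a Normal mean with known noise variance.

    `scale` multiplies the batch log-likelihood, mimicking an N/M factor.
    """
    prior_prec = 1.0 / prior_var
    lik_prec = scale * len(batch) / noise_var
    post_var = 1.0 / (prior_prec + lik_prec)
    post_mean = post_var * (prior_prec * prior_mean + scale * sum(batch) / noise_var)
    return post_mean, post_var


random.seed(0)
N, M = 10000, 10  # total points and batch size (illustrative values)
noise_var = 1.0
data = [random.gauss(3.0, noise_var ** 0.5) for _ in range(N)]
batches = [data[i:i + M] for i in range(0, N, M)]

# Reference: posterior given all N points at once.
full_mean, full_var = normal_posterior(0.0, 1.0, data, noise_var)

# One batch, scale 1, original prior: far less certain than the reference.
unscaled_mean, unscaled_var = normal_posterior(0.0, 1.0, batches[0], noise_var)

# One batch, scale N/M: matches the certainty of seeing all N points.
scaled_mean, scaled_var = normal_posterior(0.0, 1.0, batches[0], noise_var, scale=N / M)

# Streaming, scale 1: the posterior from each batch becomes the next prior.
stream_mean, stream_var = 0.0, 1.0
for batch in batches:
    stream_mean, stream_var = normal_posterior(stream_mean, stream_var, batch, noise_var)

print(full_var, unscaled_var, scaled_var, stream_var)
```

In this conjugate case, streaming with scale 1 and a carried-over posterior recovers the full-data posterior, while keeping the original prior with scale 1 leaves each batch's likelihood weighted as if only M points existed. With ed.KLqp the posterior is approximate, so the match would not be exact.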

The discussion in Iterative estimators ("bayes filters") in Edward? provides the most detail.


#3

thanks Dustin, that clears things up