My starting point is the Edward Batch Training tutorial, which makes a note of setting a scale factor of

N/M

Like the two related posts linked up the top, I'm interested in running VI in a streaming context. Regarding the initialisation of ed.KLqp, I have a few questions:

Is the scale parameter only relevant if one first calls inference.run with N > M training data points?

Is n_samples only relevant if one uses an inference.run call? I.e., if you initially have a large amount of data to train on, and the model is then exposed to completely new data in a streaming situation. If not, where does it fit in?

In the case of having no data initially, and then acquiring data that accumulates into buffers of size M, is it enough to simply set N = M, giving a scale of 1?

To some degree, yes. It’s used generally to scale any computation with respect to the random variables. For example, you might use it for masking.
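To make the masking use concrete, here is a toy NumPy sketch (not Edward code; the variable names are my own invention). A per-observation scale vector multiplies each observation's log-likelihood term, so a scale of 0 simply drops that term, which is exactly a mask for missing entries:

```python
import numpy as np

# Observations, some missing (NaN). A scale of 0 on a term masks it out.
x = np.array([1.2, np.nan, 0.7, np.nan])
mask = np.where(np.isnan(x), 0.0, 1.0)

# Per-observation log-likelihood terms under a toy N(1, 1) model;
# NaNs are zero-filled first, then zeroed again by the mask.
loglik_terms = -(np.nan_to_num(x, nan=1.0) - 1.0) ** 2 / 2
total = np.sum(mask * loglik_terms)
print(total)  # only the two observed points contribute
```

The same mechanism generalises: a scalar scale of N/M is just the special case where every term in the batch gets the same multiplier.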

n_samples is an algorithm hyperparameter in ed.KLqp, representing the number of samples used to estimate the gradient of the loss function. It is relevant whenever the algorithm runs, not only via inference.run.
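A minimal sketch of why n_samples matters, independent of how you drive the updates. This is not Edward's internals, just a reparameterization-style Monte Carlo gradient for a toy objective E_{z~N(mu,1)}[-(z-3)^2/2], whose exact gradient with respect to mu is (3 - mu). Averaging over more samples per step lowers the variance of the gradient estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_estimate(mu, n_samples, rng):
    """Monte Carlo gradient of E_{z~N(mu,1)}[-(z-3)^2/2] wrt mu,
    via the reparameterization z = mu + eps, eps ~ N(0, 1)."""
    eps = rng.standard_normal(n_samples)
    return np.mean(3.0 - (mu + eps))  # exact value would be 3 - mu

mu = 0.0
few = [grad_estimate(mu, 1, rng) for _ in range(2000)]    # n_samples = 1
many = [grad_estimate(mu, 50, rng) for _ in range(2000)]  # n_samples = 50
print(np.std(few), np.std(many))  # spread shrinks roughly like 1/sqrt(n_samples)
```

Both estimators are unbiased; n_samples only trades compute per iteration against gradient noise, which is why it stays relevant in a streaming loop just as much as in a one-shot run.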

That would only work if, after each streaming batch, you reset the prior distribution to be the inferred posterior from that batch. Otherwise, imagine you had a billion streaming points with a batch size of 1; without scaling the likelihood by a billion (N/M), the prior overwhelms the likelihood, so the posterior will not differ much from the prior.
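The effect is easy to see in a conjugate toy model (this is a sketch of the argument, not Edward code; batch_posterior is a hypothetical helper). With prior mu ~ N(0, 1) and x_i ~ N(mu, 1), the posterior after a scaled batch has precision 1 + scale * M, so with scale = 1 and M = 1 a single point barely moves the prior, while scale = N/M lets the likelihood dominate as the full data set would:

```python
import numpy as np

N = 1_000_000_000  # a billion streaming points (conceptually)
M = 1              # batch size
batch = np.array([5.0])  # one observed point

def batch_posterior(batch, scale):
    """Posterior over mu after one scaled batch (prior N(0,1), unit noise)."""
    precision = 1.0 + scale * len(batch)
    mean = scale * batch.sum() / precision
    return mean, 1.0 / precision

# scale = 1: posterior mean 2.5, halfway between prior and the point.
m_unscaled, v_unscaled = batch_posterior(batch, scale=1.0)
# scale = N/M: posterior mean ~5.0, the likelihood overwhelms the prior.
m_scaled, v_scaled = batch_posterior(batch, scale=N / M)
print(m_unscaled, m_scaled)
```

The alternative the answer describes, resetting the prior to the previous posterior after each batch, accumulates evidence across batches instead, so no scaling is needed in that regime.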