Hey, thanks for the great library, looking forward to building stuff in it.
I’m just trying to get my head around things, as I’m coming from Stan (but I’m also familiar with TensorFlow), and I’m a little confused by the linear regression example, where you assume the scales of the priors and the likelihood are known. That seems fine for the priors, but it’s confusing for the scale of the likelihood of y:
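```python
import tensorflow as tf
import edward as ed
from edward.models import Normal

N, D = 40, 5  # illustrative: number of data points, number of features
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N))
```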
In an equivalent Stan model, I would declare the scale as its own parameter, with its own prior and posterior family, capturing the noise in the data. Here, though, it is set to a fixed value. Is there something Edward does behind the scenes that lets the scale of y vary, even though it was initialised as a tf.Tensor and not a tf.Variable?
I think I understand that the scales of w and b can change, since their posteriors qw and qb are defined with a tf.Variable for the scale and location, but I can’t see where the same is done for y.
Thanks for that. I was just trying to implement learning the scale parameter and ran into similar problems with the inverse gamma.
I wonder, though: in Bayesian linear regression one typically allows for uncertainty in both beta and sigma, so newcomers may be confused by an example that assumes sigma is fixed. This is especially true because the example generates the data with a standard deviation of 0.1 and then assumes a scale parameter of 1.0, which I found odd.
Would the team be open to a pull request that adds parameter uncertainty for sigma? It seems like the standard way to do things, and it’s a little odd to put so much effort into parameter uncertainty only to fix the noise level of the data-generating process (DGP).
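The 0.1-versus-1.0 mismatch can be checked outside Edward with a minimal NumPy sketch. It assumes toy data shaped like the tutorial’s (Gaussian features plus noise with standard deviation 0.1); the sizes, seed, and intercept are illustrative, not taken from the tutorial. The least-squares residual standard deviation comes out near 0.1, and the Gaussian log-likelihood with the scale fixed at 1.0 is far below the one at the estimated scale.

```python
import numpy as np

rng = np.random.default_rng(42)
N, D, noise_std = 500, 5, 0.1  # illustrative sizes; noise std 0.1 as in the toy data

# Toy data: y = X w + b + noise, with an illustrative intercept b = 0.5
w_true = rng.normal(size=D)
X = rng.normal(size=(N, D))
y = X @ w_true + 0.5 + rng.normal(0.0, noise_std, size=N)

# Ordinary least squares with an intercept column
X1 = np.hstack([X, np.ones((N, 1))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ coef
sigma_hat = resid.std(ddof=D + 1)  # residual std, corrected for the D + 1 fitted coefficients

def gaussian_loglik(r, sigma):
    """Sum of Normal(0, sigma) log densities of the residuals r."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - r**2 / (2 * sigma**2))

print(f"estimated noise std: {sigma_hat:.3f}")
print(f"log-lik at fixed scale 1.0: {gaussian_loglik(resid, 1.0):.1f}")
print(f"log-lik at estimated scale: {gaussian_loglik(resid, sigma_hat):.1f}")
```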
Sure, I think that would be welcome. As @MushroomHunting hints, the tutorial would likely work best with LogNormal scale priors; alternatively, use something like an inverse gamma but switch the inference algorithm to ed.HMC.
Would probably be good to keep things using VI for consistency with the current tutorial.
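What a LogNormal-prior fit by VI does can be sketched without Edward. This NumPy/SciPy sketch makes a strong simplifying assumption: the regression mean is treated as known, so only the noise scale sigma is latent. It also assumes a LogNormal(0, 1) prior and a LogNormal variational family for sigma (both choices are illustrative). Because prior and family are both log-normal, the ELBO has a closed form, so it is maximised directly with L-BFGS rather than with the sampled gradient estimates that KLqp would use.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, noise_std = 500, 0.1  # illustrative toy sizes

# Simplification (assumption): the regression mean is known, so only the
# noise scale sigma is latent. r stands for the residuals y - (X w + b).
r = rng.normal(0.0, noise_std, size=N)
S = np.sum(r**2)

# Model:   log sigma ~ Normal(0, 1)          i.e. a LogNormal(0, 1) prior on sigma
#          r_i | sigma ~ Normal(0, sigma)
# Family:  log sigma ~ Normal(mu, s^2)       i.e. a LogNormal variational family
def neg_elbo_and_grad(params):
    """Negative ELBO and its gradient with respect to (mu, log s)."""
    mu, log_s = params
    s2 = np.exp(2.0 * log_s)
    e = np.exp(-2.0 * mu + 2.0 * s2)  # E_q[1 / sigma^2] in closed form
    # Expected log-likelihood, dropping the constant -N/2 * log(2 pi)
    exp_loglik = -N * mu - 0.5 * S * e
    # KL(N(mu, s^2) || N(0, 1)); KL is unchanged by the shared exp transform
    kl = -log_s + 0.5 * (s2 + mu**2) - 0.5
    grad_mu = -N + S * e - mu
    grad_log_s = -2.0 * S * s2 * e + 1.0 - s2
    return -(exp_loglik - kl), -np.array([grad_mu, grad_log_s])

# Box bounds keep the exponentials finite during the line search
res = minimize(neg_elbo_and_grad, x0=[0.0, 0.0], jac=True, method="L-BFGS-B",
               bounds=[(-10.0, 10.0), (-10.0, 2.0)])
mu_opt, log_s_opt = res.x
print(f"q(sigma) median: {np.exp(mu_opt):.3f}, sd of log sigma: {np.exp(log_s_opt):.4f}")
```

The fitted median exp(mu) should land near the 0.1 noise level used to simulate the residuals, rather than at a fixed 1.0.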
I’ve been reading up on how PyMC3 deals with bounded variables. Behind the scenes it maps them to (-inf, inf) and then transforms each sample back to the (0, inf) interval (I’d guess using softplus; they mention using log odds for bounded variables).
Would the correct approach for Edward (and so the one that should go in the tutorials) be to define the latent “scale” over (-inf, inf), give it a diffuse normal prior, and then use NormalWithSoftplusScale to map it to a positive scale parameter?
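The unconstrained-then-transform idea can be sketched in plain NumPy. A diffuse Normal on an unconstrained u, pushed through softplus, always yields a positive sigma; the prior density this implies on sigma picks up a change-of-variables (Jacobian) term. The prior scale of 10 is an illustrative assumption and the helper names are hypothetical. (As far as I know, PyMC3 uses a log transform rather than softplus for positive variables by default, which is why they appear with names like `sigma_log__`.)

```python
import numpy as np

def softplus(u):
    """Stable log(1 + exp(u)): maps (-inf, inf) onto (0, inf)."""
    return np.logaddexp(0.0, u)

def softplus_inv(sigma):
    """Inverse softplus, log(exp(sigma) - 1), written stably for sigma > 0."""
    return sigma + np.log(-np.expm1(-sigma))

def normal_logpdf(x, loc, scale):
    return -0.5 * np.log(2.0 * np.pi * scale**2) - (x - loc) ** 2 / (2.0 * scale**2)

def log_prior_sigma(sigma, prior_scale=10.0):
    """Log density on sigma implied by u ~ Normal(0, prior_scale), sigma = softplus(u).

    Change of variables: log p(sigma) = log p_u(softplus_inv(sigma)) + log|du/dsigma|,
    with du/dsigma = 1 / (1 - exp(-sigma)).
    """
    u = softplus_inv(sigma)
    log_jacobian = -np.log(-np.expm1(-sigma))
    return normal_logpdf(u, 0.0, prior_scale) + log_jacobian

# Sampling: draw the unconstrained value from a diffuse prior, then transform
rng = np.random.default_rng(1)
u = rng.normal(0.0, 10.0, size=100_000)
sigma = softplus(u)
print(sigma.min() > 0)  # every transformed draw is a valid scale
```

Note that a diffuse prior on u is not diffuse on sigma: roughly half of its mass sits below softplus(0) = log 2, piled up very close to zero.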
PyMC3 and Stan differ from Edward in that they (1) transform all constrained continuous variables to be on the unconstrained space; (2) perform inference on the unconstrained scale; (3) transform back after convergence. Edward currently does everything on the original scale: if the prior has positive support (e.g., Gamma), then the approximating family should too (e.g., log normal).
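The support rule can be illustrated with a Monte Carlo estimate of KL(q || p), using SciPy distributions as stand-ins for Edward’s random variables. With a Gamma prior, a Normal q draws some negative values where the prior’s log density is -inf, so the estimate is infinite; a log-normal q stays on (0, inf) and gives a finite value. The particular parameters are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
prior = stats.gamma(a=2.0, scale=1.0)  # prior with positive support

def mc_kl(q, n=10_000):
    """Monte Carlo estimate of KL(q || prior) = E_q[log q(z) - log prior(z)]."""
    z = q.rvs(size=n, random_state=rng)
    return np.mean(q.logpdf(z) - prior.logpdf(z))

q_normal = stats.norm(loc=1.0, scale=1.0)               # also puts mass on z < 0
q_lognormal = stats.lognorm(s=0.5, scale=np.exp(0.5))  # positive support only

kl_normal = mc_kl(q_normal)        # some draws are negative, where log prior = -inf
kl_lognormal = mc_kl(q_lognormal)
print(kl_normal, kl_lognormal)
```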
I recommend the Normal random variable. What’s more vanilla for Bayesian linear regression is to place a prior over the scale directly: for example, an inverse gamma prior over the squared scale with a log-normal variational approximation, or alternatively a log-normal prior over the scale.
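As a reference point for the inverse gamma route, the normal-inverse-gamma prior is conjugate for linear regression, so the posterior over sigma^2 is available exactly. This NumPy sketch swaps VI for that closed-form update (it is not what KLqp computes). It assumes illustrative hyperparameters, a prior on w scaled by sigma^2 (unlike the tutorial’s fixed-scale prior on w), and toy data with noise standard deviation 0.1. A log-normal variational fit of the scale would be expected to land in the same neighbourhood.

```python
import numpy as np

rng = np.random.default_rng(7)
N, D, noise_std = 200, 3, 0.1  # illustrative toy sizes
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true + rng.normal(0.0, noise_std, size=N)

# Prior (illustrative, weakly informative hyperparameters):
#   sigma^2 ~ InverseGamma(a0, b0)
#   w | sigma^2 ~ Normal(m0, sigma^2 * V0)
a0, b0 = 1.0, 0.01
m0 = np.zeros(D)
V0_inv = np.eye(D) / 10.0  # prior covariance V0 = 10 * I

# Conjugate normal-inverse-gamma update
Vn_inv = V0_inv + X.T @ X
mn = np.linalg.solve(Vn_inv, V0_inv @ m0 + X.T @ y)
an = a0 + N / 2
bn = b0 + 0.5 * (y @ y + m0 @ V0_inv @ m0 - mn @ Vn_inv @ mn)

sigma2_mean = bn / (an - 1)  # posterior mean of sigma^2 (requires an > 1)
print(f"sqrt of posterior mean of sigma^2: {np.sqrt(sigma2_mean):.3f}")
```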
Okay, good to know. That also clears up a misunderstanding I’d had earlier: I thought KLqp had difficulty with inverse gamma distributions both as priors and as variational models, but it seems it’s only the latter.
It’s also good to know that specifying variational models with the correct support removes the need to transform the random variables. One gripe I have with PyMC is that their quickstart example is almost entirely about their variational inference implementation failing on a Gaussian mixture. That’s useful to know, but it’s something the user is powerless to fix, since PyMC’s ADVI is constrained to a Gaussian variational model!