How to handle local optima in inference methods?


I have tried the KLqp and SGHMC methods for inference. They work well on linear regression problems like the one in the tutorial. However, when I feed them practical, noisy data, the results of both methods become very sensitive to the random seed. They seem to get trapped in local optima.


How about I take, for example, 100 random seeds and then select the model with the best out-of-sample prediction? Is this a practical way to solve this problem?


In this case, the model might not be the best one to explain the data. However, if you still want to find the best local optimum, you can use cross-validation or similar predictive accuracy metrics.
For probabilistic models, you can use the WAIC or LOO-CV metrics.
If you have the MCMC samples and the log-likelihood evaluated at each sample, Python code for these metrics is provided by the authors. You can also see the BDA textbook by Prof. Gelman.
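As a rough illustration of what "samples and log-likelihood" means here, the sketch below computes WAIC from a matrix of pointwise log-likelihoods (one row per MCMC draw, one column per observation), using the variance form of the effective number of parameters described in BDA. It uses only NumPy and is not the authors' code; the data and "posterior draws" in the usage part are hypothetical stand-ins for a normal-mean model.

```python
import numpy as np


def waic(log_lik):
    """Compute WAIC from a pointwise log-likelihood matrix.

    log_lik has shape (S, N): S posterior (MCMC) draws, N observations,
    with log_lik[s, i] = log p(y_i | theta_s).
    Returns (waic, lppd, p_waic) on the deviance scale (lower WAIC is better).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    S = log_lik.shape[0]
    # Log pointwise predictive density: log(mean_s p(y_i | theta_s)),
    # computed stably with a log-sum-exp over the draws.
    max_ll = log_lik.max(axis=0)
    lppd_i = max_ll + np.log(np.exp(log_lik - max_ll).sum(axis=0)) - np.log(S)
    # Effective number of parameters: posterior variance of each log-likelihood term.
    p_waic_i = log_lik.var(axis=0, ddof=1)
    lppd = lppd_i.sum()
    p_waic = p_waic_i.sum()
    return -2.0 * (lppd - p_waic), lppd, p_waic


# Hypothetical usage: normal model with unknown mean and unit variance.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(50), size=2000)  # stand-in posterior draws
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
w, lppd, p_waic = waic(log_lik)
print(w, lppd, p_waic)
```

Comparing WAIC across candidate models (rather than across seeds) is the intended use; for a one-parameter model like this, p_waic should come out close to 1.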


Thank you all, but I would like to ask a follow-up question.

I am using a Bayesian linear regression model, y = beta*x + alpha, with inference by Markov chain Monte Carlo (MCMC). When I start MCMC with different random seeds, the posterior means of the regression coefficient vector beta are close (relative error about 3%). However, when I calculate the probability P{y_new < 0 | x_new, beta, alpha} for out-of-sample data, the results vary with the random seed: 0.39, 0.49, 0.50, 0.55, etc. for seeds 0, 1, 2, 3. The regression coefficients do not seem sensitive to the random seed, but my prediction results are.
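One way to see how a ~3% difference in beta can move this probability a lot: the sketch below estimates P(y_new < 0 | x_new) by averaging the normal CDF over posterior draws, assuming Gaussian noise with scale sigma (the model as written above has no explicit noise term, so this is an assumption). The function name and all numbers are hypothetical. The standard error it returns assumes independent draws, so it understates the error for autocorrelated MCMC output.

```python
import math

import numpy as np


def prob_y_new_below_zero(alpha, beta, sigma, x_new):
    """Posterior predictive P(y_new < 0 | x_new) for y = beta.x + alpha + noise.

    alpha: (S,) posterior draws of the intercept
    beta:  (S, D) posterior draws of the coefficient vector
    sigma: (S,) posterior draws of the Gaussian noise scale (assumed)
    x_new: (D,) new input
    Returns (estimate, naive Monte Carlo standard error).
    """
    mu = beta @ x_new + alpha                    # predictive mean for each draw
    z = (0.0 - mu) / sigma                       # standardized threshold
    erf = np.vectorize(math.erf)
    p_s = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))  # Phi(z) for each draw
    # Averaging over draws integrates out parameter uncertainty.
    # The standard error assumes independent draws (optimistic for MCMC).
    return p_s.mean(), p_s.std(ddof=1) / math.sqrt(len(p_s))


# Hypothetical case where beta.x_new (about 10) nearly cancels alpha (about -10).
rng = np.random.default_rng(0)
S = 4000
x_new = np.array([1.0, 2.0])
beta = rng.normal([4.0, 3.0], 0.05, size=(S, 2))
alpha = rng.normal(-10.0, 0.05, size=S)
sigma = np.full(S, 1.0)
p, se = prob_y_new_below_zero(alpha, beta, sigma, x_new)
p_shift, _ = prob_y_new_below_zero(alpha, beta * 1.03, sigma, x_new)  # beta 3% larger
print(p, se, p_shift)
```

In this made-up case the predictive mean sits near 0, so a 3% change in beta shifts it by about 0.3 noise standard deviations and the probability drops from roughly 0.5 to roughly 0.38. Whether this applies to your data depends on how close beta*x_new + alpha is to 0 relative to the noise scale.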

So is it appropriate to treat the random seed as a hyperparameter in model selection via cross-validation, to see which random seed gives the best out-of-sample prediction? Or is the model simply not suitable for prediction?



First, to me it doesn’t make sense to use the random seed as a hyperparameter. In general, the posterior distribution of beta is the stationary distribution of the MCMC chain, so it should not depend on the random seed. If the distribution of the MCMC chains changes when you change the random seed, the chains are not valid (they do not represent the posterior distribution). It is also possible that the true posterior distribution is multimodal. In that case, the MCMC chain falls into a different local mode each time you change the random seed; this is called a non-identified model. Either way, you do need to check the MCMC chains. The easiest way is to check whether the distributions across chains are consistent (not just the mean values). An autocorrelation plot will also be helpful.
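A common numerical version of "check if the chains are consistent" is the Gelman-Rubin R-hat, which compares between-chain and within-chain variance; it is a swapped-in standard diagnostic rather than something named above. The sketch below implements the classic (non-split) R-hat and a sample autocorrelation function with NumPy only, treating each random seed as one chain. The chains in the usage part are synthetic; it still makes sense to compare histograms of the chains, since R-hat only looks at means and variances.

```python
import numpy as np


def rhat(chains):
    """Classic Gelman-Rubin potential scale reduction factor.

    chains: array of shape (m, n), m chains (e.g. one per random seed) of n
    post-warmup draws of a single scalar parameter.
    Values near 1 suggest the chains agree; a common rule of thumb flags > 1.1.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled estimate of posterior variance
    return np.sqrt(var_hat / W)


def autocorr(x, max_lag=50):
    """Sample autocorrelation of a single chain for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


# Synthetic example: 4 chains (seeds) that agree vs. 2 chains stuck in another mode.
rng = np.random.default_rng(1)
good = rng.normal(0.0, 1.0, size=(4, 1000))
bad = good + np.array([[0.0], [0.0], [3.0], [3.0]])
print(rhat(good), rhat(bad))
print(autocorr(good[0], max_lag=5))
```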

Once the MCMC chains are valid, you can do posterior predictive checks.
For example, if your posterior samples are beta(post), your posterior predictive samples are y_tilde | beta(post). So, for each beta(post), generate a new y given the new x. If you have 5000 posterior samples of beta, you will have 5000 new y values. Then compare the distribution of the generated new y with the true new y.
If the true new y deviates from the generated new y, then the posterior is not credible.
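The procedure just described can be sketched as follows, assuming the same linear model with Gaussian noise and posterior draws stored as arrays (the function name, shapes, and all data are hypothetical). For each held-out point it draws one new y per posterior sample, then checks how often the true y falls inside the central 90% predictive interval; coverage far below 90% is one concrete sign that the true new y deviates from the generated ones.

```python
import numpy as np


def ppc_interval_coverage(alpha, beta, sigma, X_new, y_true, level=0.9, rng=None):
    """Posterior predictive check via predictive-interval coverage.

    alpha: (S,), beta: (S, D), sigma: (S,) posterior draws (Gaussian noise assumed)
    X_new: (N, D) held-out inputs; y_true: (N,) held-out outputs
    Returns (y_tilde, coverage), where y_tilde has shape (N, S).
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = X_new @ beta.T + alpha          # (N, S): predictive mean per point and draw
    y_tilde = rng.normal(mu, sigma)      # one new y per posterior draw and point
    lo, hi = np.quantile(y_tilde, [(1 - level) / 2, (1 + level) / 2], axis=1)
    inside = (y_true >= lo) & (y_true <= hi)
    return y_tilde, inside.mean()


# Hypothetical held-out data generated from the model itself.
rng = np.random.default_rng(0)
S, N, D = 4000, 500, 3
b_true, a_true = np.array([0.5, -1.0, 2.0]), 1.0
X_new = rng.normal(size=(N, D))
y_true = X_new @ b_true + a_true + rng.normal(0.0, 1.0, size=N)
beta = rng.normal(b_true, 0.01, size=(S, D))   # stand-in posterior draws
alpha = rng.normal(a_true, 0.01, size=S)
sigma = np.full(S, 1.0)
_, cov = ppc_interval_coverage(alpha, beta, sigma, X_new, y_true, rng=rng)
_, cov_bad = ppc_interval_coverage(alpha, beta, sigma, X_new, y_true + 5.0, rng=rng)
print(cov, cov_bad)  # cov should be near 0.9; cov_bad (shifted data) near 0
```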


Thank you so much, ecosang.
I think the problem lies in the sample size. I am using SGHMC. I have around 400 observations and 13 unknown coefficients to infer in total. If I feed the model 150 observations, the posterior prediction and some coefficients vary greatly with the random seed. When I feed it 300 observations, they are less sensitive to the random seed.
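The effect of going from 150 to 300 observations is consistent with the posterior itself getting narrower as data accumulates, which also reduces how much any sampler's output can wander. A minimal sketch, assuming a conjugate Gaussian linear model with known noise and an N(0, prior_sd^2 I) prior (not the SGHMC setup in this thread; the design matrix is random and hypothetical), computes the exact posterior standard deviations of 13 coefficients for both sample sizes.

```python
import numpy as np


def posterior_sd(X, noise_sd=1.0, prior_sd=10.0):
    """Posterior standard deviations of the coefficients in a conjugate
    Bayesian linear regression with known noise and an N(0, prior_sd^2 I) prior.

    Posterior covariance = (X^T X / noise_sd^2 + I / prior_sd^2)^{-1}.
    """
    d = X.shape[1]
    precision = X.T @ X / noise_sd**2 + np.eye(d) / prior_sd**2
    return np.sqrt(np.diag(np.linalg.inv(precision)))


rng = np.random.default_rng(0)
X = rng.normal(size=(400, 13))  # hypothetical: 400 observations, 13 coefficients
sd_150 = posterior_sd(X[:150])
sd_300 = posterior_sd(X[:300])
print(np.round(sd_150 / sd_300, 2))  # ratios above 1: more data, narrower posterior
```

Because the 300-row design contains the 150-row one, its posterior precision can only grow, so every ratio is at least 1 (roughly sqrt(2) for a well-conditioned random design like this one). Strong correlation between predictors would shrink the posterior more slowly for the affected coefficients.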

Maybe this is specific to SGHMC. Either way, the problem is still there.


I think the problem is better explained here: