How to handle the local optimum issue of inference method?

edisoncruise · August 8, 2017, 6:54am

I have tried the KLqp and SGHMC inference methods. They work well on the linear regression problems in the tutorial. However, when I feed them practical noisy data, the results of both methods become very sensitive to the random seed. They seem to get trapped in local optima.

edisoncruise · August 8, 2017, 9:42am

What if I take, for example, 100 random seeds and then select the model with the best out-of-sample prediction? Is this a practical way to solve the problem?

ecosang · August 11, 2017, 2:36am

In this case, the model might not be the best one to explain the data. However, if you still want to find the best local optimum, you can use cross-validation or similar predictive accuracy metrics.
For probabilistic models, you can use the WAIC or LOO-CV metrics. https://arxiv.org/abs/1507.04544
If you have MCMC samples and the log-likelihood evaluated at those samples, the authors provide Python code for this. You can also see the BDA textbook by Prof. Gelman.

https://pymc-devs.github.io/pymc3/notebooks/GLM-model-selection.html
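As a rough illustration of the WAIC suggestion above, here is a minimal sketch that computes WAIC from a matrix of pointwise log-likelihoods (one row per posterior draw, one column per observation). It is not the code from the linked paper. The data, the stand-in "posterior draws", and the helper names `log_mean_exp`, `waic`, `normal_logpdf`, and `pointwise` are all assumptions made up for this example; a real run would use the draws from your own sampler.

```python
import math
import random
from statistics import fmean, variance


def log_mean_exp(values):
    """Numerically stable log(mean(exp(v))) over a list of log-densities."""
    peak = max(values)
    return peak + math.log(fmean([math.exp(v - peak) for v in values]))


def waic(log_lik):
    """Return (waic, p_waic) from log_lik[s][i]: posterior draw s, observation i."""
    lppd = 0.0
    p_waic = 0.0
    for i in range(len(log_lik[0])):
        column = [row[i] for row in log_lik]
        lppd += log_mean_exp(column)   # log pointwise predictive density
        p_waic += variance(column)     # effective number of parameters
    return -2.0 * (lppd - p_waic), p_waic


def normal_logpdf(y, mean, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mean) ** 2 / (2 * sigma**2)


# Synthetic data (assumption): y = 2*x + Gaussian noise with sd 0.5.
rng = random.Random(0)
xs = [rng.uniform(-2, 2) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

# Stand-in "posterior draws" of the slope; a real run would use MCMC output.
slope_draws = [rng.gauss(2.0, 0.05) for _ in range(200)]
flat_draws = [rng.gauss(0.0, 0.05) for _ in range(200)]  # a model with no slope


def pointwise(draws):
    """Log-likelihood matrix: one row per draw, one column per observation."""
    return [[normal_logpdf(y, b * x, 0.5) for x, y in zip(xs, ys)] for b in draws]


waic_slope, _ = waic(pointwise(slope_draws))
waic_flat, _ = waic(pointwise(flat_draws))
print(waic_slope, waic_flat)  # lower WAIC means better expected predictive fit
```

On this deviance scale the candidate with the lower WAIC is preferred, so the comparison can be run across models or across fits obtained from different seeds.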

edisoncruise · September 1, 2017, 8:07am

Thank you all, but I hope to ask a further question.

I am using a Bayesian linear regression model, y = beta*x + alpha, with inference by Markov chain Monte Carlo (MCMC). When I start MCMC with different random seeds, the posterior means of the regression coefficient vector beta are close, with a relative error of about 3%. However, when I calculate the probability P{y_new<0|x_new, beta, alpha} on out-of-sample data, the result varies with the random seed: it can be 0.39, 0.49, 0.50, 0.55, etc. for seeds 0, 1, 2, 3. The regression coefficients do not seem sensitive to the random seed, but my prediction result is.

So is it appropriate to treat the random seed as a hyperparameter in the model selection process and use cross-validation to see which seed gives the best out-of-sample prediction? Or is the model just not suitable for prediction?
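To make the quantity in this question concrete, here is a minimal sketch of P{y_new<0|x_new} averaged over posterior draws, assuming Gaussian noise with standard deviation sigma (the post does not state the noise model, so that is an assumption, as are the function name `prob_y_below_zero` and the numbers used). It also shows one way a 3% change in a coefficient could move the probability a lot: when the predicted mean sits near zero and sigma is small relative to beta*x_new.

```python
from statistics import NormalDist, fmean


def prob_y_below_zero(x_new, alpha_draws, beta_draws, sigma_draws):
    """Average P(y_new < 0) over posterior draws of (alpha, beta, sigma).

    x_new and each element of beta_draws are equal-length lists of floats.
    """
    std_normal = NormalDist()
    probs = []
    for alpha, beta, sigma in zip(alpha_draws, beta_draws, sigma_draws):
        mean = alpha + sum(b * x for b, x in zip(beta, x_new))
        probs.append(std_normal.cdf((0.0 - mean) / sigma))
    return fmean(probs)


# Hypothetical numbers: one feature, predicted mean exactly at zero.
x_new = [10.0]
p_a = prob_y_below_zero(x_new, [-10.0], [[1.00]], [0.5])
# Same setup with the coefficient 3% larger.
p_b = prob_y_below_zero(x_new, [-10.0], [[1.03]], [0.5])
print(p_a, p_b)
```

In this toy setup the first probability is 0.5 and the second is clearly lower, even though the coefficient only moved by 3%. Whether this is what happens in the real model depends on the actual sigma and the scale of x_new.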

ecosang · September 2, 2017, 4:25pm

Hello,

First, to me it doesn’t make sense to treat the random seed as a hyperparameter. In general, the posterior distribution (of beta) should be a stationary distribution. If the distribution of the MCMC chains changes when you change the random seed, the chains are not valid (they do not represent a valid posterior distribution). It is possible that the true posterior distribution is multimodal. In that case, the MCMC chain falls into a different local mode each time you change the random seed. In other words, the model is non-identified. You do need to check the MCMC chains. The easiest way is to check whether the full distributions, not just the mean values, are consistent across seeds. An autocorrelation plot will also be helpful.
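One common way to check whether chains started from different seeds agree is the Gelman-Rubin statistic (R-hat), which compares between-chain and within-chain variance. This diagnostic is not named in the post; it is brought in here as an illustration. The sketch below is the basic version without chain splitting, and the "chains" are independent Gaussian draws standing in for real sampler output.

```python
import math
import random
from statistics import fmean, variance


def gelman_rubin(chains):
    """Basic R-hat for one scalar parameter; chains is a list of equal-length lists."""
    n = len(chains[0])
    chain_means = [fmean(chain) for chain in chains]
    within = fmean([variance(chain) for chain in chains])  # W
    between = n * variance(chain_means)                    # B
    pooled = (n - 1) / n * within + between / n            # estimate of posterior variance
    return math.sqrt(pooled / within)


def fake_chain(seed, center, length=1000):
    """Stand-in for one MCMC chain: i.i.d. draws around `center` (assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(center, 1.0) for _ in range(length)]


# Four seeds that all explore the same distribution.
agreeing = [fake_chain(seed, 0.0) for seed in range(4)]
# Four seeds stuck in two different modes, as in a multimodal posterior.
stuck = [fake_chain(seed, center) for seed, center in enumerate([-2.0, 2.0, -2.0, 2.0])]

rhat_agreeing = gelman_rubin(agreeing)
rhat_stuck = gelman_rubin(stuck)
print(rhat_agreeing, rhat_stuck)
```

Values close to 1 suggest the chains are sampling the same distribution. Values well above 1 suggest the seeds are ending up in different regions, which matches the seed sensitivity described in this thread.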

Once the MCMC chains are valid, you can do posterior predictive checks.
For example, if your posterior draws are beta(post), your posterior predictive samples are y_tilde|beta(post). So, for each beta(post), generate a new y given the new x. If you have 5000 posterior draws of beta, you will have 5000 new y values. Then compare the distribution of the generated new y with the true new y.
If the true new y deviates from the generated new y, then the posterior is not credible.
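The posterior predictive check described above can be sketched as follows. The posterior draws here are made-up values around alpha=1, beta=2, sigma=0.5 for a single scalar predictor, and the names `predictive_draws` and `central_interval` are hypothetical helpers, not part of any library.

```python
import random


def predictive_draws(x_new, posterior, rng):
    """Simulate one y_tilde per posterior draw (alpha, beta, sigma) at scalar x_new."""
    return [rng.gauss(alpha + beta * x_new, sigma) for alpha, beta, sigma in posterior]


def central_interval(draws, mass=0.9):
    """Empirical central interval containing roughly `mass` of the draws."""
    ordered = sorted(draws)
    lower = ordered[int((1 - mass) / 2 * len(ordered))]
    upper = ordered[int((1 + mass) / 2 * len(ordered)) - 1]
    return lower, upper


rng = random.Random(0)

# Stand-in posterior: 5000 draws of (alpha, beta, sigma); real draws come from MCMC.
posterior = [
    (rng.gauss(1.0, 0.05), rng.gauss(2.0, 0.05), abs(rng.gauss(0.5, 0.02)))
    for _ in range(5000)
]

x_new = 1.5
y_tilde = predictive_draws(x_new, posterior, rng)  # 5000 simulated new y values
lower, upper = central_interval(y_tilde, mass=0.9)

y_observed = 4.2   # hypothetical true new y, close to the predicted mean of 4.0
y_surprising = 9.0  # hypothetical true new y far from anything the model simulates

print(lower, upper)
print(lower < y_observed < upper, lower < y_surprising < upper)
```

If the observed values repeatedly fall outside the simulated range across many new points, the fitted posterior does not describe the data well.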

edisoncruise · September 13, 2017, 9:12am

Thank you so much ecosang.
I think the problem seems to be the sample size. I am using SGHMC. I have around 400 samples and 13 unknown coefficients for inference in total. If I feed the model 150 samples, the posterior prediction and some coefficients vary greatly with the random seed. When I feed it 300 samples, they are less sensitive to the random seed.

Maybe this is specific to SGHMC. The problem is still there.

edisoncruise · September 15, 2017, 2:47am

I think the problem is better explained here:

http://docs.pymc.io/notebooks/variational_api_quickstart.html
