Hey all,

I am having trouble applying SGHMC inference to a convolutional neural network such as LeNet. We believe our LeNet architecture is valid, but after 50,000 iterations the network has not learned anything.

We also tested SGHMC on MNIST with a single-layer NN and got 90% accuracy after 10,000 iterations with a batch size of 500.

Does Edward's SGHMC work with convolution operations, or are we doing something wrong in our model?

Here is our notebook for training LeNet with SGHMC on the MNIST dataset.

Can anyone advise on why SGHMC is performing so poorly? My teammate and I would really appreciate it!


I didn't run your code, so I could be wrong, but in the model you define the placeholder X, and then you define a new placeholder x, which is the one used in the inference run. I hope this helps.

I'm trying something similar, but with HMC on a smaller dataset (about 20,000 training images and 2,000 test images), and I haven't yet figured out how to get reasonable results. I am monitoring the acceptance rate returned by the HMC inference, which I understand should be close to 1. A few things I noticed that change the acceptance rate of the samples are:
1) the initial values of the q variables, i.e. the variables with Empirical distributions;
2) step_size (SGHMC also has this one);
3) n_steps (SGHMC has no n_steps; it takes a 'friction' parameter instead).
By tuning 1), 2) and 3) I managed to get a reasonable acceptance rate, although it is quite sensitive to these choices. However, the model still does not learn anything: when I use the samples of the q variables to make predictions, the results are pretty poor.
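One way to build intuition for these knobs is to try them on a toy problem before the CNN. Below is a minimal NumPy sketch, not Edward's implementation, assuming a 2-D standard normal target for HMC and a 1-D normal with mean 2 for SGHMC; the function names, targets and parameter values are hypothetical. `hmc_accept_rate` runs leapfrog steps plus a Metropolis test and returns the acceptance rate, so you can see how `step_size` and `n_steps` move it. `sghmc` follows the SGHMC update of Chen et al. (2014) with the gradient-noise estimate set to zero, where `step_size` acts as a learning rate and `friction` damps the momentum. As sketched here, SGHMC has no Metropolis correction, so there is no acceptance rate to monitor for it.

```python
import numpy as np


def hmc_accept_rate(log_p, grad_log_p, q0, step_size, n_steps, n_iter=2000, seed=0):
    """Plain HMC (leapfrog + Metropolis test); returns the fraction of accepted proposals."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    n_accept = 0
    for _ in range(n_iter):
        p = rng.normal(size=q.shape)  # resample the momentum every iteration
        q_new, p_new = q.copy(), p.copy()
        # Leapfrog: half momentum step, alternating full steps, final half momentum step.
        p_new += 0.5 * step_size * grad_log_p(q_new)
        for i in range(n_steps):
            q_new += step_size * p_new
            if i < n_steps - 1:
                p_new += step_size * grad_log_p(q_new)
        p_new += 0.5 * step_size * grad_log_p(q_new)
        # Accept with probability min(1, exp(H_old - H_new)), H = -log p(q) + |p|^2 / 2.
        h_old = -log_p(q) + 0.5 * p @ p
        h_new = -log_p(q_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            q = q_new
            n_accept += 1
    return n_accept / n_iter


def sghmc(grad_log_p_hat, q0, step_size, friction, n_iter=20000, seed=0):
    """SGHMC update (Chen et al., 2014) with the gradient-noise estimate set to 0.

    grad_log_p_hat may be a noisy (minibatch) estimate of the gradient of log p.
    No Metropolis step: every proposal is kept.
    """
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float).copy()
    v = np.zeros_like(q)
    samples = np.empty((n_iter,) + q.shape)
    noise_std = np.sqrt(2.0 * friction * step_size)  # injected noise balances the friction
    for t in range(n_iter):
        v = (1.0 - friction) * v + step_size * grad_log_p_hat(q) + noise_std * rng.normal(size=q.shape)
        q = q + v
        samples[t] = q
    return samples


# Hypothetical toy targets: 2-D standard normal for HMC, 1-D N(2, 1) for SGHMC.
def log_p(q):
    return -0.5 * q @ q


def grad_log_p(q):
    return -q


acc_small = hmc_accept_rate(log_p, grad_log_p, np.zeros(2), step_size=0.1, n_steps=10)
acc_large = hmc_accept_rate(log_p, grad_log_p, np.zeros(2), step_size=2.5, n_steps=10)
print(f"acceptance: step_size=0.1 -> {acc_small:.2f}, step_size=2.5 -> {acc_large:.2f}")

samples = sghmc(lambda q: -(q - 2.0), np.zeros(1), step_size=0.01, friction=0.1)
kept = samples[1000:]  # drop burn-in; the target has mean 2 and std 1
print(f"SGHMC mean {kept.mean():.2f}, std {kept.std():.2f}")
```

On the Gaussian, a step size of 2.5 makes the leapfrog integrator unstable, so almost every proposal is rejected, while 0.1 accepts nearly everything. Note also that when SGHMC's gradient comes from a minibatch of M out of N examples, the log-likelihood part has to be multiplied by N/M so that it estimates the full-data gradient (in Edward this is what the `scale` argument of `inference.initialize` is for).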

I am not really sure how to debug this. Any further comments on getting HMC/SGHMC working on CNNs would be appreciated.