Maximum likelihood estimation

mbernste · March 22, 2018, 7:31pm

I apologize if this question is naive, but is it possible to perform maximum likelihood estimation over some set of parameters in a model? This would be equivalent to MAP estimation with a flat (improper) prior over the parameters. If so, could someone point me to an example or tutorial where this is demonstrated?

I have looked into using Edward’s MAP inference for this task; however, it requires that the parameters be encoded as RandomVariable objects.

Thanks!

aksarkar · March 23, 2018, 8:42pm

MLE for the model:

x_i ~ Normal(mu, 1)

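import edward as ed
import numpy as np
import tensorflow as tf

# Simulate N draws from Normal(mu, 1)
N = 100
np.random.seed(0)
mu = np.random.normal()
x = np.random.normal(loc=mu, scale=1, size=(N, 1)).astype(np.float32)

# mu is a plain tf.Variable (no prior on it), broadcast to the shape of the data
px = ed.models.Normal(loc=tf.Variable(tf.zeros([1])) * tf.ones([N, 1]), scale=tf.ones([1]))
# No latent variables are passed, only the observed data
inf = ed.KLqp(data={px: x})
inf.run()
# Prints: true mu, sample mean, fitted mu
print(mu, x.mean(), ed.get_session().run(px.mean()[0, 0]))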

1.764052345967664 1.82505 1.82505

This works because the objective function in ed.KLqp includes the log-likelihood.

You’re probably better off using scipy.optimize to solve maximum likelihood problems: ed.KLqp must use gradient descent (the objective function is stochastic, so we have to estimate gradients via sampling), whereas MLEs can often be found more quickly using second-order methods.
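As a rough illustration of the scipy.optimize route, here is a minimal sketch for the same model, x_i ~ Normal(mu, 1). The names `neg_log_lik` and `neg_log_lik_grad` are made up for this example, and BFGS is a quasi-Newton method (it approximates curvature from gradients rather than using an exact Hessian), so treat it as one reasonable choice rather than the only one.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data, analogous to the example above (different RNG, so different draws)
rng = np.random.default_rng(0)
true_mu = rng.normal()
x = rng.normal(loc=true_mu, scale=1.0, size=100)


def neg_log_lik(params):
    """Negative log-likelihood of x under Normal(mu, 1)."""
    mu = params[0]
    return 0.5 * np.sum((x - mu) ** 2) + 0.5 * x.size * np.log(2 * np.pi)


def neg_log_lik_grad(params):
    """Gradient of the negative log-likelihood with respect to mu."""
    mu = params[0]
    return np.array([np.sum(mu - x)])


result = minimize(neg_log_lik, x0=np.zeros(1), jac=neg_log_lik_grad, method="BFGS")
mu_hat = result.x[0]

# For this model the MLE is the sample mean, so the two values should agree
print(mu_hat, x.mean())
```

For models without a closed-form gradient, `jac` can be omitted and scipy will fall back to finite differences.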

mbernste · March 23, 2018, 9:36pm

Thank you very much for your response. This answers my question. I am interested in the case where I approximate a posterior over the latent variables and compute a point estimate for the parameters.

Upon further investigation, I see that when Edward builds gradients for the loss function, it computes gradients with respect to all variables upstream from the variational random variables in the computation graph.

aksarkar · March 23, 2018, 10:09pm

Yes, that inference algorithm is referred to as VBEM in the docs, although it isn’t iterative like the original formulation (Beal 2003).

The end result should be the same (assuming convergence): a local optimum of the evidence lower bound with respect to the variational parameters and (typically) model hyperparameters.
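To make the "gradient steps on everything at once" idea concrete, here is a small NumPy sketch on a toy model that is not from this thread: z_i ~ Normal(mu, 1), x_i | z_i ~ Normal(z_i, 1), with q(z_i) = Normal(m_i, 0.5). The model, the fixed variational variance, and the step size are all assumptions chosen so that the ELBO gradients can be written by hand; Edward would instead estimate them through TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
# Marginally x_i ~ Normal(mu, 2) under the toy model; the true mu here is 1.0
x = rng.normal(loc=1.0, scale=np.sqrt(2.0), size=N)

# Variational means m_i and the model parameter mu, all updated together.
# The variational variance is held at 0.5, its optimum for this toy model.
m = np.zeros(N)
mu = 0.0
lr = 0.01  # assumed step size, small enough to be stable for N = 100

for _ in range(5000):
    # Analytic ELBO gradients for this model (up to terms constant in m, mu)
    grad_m = (x - m) - (m - mu)
    grad_mu = np.sum(m - mu)
    # Simultaneous gradient ascent step on variational and model parameters
    m = m + lr * grad_m
    mu = mu + lr * grad_mu

# The marginal MLE of mu is the sample mean, so these should agree
print(mu, x.mean())
```

Both sets of parameters move on every step, with no separate E-like and M-like phases.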

mbernste · March 24, 2018, 4:55pm

Ah right. Would it be correct to say that VBEM and this algorithm both minimize the same loss function, but VBEM uses coordinate descent whereas this algorithm performs gradient descent? Thanks again.

aksarkar · March 25, 2018, 5:37pm

The fundamental idea of VBEM is that, like EM, it monotonically improves a lower bound to the objective function. (In the case of VBEM, the objective function is itself a lower bound.)

You could use any optimization algorithm for the VBE step (depending on whether you could write down the objective function analytically), and potentially a different algorithm for the VBM step.
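For comparison, here is a sketch of the alternating scheme on a toy model that is not from this thread: z_i ~ Normal(mu, 1), x_i | z_i ~ Normal(z_i, 1), with q(z_i) = Normal(m_i, s2). Both steps happen to have closed-form updates here, and the exact posterior lies in the variational family, so this is really plain EM written in VBEM form. The function name `elbo` and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.normal(loc=1.0, scale=np.sqrt(2.0), size=N)


def elbo(m, s2, mu):
    """Evidence lower bound for the toy model with q(z_i) = Normal(m_i, s2)."""
    exp_log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m) ** 2 + s2)
    exp_log_prior = -0.5 * np.log(2 * np.pi) - 0.5 * ((m - mu) ** 2 + s2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return np.sum(exp_log_lik + exp_log_prior + entropy)


mu = 0.0
s2 = 0.5  # optimal variational variance for this model, independent of mu
history = []
for _ in range(50):
    # VBE-like step: maximize the ELBO over q with mu held fixed
    m = (x + mu) / 2.0
    # VBM-like step: maximize the ELBO over mu with q held fixed
    mu = m.mean()
    history.append(elbo(m, s2, mu))

# mu should approach the sample mean, and the ELBO should never decrease
print(mu, x.mean(), history[0], history[-1])
```

Either closed-form update could be swapped for a numerical optimizer when no analytic solution is available, as long as each step does not decrease the bound.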
