Customising the MDN loss function

paul · July 17, 2017, 3:50am

Hi,

I’m successfully using a Mixture Density Network, based on the MDN tutorial code. However, I’d like to modify the loss function to try to get different behavior from the network. For example, I’d like the model to prefer more Gaussians, with larger weights and smaller standard deviations, rather than fewer Gaussians with larger standard deviations.

I think I could accomplish this by adding one or more regularization terms to the loss function. However, the existing maximum likelihood loss appears to be hidden by the MAP inference method.

Can anyone suggest a good way to do this?

dustin · July 17, 2017, 7:20am

Once you’ve decided on MAP (with gradient descent) as the inference algorithm, the only thing you can change is the model. In particular, you might try increasing the number of mixture components, constraining the minimum of the standard deviations, or writing manual networks for which you can place priors to penalize the weights.
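To make the link between priors and regularization terms concrete: in general, MAP maximizes the log-likelihood plus the log-prior, so a prior added to the model appears as an extra term in the loss. Below is a minimal pure-Python sketch of that objective for one observation under a Gaussian mixture. The function names, the flat `net_weights` list, and the Normal(0, `prior_scale`) prior are illustrative assumptions, not Edward’s API.

```python
import math

def log_normal_pdf(x, loc, scale):
    """Log density of Normal(loc, scale) evaluated at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mixture_nll(y, pis, locs, scales):
    """Negative log-likelihood of y under a Gaussian mixture.

    pis, locs, scales stand in for the MDN's outputs for one input.
    """
    terms = [math.log(p) + log_normal_pdf(y, m, s)
             for p, m, s in zip(pis, locs, scales)]
    return -logsumexp(terms)

def map_loss(y, pis, locs, scales, net_weights, prior_scale=1.0):
    """Negative log-joint: mixture NLL minus the log-prior on the weights.

    net_weights is a hypothetical flat list of network weights; a
    Normal(0, prior_scale) prior on each one acts as an L2 penalty.
    """
    log_prior = sum(log_normal_pdf(w, 0.0, prior_scale) for w in net_weights)
    return mixture_nll(y, pis, locs, scales) - log_prior
```

With a Normal prior the extra term is, up to a constant, the sum of squared weights divided by `2 * prior_scale**2`, which is why changing the prior is the way to change the penalty.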

paul · July 17, 2017, 11:32pm

Thanks for the response.

How would I constrain the minimum of the standard deviations? The only thing that comes to mind is TensorFlow’s clipping operators, but I don’t think those would supply hard constraints.

Ok, can I somehow modify the network to provide a prior for the scales (or weights)? Would I need to use a different inference algorithm if I did this?

dustin · July 20, 2017, 9:39am

For example, to bound the scale from below at 0.01:

# `scale` is assumed to be a tensor defined elsewhere
qz = Normal(loc=tf.Variable(tf.zeros([10])),
            scale=tf.maximum(scale, 0.01))
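One caveat with a hard floor like `tf.maximum`: wherever the unconstrained value falls below the floor, the output is constant, so the gradient with respect to that value is zero. A common alternative is a smooth floor, a softplus plus a small offset. The pure-Python sketch below compares the two; the helper names and the 0.01 floor are illustrative, and in TensorFlow the smooth version would be written with `tf.nn.softplus`.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def hard_floor(raw, min_scale=0.01):
    """Clamp: flat (zero gradient) whenever raw < min_scale."""
    return max(raw, min_scale)

def smooth_floor(raw, min_scale=0.01):
    """Never drops below min_scale, but keeps changing as raw changes."""
    return min_scale + softplus(raw)
```

For two raw values below the floor, `hard_floor` returns the same number for both, while `smooth_floor` still distinguishes them, so an optimizer can keep moving the underlying parameter.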

You would use the same algorithm. You need to rewrite the model so that it doesn’t rely on high-level wrappers for the neural net layers; instead, you write the weights explicitly with priors, as in the getting started example.

paul · August 14, 2017, 9:52pm

Hi Dustin,

Thanks for your response. I think I understand the idea of building a manual NN with priors on the weights.

However, I’d just like to put a prior on the outputs of that NN (e.g. the scales or logits). For example, I’d like to say that the scales come from some gamma distribution that has high density for small values.

Is it possible to do this, or would I still need to build a manual NN with priors on the weights?

(I’m finding this meshing of neural networks and probabilistic models very confusing… would it be proper to cast this MDN as a form of VAE, where the NN is the inference network, and the generation network is simply sampling from the mixture?)

dustin · August 14, 2017, 10:35pm

In an MDN, all likelihood parameters are output by a neural network. What you’re describing is a neural network that outputs, say, the location parameter but not the scale parameter of a normal distribution.

y = Normal(loc=neural_network(X), scale=scale)

You can specify scale with a LogNormal or Gamma prior.
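As a rough sketch of what that model’s log-joint looks like when written out by hand: a Gamma log-prior on the shared scale plus a Normal log-likelihood for each observation. The function names and the default `concentration` and `rate` values are assumptions for illustration, and `locs` stands in for the outputs of `neural_network(X)`.

```python
import math

def log_normal_pdf(y, loc, scale):
    """Log density of Normal(loc, scale) evaluated at y."""
    z = (y - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def log_gamma_pdf(x, concentration, rate):
    """Log density of Gamma(concentration, rate) evaluated at x > 0."""
    return (concentration * math.log(rate) - math.lgamma(concentration)
            + (concentration - 1.0) * math.log(x) - rate * x)

def log_joint(ys, locs, scale, concentration=1.0, rate=10.0):
    """log p(scale) + sum_i log p(y_i | loc_i, scale).

    MAP over `scale` would maximise this quantity; the prior term pulls
    the scale towards small values while the likelihood pushes back.
    """
    log_prior = log_gamma_pdf(scale, concentration, rate)
    log_lik = sum(log_normal_pdf(y, m, scale) for y, m in zip(ys, locs))
    return log_prior + log_lik
```

With `concentration=1.0` the Gamma reduces to an Exponential(`rate`) distribution, whose density is highest near zero, which matches the “high density for small values” requirement.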

paul · August 14, 2017, 11:34pm

Ok, thanks for the clarification. What you’re saying makes perfect sense!

Just thinking about the MDN as a VAE… could I simply replace the VAE’s single Normal distribution over z with a mixture of Normal distributions (parameterised like an MDN)? Would it just work?

dustin · August 15, 2017, 12:09am

Yes. Also see works on mixture priors for VAEs.
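One practical difference with a mixture prior: the KL divergence between a Normal q(z) and a mixture of Normals has no closed form, so it is typically estimated from samples. The following is a minimal one-dimensional, pure-Python sketch of such a Monte Carlo estimate; all names and the fixed `seed` are illustrative assumptions rather than any particular library’s interface.

```python
import math
import random

def log_normal_pdf(z, loc, scale):
    """Log density of Normal(loc, scale) evaluated at z."""
    u = (z - loc) / scale
    return -0.5 * u * u - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def log_mixture_pdf(z, weights, locs, scales):
    """Log density of a Gaussian mixture, via a stable log-sum-exp."""
    terms = [math.log(w) + log_normal_pdf(z, m, s)
             for w, m, s in zip(weights, locs, scales)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def kl_estimate(q_loc, q_scale, weights, locs, scales,
                n_samples=1000, seed=0):
    """Monte Carlo estimate of KL(q || mixture prior) with z ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(q_loc, q_scale)
        total += (log_normal_pdf(z, q_loc, q_scale)
                  - log_mixture_pdf(z, weights, locs, scales))
    return total / n_samples
```

When the mixture has a single component identical to q, every sampled term cancels and the estimate is zero, which recovers the usual single-Normal case as a sanity check.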
