Custom random variable with two parts


#1

I need to write a custom random variable, but its outcome consists of two arrays, not a single number. Does Edward's custom RandomVariable _log_prob(self, value) allow for that? Do I simply set value to be a TensorFlow tensor? How exactly would you declare that when you use the "value" variable? (Maybe this is just a Python or TensorFlow question, but I don't know how.)

In fact, even for a single array (or vector) outcome, I don't see any sample code that handles this. How would you optimize the ELBO in this case? Is it handled automatically once you pass in a tensor of the right shape?

Another question: when I implemented this in Stan, I only had to provide the log prob; there was no need for samples. Why does Edward require generating samples?


#3

_log_prob is expected to return a Tensor giving the per-data-point log-likelihood. (Refer to the implementations in tensorflow.contrib.distributions.)

If value has shape [n, 2], then you could return an [n, 1] Tensor.

The various implementations of ed.Inference all sum the value returned by _log_prob, so everything will work automatically.
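For instance, here is a minimal NumPy sketch of that shape convention (a toy density of my own choosing, not Edward's actual code): _log_prob maps a [n, 2] value to an [n, 1] tensor of per-data-point log-likelihoods, which inference then sums into a scalar.

```python
import numpy as np

# Hypothetical batch of n = 4 data points, each a pair -> shape [n, 2].
value = np.array([[0.1, 0.2],
                  [0.3, 0.4],
                  [0.5, 0.6],
                  [0.7, 0.8]])

def log_prob(value):
    # Toy density: independent standard normals on both components,
    # reduced over the event (last) dimension -> shape [n, 1].
    per_element = -0.5 * (value ** 2 + np.log(2 * np.pi))
    return per_element.sum(axis=1, keepdims=True)

lp = log_prob(value)   # shape [4, 1]: one log-likelihood per data point
total = lp.sum()       # inference sums these into the scalar objective
```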

Edward requires samples to check that tensor shapes are compatible (see ed.models.random_variable.RandomVariable.__init__). You can circumvent this by initializing with the keyword argument value.

Edward also uses samples to construct the terms in the objective function corresponding to RandomVariables in data.


#4

Hi, thanks for the explanation. I don't get that last part at all. What is meant by "corresponding to RandomVariables in data"?

There is something I don't understand about using the value input parameter. If I supply it, it does bypass the tensor shape check; however, it also bypasses calling sample_n, so the custom _sample_n method I write in my custom random variable class is never called. I don't think that's a good thing?


#5

The arguments to inference.run are data and latent_vars. By "RandomVariables in data" I mean the keys of the dictionary passed in as data.

For example, if I have

x = tf.placeholder(...)
y = ed.models.Normal(loc=x, ...)

inference.run(data={y: y_train, x: x_train})

Edward uses samples of y to compute the objective function for BBVI.
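To make "samples enter the objective" concrete, here is a rough NumPy sketch of the Monte Carlo ELBO estimate that BBVI optimizes (the toy model, variational family, and all names here are my own illustration, not Edward internals):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, loc, scale):
    # Log density of N(loc, scale^2), elementwise.
    return -0.5 * np.log(2 * np.pi * scale ** 2) - (x - loc) ** 2 / (2 * scale ** 2)

# Toy model: z ~ N(0, 1), y | z ~ N(z, 1); variational family q(z) = N(mu, 1).
y_train = 1.0
mu = 0.5
S = 10000                                    # number of Monte Carlo samples

z = rng.normal(mu, 1.0, size=S)              # draw samples from q
elbo = np.mean(log_normal(y_train, z, 1.0)   # log p(y | z)
               + log_normal(z, 0.0, 1.0)     # + log p(z)
               - log_normal(z, mu, 1.0))     # - log q(z)
```

The estimate is noisy but unbiased; Edward builds the analogous terms symbolically in the TensorFlow graph.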

You are correct that initializing a RandomVariable with a value means you never draw a sample.


#6

I am starting to see the real problem now. So the distributions add up the log prob of each data point.

In my model the dimensions are [N, T, 2]. Computing the probability requires matrix multiplication along the time dimension, so it cannot be put into log form at the element level. The log can only be applied after all the elements have gone through the matrix multiplications and been multiplied together. It's fine to take the log prob for each batch sample (the N axis), but not for each time element (the T axis) within a single batch sample.

Can you think of any way to take care of this situation?


#7

OK, I solved this problem by adding up the log probs myself and putting the result in the last column.

I now have a different problem. Instead of adding the log probs, I tried multiplying the probs and then taking the log, but this leads to NaN for all parameters. I tried converting to tf.float64 for this part, but to no avail. I know this is more a TensorFlow problem than an Edward one, but if someone very experienced with TensorFlow can help here, I would really appreciate it.


#8

It is a floating-point precision problem. I am surprised even float64 couldn't cover it.
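The underflow is easy to reproduce in plain NumPy: multiplying many small probabilities hits zero even in float64 (whose smallest positive value is around 1e-308), and log(0) is -inf, while summing the logs stays finite.

```python
import numpy as np

probs = np.full(1000, 1e-3)               # 1000 small probabilities

with np.errstate(divide="ignore"):
    naive = np.log(np.prod(probs))        # product underflows to 0.0 -> log gives -inf

stable = np.sum(np.log(probs))            # sum of logs: 1000 * log(1e-3), finite
```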


#9

Without a mathematical description of the likelihood, I can’t say much, but it seems like you have a model:

x_i1, ..., x_iT, y_i1, ..., y_iT ~ g(x_i, y_i), i = 1..N

where x_i and y_i are length-T vectors. I assume you've concatenated them to get an [N, T, 2] tensor.

In this case, you should be returning an [N, 1] tensor where each element is log g(x_i, y_i).

This doesn’t require taking logs for each time element, only per batch sample.
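A shape-only NumPy sketch of that suggestion (the 2x2 matrices and the trace are toy stand-ins for the actual likelihood): the matrix products run along the T axis inside g, and the log is taken only once per batch sample, giving an [N, 1] result.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 3, 5
value = rng.uniform(0.1, 1.0, size=(N, T, 2))    # concatenated (x_i, y_i) pairs

def log_g(value):
    # Toy g: for each sample i, chain 2x2 matrices built from the
    # (x_it, y_it) pairs, then take the log of a positive scalar summary.
    out = np.empty((value.shape[0], 1))
    for i, sample in enumerate(value):
        m = np.eye(2)
        for x_t, y_t in sample:                   # matrix product along the T axis
            m = m @ np.array([[x_t, 0.0], [0.0, y_t]])
        out[i, 0] = np.log(m.trace())             # log applied once per sample
    return out

lp = log_g(value)                                 # shape [N, 1]
```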


#10

You are right, that worked!


#11

I added some tf.Print statements in the sample_n method, and nothing gets printed out during execution.

Is sample_n not used during training? In other words, if I were debugging my output, I shouldn't even look there?


#12

_sample_n returns a TensorFlow op which gets added to the computation graph. (Refer to tensorflow/tensorflow/python/ops/distributions/distribution.py.)

The function call itself is used to build the computation graph for inference, not to actually perform inference.
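A plain-Python analogy of why the tf.Print never fires (this is an illustration of deferred execution, not actual TensorFlow code): calling the method builds a deferred op, and the body only runs when that op is later executed.

```python
def sample_n(n):
    # Analogue of _sample_n: return a deferred computation (an "op")
    # rather than performing the sampling now.
    def op():
        print("actually sampling")   # fires only when the op is executed
        return list(range(n))
    return op

op = sample_n(3)   # "graph construction": nothing is printed here
samples = op()     # "session run": the print fires and samples are produced
```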