Edward limitations compared to Pyro?


#1

Hello. If we consider this block of code:

import edward as ed
import tensorflow as tf
from edward.models import Categorical, Dirichlet

true_param1 = [1.0, 5.0, 15.0]
data = Categorical(probs=true_param1, sample_shape=[1, 5000])

param1 = Dirichlet(tf.constant([1.0, 1.0, 1.0]), name='param1')
qparam1 = Dirichlet(tf.nn.softplus(tf.Variable(tf.constant([1.0, 1.0, 1.0]))), name="qparam1")

w = Categorical(probs=param1, sample_shape=[1, 5000], name="w")

inference = ed.KLqp({param1: qparam1}, data={w: data})
inference.run(n_iter=2000)

session = ed.get_session()
print(session.run(qparam1.mean()))

That works. But now suppose we change just one thing, the line defining w, by passing param1 through a function:

w = Categorical(probs=funct(param1), sample_shape=[1, 5000], name="w")

In a lot of cases the inference won't work. If we define, for example:

def funct(param1):
    return param1.eval()

then the toy example [1.0, 5.0, 15.0] is not inferred correctly; instead we get [1/3, 1/3, 1/3]…

I know that param1 and param1.eval() are not the same type (an ed.RandomVariable vs. a np.array).
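Just to make the type difference concrete (a quick check run in the default session, not part of the model):

print(type(param1))         # an Edward RandomVariable (Dirichlet), a node in the graph
print(type(param1.eval()))  # a plain numpy.ndarray: one concrete sampled value
print(type(2 * param1))     # a tf.Tensor, so it still lives in the graph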

Does that mean I can't do anything with param1, except basic operations such as "return 2*param1"?

In my case, though, funct has to return something more complex than this: a tensor that is the result of a convolution involving tf.constant(param1.eval()) and other tensors. How am I supposed to do this, given that the inference is completely wrong even with a basic "return param1.eval()" or "return tf.constant(param1.eval())"?

With Pyro I did it quickly, but I don't know how to manage it with Edward/tf…


#2

Here is an illustration of what I mean:

Create data:

true_param1=np.zeros((1,1,2,10))
true_param1[0,0,0,2]=1
true_param1[0,0,1,5]=1
true_param1 = tf.constant(true_param1, dtype=tf.float32)

true_param2 = np.zeros((1,5,2,2))
A=np.array([[1,1,0,0,0],[0,4,30,1,0]])
B=np.array([[2,1,0,0,3],[4,5,0,0,0]])
true_param2[0,:,:,0]=np.transpose(A)
true_param2[0,:,:,1]=np.transpose(B)
true_param2 = tf.constant(true_param2, dtype=tf.float32)

latent = tf.nn.conv2d(true_param2,true_param1,strides=[1,1,1,1],padding="VALID")
latent = tf.reduce_sum(latent, 3)
latent = tf.squeeze(latent,[0])
latent = tf.reshape(latent,[10])
data = Categorical(probs=latent, sample_shape=[1,10000])

Model:

param1 = Dirichlet(tf.ones([1,1,2,10]), name='param1')
qparam1 = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones([1,1,2,10]))), name="qparam1") 

def model_latent(param1):
    latent = tf.nn.conv2d(true_param2,param1,strides=[1,1,1,1],padding="VALID")
    latent = tf.reduce_sum(latent, 3)
    latent = tf.squeeze(latent,[0])
    latent = tf.reshape(latent,[10])
    return latent

w = Categorical(probs=model_latent(param1), sample_shape=[1,10000])

Inference:

inference = ed.KLqp({param1: qparam1}, data={w: data})
inference.initialize()
inference.n_iter = 1000

session = ed.get_session()
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
    info_dict = inference.update()
    inference.print_progress(info_dict)
    print(session.run(qparam1.mean()))
    print("______________")
    
inference.finalize()

qparam1 should converge towards a tensor that is zero almost everywhere, like true_param1, but instead we obtain senseless values.


#3

Your function of param1 can be arbitrarily complicated, but it has to return a tensor. So anything involving eval won’t work (and there is no reason you should ever have to call it: the “current” value of a tensor x is accessed as x).
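For instance, sticking to your first example (a minimal sketch):

def funct_ok(param1):
    # built from tf ops on param1, so it returns a tensor that stays in the
    # graph and gradients can flow back to param1 during inference
    return 2 * param1

def funct_broken(param1):
    # eval() fetches one concrete numpy value and tf.constant freezes it,
    # so the result is disconnected from param1 and nothing is learned
    return tf.constant(param1.eval())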

The reason your second example doesn't work is that latent is not a valid tensor of probabilities (it doesn't sum to 1).
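Something along these lines should give valid probabilities (a sketch, untested):

latent = model_latent(param1)
latent = latent / tf.reduce_sum(latent)  # normalize so the entries sum to 1
w = Categorical(probs=latent, sample_shape=[1, 10000])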


#4

Hello aksarkar, and thank you for your answer.

The reason I wanted to do it is to perform the convolution manually: I was trying to use numpy functions (since item assignment is forbidden in TensorFlow, which is one of the reasons I talked about "limitations", though this is more a tf issue than an Edward one), and then I get the error "setting an array element with a sequence." unless I access the "current" value of a tensor x via x.eval().
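Roughly what I was attempting, with a plain tensor standing in for the slice of param1 (a hypothetical sketch, not my real code):

import numpy as np
import tensorflow as tf

param1_slice = tf.ones([10])      # stands in for (a slice of) param1
kernel = np.zeros((1, 1, 2, 10))
# numpy cannot turn the symbolic tensor into array elements and raises
# "ValueError: setting an array element with a sequence."
kernel[0, 0, 0, :] = param1_slice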

Anyway, I can do the convolution with the built-in function tf.nn.conv2d, as in my second example.
I added

latent = latent/tf.norm(latent,1)

so that latent sums to 1, but nothing changes: the values of qparam1.mean() are still not good (and now I obtain nan values after ~200 iterations).

By the way, I'm not sure the probabilities have to sum to 1 in the general case: I did some toy examples with Categorical where the inference worked without the probability tensors summing to 1 (for instance, my very first example above uses probs=[1.0, 5.0, 15.0]).

Some help would be great.


#5

By the way, the function can't be arbitrarily complicated as long as it returns a tensor, as you say.
If we switch back to example one and use

def funct(param1):
    return tf.constant(param1.eval())

that doesn't infer true_param1 correctly (qparam1 converges toward the prior param1 instead of true_param1).


#6

Another toy example where I again have this problem of bad inference when latent is a function of the parameter (here a simple addition), though it works if there is no addition:

true_pi1 = tf.constant([0.1,0.2,0.0,0.2])
true_pi2 = tf.constant([0.3,0.0,0.1,0.1])

pi1 = Dirichlet(tf.ones(4))
qpi1 = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(4))), name="qpi1")

z_data=Categorical(probs=true_pi1+true_pi2, sample_shape=[1,2000])
z = Categorical(probs=pi1+true_pi2, sample_shape=[1,2000])

inference = ed.KLqp({pi1: qpi1}, data={z: z_data})
inference.initialize()
inference.n_iter = 1000

session = ed.get_session()
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
    info_dict = inference.update()
    inference.print_progress(info_dict)
    print(session.run(qpi1.mean()))

inference.finalize()

If someone has an idea.