Edward limitations compared to Pyro?

Hello. Consider this block of code:

import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Categorical, Dirichlet

true_param1 = [1.0, 5.0, 15.0]
data = Categorical(probs=true_param1, sample_shape=[1, 5000])

param1 = Dirichlet(tf.constant([1.0, 1.0, 1.0]), name='param1')
qparam1 = Dirichlet(tf.nn.softplus(tf.Variable(tf.constant([1.0, 1.0, 1.0]))), name="qparam1")

w = Categorical(probs=param1, sample_shape=[1, 5000], name="w")

inference = ed.KLqp({param1: qparam1}, data={w: data})
inference.run(n_iter=2000)

session = ed.get_session()
print(session.run(qparam1.mean()))

That works. But now suppose we change just one thing, the line defining w, by passing param1 through a function:

w = Categorical(probs=funct(param1), sample_shape=[1, 5000], name="w")

In many cases the inference won't work. If we define, for example:

def funct(param1):
    return param1.eval()

then the toy example [1.0, 5.0, 15.0] is not inferred correctly; instead we get [1/3, 1/3, 1/3]…

I know that param1 and param1.eval() are not the same type (ed.RV and np.array).

Does that mean I can't do anything with param1, except basic operations like "return 2*param1"?

In my case, funct has to return something more complex than that: a tensor that is the result of a convolution involving tf.constant(param1.eval()) and other tensors. But how am I supposed to do this, when the inference is already completely wrong with just a basic "return param1.eval()" or "return tf.constant(param1.eval())"?

With Pyro I got this working quickly, but I don't know how to manage it with Edward/TensorFlow…

Here is an illustration of what I mean:

Create data:

true_param1 = np.zeros((1, 1, 2, 10))
true_param1[0, 0, 0, 2] = 1
true_param1[0, 0, 1, 5] = 1
true_param1 = tf.constant(true_param1, dtype=tf.float32)

true_param2 = np.zeros((1, 5, 2, 2))
A = np.array([[1, 1, 0, 0, 0], [0, 4, 30, 1, 0]])
B = np.array([[2, 1, 0, 0, 3], [4, 5, 0, 0, 0]])
true_param2[0, :, :, 0] = np.transpose(A)
true_param2[0, :, :, 1] = np.transpose(B)
true_param2 = tf.constant(true_param2, dtype=tf.float32)

latent = tf.nn.conv2d(true_param2, true_param1, strides=[1, 1, 1, 1], padding="VALID")
latent = tf.reduce_sum(latent, 3)
latent = tf.squeeze(latent, [0])
latent = tf.reshape(latent, [10])
data = Categorical(probs=latent, sample_shape=[1, 10000])

Model:

param1 = Dirichlet(tf.ones([1, 1, 2, 10]), name='param1')
qparam1 = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones([1, 1, 2, 10]))), name="qparam1")

def model_latent(param1):
    latent = tf.nn.conv2d(true_param2, param1, strides=[1, 1, 1, 1], padding="VALID")
    latent = tf.reduce_sum(latent, 3)
    latent = tf.squeeze(latent, [0])
    latent = tf.reshape(latent, [10])
    return latent

w = Categorical(probs=model_latent(param1), sample_shape=[1, 10000])

Inference:

inference = ed.KLqp({param1: qparam1}, data={w: data})
inference.initialize(n_iter=1000)

session = ed.get_session()
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
    info_dict = inference.update()
    inference.print_progress(info_dict)
    print(session.run(qparam1.mean()))
    print("______________")

inference.finalize()

qparam1 should converge towards a tensor that is zero almost everywhere, like true_param1, but instead we obtain senseless values.

Your function of param1 can be arbitrarily complicated, but it has to return a tensor. So anything involving eval won’t work (and there is no reason you should ever have to call it: the “current” value of a tensor x is accessed as x).
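For example (rough sketch, not tested), any composition of tf ops keeps the dependence on param1, whereas eval() turns it into a fixed numpy array and cuts it out of the graph:

# Stays in the graph: gradients can flow from the Categorical back to param1.
def funct_ok(param1):
    scaled = 2.0 * param1
    return scaled / tf.reduce_sum(scaled)

# Breaks the graph: param1.eval() is a fixed np.ndarray, so the downstream
# Categorical no longer depends on param1 and KLqp has nothing to optimise.
def funct_broken(param1):
    return tf.constant(param1.eval())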

The reason your second example doesn't work is that latent is not a valid tensor of probabilities (it doesn't sum to 1).
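One way to handle that (sketch, untested) is to renormalise inside the model function:

def model_latent(param1):
    latent = tf.nn.conv2d(true_param2, param1, strides=[1, 1, 1, 1], padding="VALID")
    latent = tf.reduce_sum(latent, 3)
    latent = tf.squeeze(latent, [0])
    latent = tf.reshape(latent, [10])
    # Renormalise so the Categorical receives a valid probability vector.
    return latent / tf.reduce_sum(latent)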

Hello aksarkar, and thank you for your answer.

The reason I wanted to do it is to perform the convolution manually: I was trying to use numpy functions (since item assignment is forbidden in TensorFlow, which is one of the reasons I talked about "limitations", though this is more related to tf than to Edward itself), and then I get the error "setting an array element with a sequence." if I don't access the "current" value of a tensor x via x.eval().
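For what it's worth, the kind of item assignment I was doing in numpy (e.g. the construction of true_param1 above) could presumably be expressed with graph ops such as tf.scatter_nd instead, which would avoid eval(); a rough sketch:

# Rough sketch: build the same one-hot-style filter as true_param1 without
# numpy item assignment, staying entirely inside the TensorFlow graph.
indices = tf.constant([[0, 0, 0, 2], [0, 0, 1, 5]])
updates = tf.constant([1.0, 1.0])
filt = tf.scatter_nd(indices, updates, shape=[1, 1, 2, 10])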

In any case, I can do the convolution using the built-in function tf.nn.conv2d, as in my second example.
I added

latent = latent / tf.norm(latent, 1)

so that latent sums to 1, but nothing changes: the values of qparam1.mean() are still not good (and now I get nan values after ~200 iterations).
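Maybe the nans come from zero entries in latent, since Categorical takes the log of the probabilities; guarding the normalisation like this might avoid them, though I haven't checked whether it helps the inference:

# Sketch: add a tiny epsilon before normalising so no probability is exactly 0.
latent = latent + 1e-8
latent = latent / tf.reduce_sum(latent)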

By the way, I'm not sure the probabilities have to sum to one in the general case; I did some toy examples with Categorical where inference works without the probability tensors summing to 1.

Some help would be great.

By the way, it doesn't seem the function can be arbitrarily complicated as long as it returns a tensor, as you said.
If we switch back to example one and use

def funct(param1):
    return tf.constant(param1.eval())

then true_param1 is not inferred correctly (qparam1 converges towards the prior param1 instead of true_param1).

Here is another toy example where I again have this problem of bad inference when the probabilities are a function of the parameter (here a simple addition), though it works if there is no addition:

true_pi1 = tf.constant([0.1, 0.2, 0.0, 0.2])
true_pi2 = tf.constant([0.3, 0.0, 0.1, 0.1])

pi1 = Dirichlet(tf.ones(4))
qpi1 = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(4))), name="qpi1")

z_data = Categorical(probs=true_pi1 + true_pi2, sample_shape=[1, 2000])
z = Categorical(probs=pi1 + true_pi2, sample_shape=[1, 2000])

inference = ed.KLqp({pi1: qpi1}, data={z: z_data})
inference.initialize(n_iter=1000)

session = ed.get_session()
tf.global_variables_initializer().run()

for _ in range(inference.n_iter):
    info_dict = inference.update()
    inference.print_progress(info_dict)
    print(session.run(qpi1.mean()))

inference.finalize()
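In case it helps, renormalising the mixture inside the graph, along the lines of aksarkar's remark about probabilities summing to 1, would look something like this; I haven't verified whether it changes the outcome:

# Sketch: renormalise so the model's Categorical gets probs that sum to 1.
mix = pi1 + true_pi2
z = Categorical(probs=mix / tf.reduce_sum(mix), sample_shape=[1, 2000])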

If someone has an idea, it would be much appreciated.