Unrelated variable definitions affecting fit performance

Hello, this is my first post! I’m a newcomer to Edward and, to a lesser extent, TensorFlow, so I’m not sure whether this is a basic TensorFlow question or an Edward question.

In the supervised regression tutorial, typical MSE and MAE values on the test data are 0.03 and 0.123, respectively. The variational distributions for the latents are defined exactly as follows:

qw = Normal(loc=tf.get_variable("qw/loc", [D]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))

(This is different from what’s in the tutorial link. Rather, this definition comes from cell 4 here, which is an interactive version of the tutorial with minor changes; I think the main difference is that, without an explicit initializer, tf.get_variable falls back to the Glorot uniform initializer.)
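
For what it’s worth, here is a minimal sketch of how the initialization could be made explicit instead of relying on that default (the choice of tf.random_normal_initializer here is mine, not the tutorial’s):

qw = Normal(loc=tf.get_variable("qw/loc", [D],
                                initializer=tf.random_normal_initializer()),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [D],
                                                 initializer=tf.random_normal_initializer())))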

However, weirdly enough, you can improve the MSE and MAE to 0.00572143 and 0.0651105, respectively, by using the following cell instead of that one:

with tf.variable_scope("scope" 
                       ,reuse=tf.AUTO_REUSE
                      ):
    qwa = Normal(loc=tf.get_variable("qw/loc", [D]),
                scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
    qba = Normal(loc=tf.get_variable("qb/loc", [1]),
                scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))
    qw = Normal(loc=tf.get_variable("qw/loc", [D]),
                scale=tf.square(tf.get_variable("qw/scale", [D])))
    qb = Normal(loc=tf.get_variable("qb/loc", [1]),
                scale=tf.square(tf.get_variable("qb/scale", [1])))
    
#improves mse for some reason

What changed is that I define some unrelated variables (qwa and qba) that use softplus for the scale, while the actual latents that inference takes as input (namely the dict {w: qw, b: qb}) use squaring for the scale. However, if you comment out the definitions of qwa and qba, this improvement does not occur! Something about the unrelated qwa and qba is doing something funky here.
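
To confirm the sharing that reuse=tf.AUTO_REUSE sets up, here is a minimal, self-contained sketch (TF 1.x graph mode; the loc_a/loc_b names are mine) showing that a second get_variable call with the same name returns the very same variable the first call created:

import tensorflow as tf

D = 10
with tf.variable_scope("scope", reuse=tf.AUTO_REUSE):
    loc_a = tf.get_variable("qw/loc", [D])  # created on first call
    loc_b = tf.get_variable("qw/loc", [D])  # reused: same underlying variable

print(loc_a is loc_b)  # True
print(loc_a.name)      # scope/qw/loc:0

So in the cell above, qw and qwa share their loc and scale variables; only the softplus/square transforms applied to them differ.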

You can also make the MSE and MAE worse by using this other cell:

with tf.variable_scope("scope" 
                       ,reuse=tf.AUTO_REUSE
                      ):
    qwa = Normal(loc=tf.get_variable("qw/loc", [D]),
                scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
    qba = Normal(loc=tf.get_variable("qb/loc", [1]),
                scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))
    qw = Normal(loc=tf.get_variable("qw/loc2", [D]),
                scale=tf.square(tf.get_variable("qw/scale2", [D])))
    qb = Normal(loc=tf.get_variable("qb/loc2", [1]),
                scale=tf.square(tf.get_variable("qb/scale2", [1])))

which approximately doubles both the MSE and MAE. (The only change is the variable names passed to get_variable.)

What is happening? Is it something strange with the tensorflow graph that I’m not understanding?

Can you verify that the different graph definitions aren’t just shuffling the random seeds used to run the algorithm? Namely, I’m wondering how the MSE and MAE change if you ensure all scripts run to convergence.
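
Something like this is what I have in mind, as a sketch based on the tutorial’s setup (w, b, X, y, qw, qb, X_train, and y_train are the names from the tutorial; the seed value and iteration count are arbitrary choices of mine):

import numpy as np
import tensorflow as tf
import edward as ed

# Fix the NumPy and TensorFlow graph-level seeds. To pin the initializers of
# the q-variables, these calls would need to run before those variables are
# defined, i.e. at the top of the notebook.
np.random.seed(42)
tf.set_random_seed(42)

# Same inference call as the tutorial, but with many more iterations so that
# every variant has a chance to converge.
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=5000)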