In Stan, I would define the log_prob of each sample as a single floating-point number. The log_prob values of all the samples together determine whether HMC accepts an MC draw. This holds even when I have a time series that generates T outputs per batch sample, meaning the input and output tensors have shape [N, T, D], where D is the number of predictors or outcome variables. I would still generate a single log_prob per batch sample, because the log_probs of all the outcomes are merged into one number by an operation along the T axis.
In Edward / TensorFlow, as far as I can see, each [N, T] input element generates its own log_prob, so I end up with [N, T] log_prob values. Do I absolutely need to generate one log_prob number per period per sample?
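For concreteness, here is a minimal NumPy sketch of the two shapes I mean. It is not Edward or Stan code. It assumes a scalar outcome per time step that is conditionally independent and Normal with unit scale, and the names `normal_log_prob`, `x`, `w`, `mu` and `y` are made up for illustration. Under those assumptions, summing the [N, T] log_probs along the T axis gives one number per sample, and the grand total comes out the same either way.

```python
import numpy as np


def normal_log_prob(y, mu, sigma):
    """Elementwise log density of Normal(mu, sigma) evaluated at y."""
    return (
        -0.5 * np.log(2.0 * np.pi)
        - np.log(sigma)
        - 0.5 * ((y - mu) / sigma) ** 2
    )


rng = np.random.default_rng(0)
N, T, D = 4, 5, 3  # batch samples, time steps, predictors (illustrative sizes)

x = rng.normal(size=(N, T, D))    # inputs, shape [N, T, D]
w = rng.normal(size=D)            # hypothetical regression weights
mu = x @ w                        # predicted mean per time step, shape [N, T]
y = mu + rng.normal(size=(N, T))  # simulated outcomes, shape [N, T]

# One log_prob per (sample, period) element: shape [N, T].
lp_elementwise = normal_log_prob(y, mu, 1.0)

# One log_prob per batch sample, reduced along the T axis: shape [N].
lp_per_sample = lp_elementwise.sum(axis=1)

# The scalar joint log density does not depend on when the reduction happens.
total_from_elements = lp_elementwise.sum()
total_from_samples = lp_per_sample.sum()

print(lp_elementwise.shape, lp_per_sample.shape)
print(np.isclose(total_from_elements, total_from_samples))
```

If the outcomes within a sample were not conditionally independent, the per-sample number would come from a joint density rather than a plain sum, so this sketch only covers the independent case.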