Hello,

I am a new user of Edward, and I am currently trying to implement the BEST method for Bayesian hypothesis testing described here, and implemented on top of PyMC here.

The method relies on fitting a t-distribution to observed data, with a normally distributed prior on the mean, a uniformly distributed prior on the scale, and an exponentially distributed prior on the degrees of freedom.
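In other words, the generative model I have in mind looks roughly like this (sketched with SciPy purely for illustration; the seed and the `+ 1` shift of the degrees of freedom, as in the original BEST parameterisation, are my own choices here):

```python
import scipy.stats as stats

# Priors, matching the ones used in the Edward code below
mu = stats.norm.rvs(loc=0.0, scale=5.0, random_state=42)        # mean
std = stats.uniform.rvs(loc=0.01, scale=9.99, random_state=42)  # scale on [0.01, 10]
nu = 1.0 + stats.expon.rvs(scale=29.0, random_state=42)         # degrees of freedom, mean 30

# Likelihood: observations follow a t-distribution
y = stats.t.rvs(df=nu, loc=mu, scale=std, size=1000, random_state=42)
print(y.shape)
```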

I was able to implement this for the simplified base case of fitting a normal distribution, but when I extend it to the t-distribution I get very unsatisfying results for the degrees of freedom, even in a toy example.

I hope this is the right place to post this: I have found a multitude of online examples for very basic cases involving the normal distribution, but nothing for this use case or anything similar, especially when relying on sampling rather than variational inference.

The following code should reproduce my issue in jupyter.

```python
import edward as ed
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import tensorflow as tf
import scipy.stats as stats
from edward.models import Empirical, Exponential, Normal, Uniform, StudentTWithAbsDfSoftplusScale
# DATA
y_train = stats.t.rvs(10, loc=2, scale=5, size=1000, random_state=None)
print(y_train.mean())
print(y_train.std())
# MODEL
mu = Normal(loc=0.0, scale=5.0)
std = Uniform(low=0.01, high=10.0)
nu_minus_one = Exponential(rate=1.0 / 29.0)  # mean 29; note 1/29 is 0 under Python 2 integer division
y = StudentTWithAbsDfSoftplusScale(loc=mu, scale=std, df=nu_minus_one, sample_shape=1000)
q_mu = Empirical(params=tf.Variable(tf.zeros(5000)))
q_std = Empirical(params=tf.Variable(tf.random_uniform([5000], minval=0.01, maxval=10.0)))
q_nu = Empirical(params=tf.Variable(tf.random_uniform([5000], minval=5.0, maxval=30.0)))
# INFERENCE (HMC variant, also tried)
# inference = ed.HMC({mu: q_mu, std: q_std, nu_minus_one: q_nu}, data={y: y_train})
# inference.run(step_size=0.005, n_steps=5)

# INFERENCE (SGHMC)
inference = ed.SGHMC({mu: q_mu, std: q_std, nu_minus_one: q_nu}, data={y: y_train})
inference.run(step_size=1e-2, friction=0.1)
print(tf.reduce_mean(q_mu.params[1000:]).eval())
# 1.7592
print(tf.reduce_mean(q_std.params[1000:]).eval())
# 5.15432
print(tf.reduce_mean(q_nu.params[1000:]).eval())
# 19.1388
f, ax = plt.subplots()
f.suptitle("Trace of mu")
plt.plot(q_mu.params[1000:].eval())
f, ax = plt.subplots()
f.suptitle("Trace of std")
plt.plot(q_std.params[1000:].eval())
f, ax = plt.subplots()
f.suptitle("Trace of nu")
plt.plot(q_nu.params[1000:].eval())
f, ax = plt.subplots()
f.suptitle('Distribution of mu')
sns.distplot(q_mu.params[1000:].eval())
f, ax = plt.subplots()
f.suptitle('Distribution of std')
sns.distplot(q_std.params[1000:].eval())
f, ax = plt.subplots()
f.suptitle('Distribution of nu')
sns.distplot(q_nu.params[1000:].eval())
```

While the traces of mu and std look satisfying, both the trace and the distribution of nu look unsatisfying to me.
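For context, here is a plain maximum-likelihood fit of the same kind of data with SciPy (just a sanity check on the data, independent of Edward; the seed is fixed so the fit is reproducible). My understanding is that loc and scale are recovered well, while the df estimate is intrinsically much noisier, which may be part of what I am seeing:

```python
import scipy.stats as stats

# Same data-generating process as in the code above, with a fixed seed
y_train = stats.t.rvs(10, loc=2, scale=5, size=1000, random_state=42)

# Maximum-likelihood fit of a t-distribution: returns (df, loc, scale)
df_hat, loc_hat, scale_hat = stats.t.fit(y_train)
print(df_hat, loc_hat, scale_hat)
```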

Am I missing something? What is the proper way to set up the posterior Empirical variable given an exponentially distributed prior? Or am I misinterpreting the results?

Kind regards,

Momo