Hi, I am slowly getting used to Edward and have built a toy example that hints at my actual problem.

The inference works (though not very accurately), but there are some things I am not sure about when building my model. When I set my priors well, the inference obviously improves.

(1) Is it possible to simply drown the model in data and thus overrule the prior?

Here is what I want to do:

I am trying to infer an AR process that is hidden behind some other process.

First, I generate some data and build an AR process:

```
import numpy as np

# Parameters
mu_true = 0.
beta_true = np.array([0.6])
noise_obs = 0.3
T = 128 * 2
p = len(beta_true)

x_true = np.random.randn(T + 1) * noise_obs
for t in range(p, T):
    x_true[t] += beta_true.dot(x_true[t - p:t][::-1])
```
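As a quick sanity check on the simulated data (plain NumPy, not part of the Edward model; the seed is arbitrary and only makes the check reproducible), the lag-1 autocorrelation of an AR(1) path should come out close to `beta_true`:

```python
import numpy as np

# Sanity check (not part of the model): for an AR(1) process,
# the empirical lag-1 autocorrelation should be close to beta_true.
np.random.seed(0)  # arbitrary seed for reproducibility
beta_true = 0.6
noise_obs = 0.3
T = 256
x = np.random.randn(T) * noise_obs
for t in range(1, T):
    x[t] += beta_true * x[t - 1]
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]  # roughly 0.6 up to sampling noise
```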

The AR process is then mixed with another random process, a sequence of random +1 and -1 values:

```
constellation = np.array([-1, 1])
M = len(constellation)
iX = np.random.randint(0, M, T)
ssX_true = constellation[iX]
rrX_true = ssX_true + x_true[p:] * ssX_true
```
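One property of this mixing that might be useful for the model (a plain NumPy sketch, independent of Edward): since `ssX_true` is ±1, `rrX_true = ssX_true * (1 + x)`, so the magnitude of the observation carries `1 + x` directly and the sign sequence only flips the sign:

```python
import numpy as np

np.random.seed(1)  # arbitrary seed for reproducibility
T = 8
x = np.random.randn(T) * 0.3
ssX = np.random.choice([-1, 1], T)
rrX = ssX + x * ssX  # identical to ssX * (1 + x)
# |ssX| == 1, hence |rrX| == |1 + x|
print(np.allclose(np.abs(rrX), np.abs(1 + x)))
```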

Now I have to build a model for this:

```
import tensorflow as tf
import edward as ed
from edward.models import Normal

# Prior on AR process parameters
mu = Normal(loc=0., scale=0.1)  # low std
beta = [Normal(loc=0.5, scale=0.5) for i in range(p)]
noise_proc = tf.constant(noise_obs)  # could also be InverseGamma(concentration=1.0, rate=1.0)
x = [0] * T
ssX = [0] * T
rrX = [0] * T
```

Connect the AR process

```
# Fat prior on the first p values of x
for n in range(p):
    x[n] = Normal(loc=mu, scale=10.0)

# Connect AR process
for n in range(p, T):
    mu_ = mu
    for j in range(p):
        mu_ += beta[j] * x[n - j - 1]
    x[n] = Normal(loc=mu_, scale=noise_proc)
print('AR(p) connected')
```

**And now the model for the mixing process. This is where I have the most doubts; see (2), (3) and (4) in the code.**

```
# Connect mixing process
for n in range(T):
    # ssX[n] = Normal(loc=0.0, scale=10.0)  # (2) is this correct, given that ssX_true is not sampled from a normal distribution?
    ssX[n] = tf.constant(ssX_true[n], dtype=tf.float32)  # (2) or should I rather set it constant like this,
    # or use tf.Variable(..., trainable=False)?
    rrX[n] = Normal(loc=ssX[n] + x[n] * ssX[n], scale=1.0)  # (3) is this correct, or do I need a TransformedDistribution?
    # (4) I set scale=1.0; could I also set it to noise_proc, since that is the std of x[n]?
print('pseudo channel connected')
```
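Regarding (4): as far as I understand, `scale` in Edward/TF Distributions is the standard deviation, not the variance. A quick NumPy check of that parameterization convention (the seed is arbitrary):

```python
import numpy as np

# Quick check of the convention: "scale" means standard deviation.
np.random.seed(2)  # arbitrary seed for reproducibility
samples = np.random.normal(loc=0.0, scale=2.0, size=200000)
std = samples.std()  # close to 2.0 (the scale)
var = samples.var()  # close to 4.0 (scale squared)
```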

And then I run the inference like this:

```
from edward.models import PointMass

print("setting up distributions")
qmu = PointMass(params=tf.Variable(0.))
qbeta = [PointMass(params=tf.Variable(0.)) for i in range(p)]
qx = [PointMass(params=tf.Variable(0.)) for i in range(T)]
print("constructing inference object")
vdict = {mu: qmu}
vdict.update({b: qb for b, qb in zip(beta, qbeta)})
vdict.update({xt: qxt for xt, qxt in zip(x, qx)})
ddict = {rrXt: rrXt_true for rrXt, rrXt_true in zip(rrX, rrX_true)}
# ddict.update({ssXt: ssXt_true for ssXt, ssXt_true in zip(ssX, ssX_true)})  # if ssX[n] is not set to constant
inference = ed.MAP(vdict, data=ddict)
print("running inference")
inference.run()
print("parameter estimates:")
print("beta: ", [qb.value().eval() for qb in qbeta])
print("mu: ", qmu.value().eval())
```

which gives me this output:

```
setting up distributions
constructing inference object
running inference
1000/1000 [100%] ██████████████████████████████ Elapsed: 145s | Loss: 175.976
parameter estimates:
beta: [0.94732594]
mu: -0.00536023
```

beta is still way off, but the plot (from the code below) suggests that my model actually learned something. (5) How can I improve this estimate?

```
import matplotlib.pyplot as plt

x_estimate = [qxt.value().eval() for qxt in qx]
plt.plot(x_true)
plt.plot(x_estimate)
plt.show()
```
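Regarding (5): one baseline I could compare the MAP estimate of beta against is the conditional least-squares estimate computed from a recovered AR(1) path. A hypothetical sketch on synthetic data (`beta_hat` is my own name; on real output one would plug in the evaluated `qx` values instead of the simulated `x`):

```python
import numpy as np

# Hypothetical baseline on synthetic data: the conditional least-squares
# estimate of an AR(1) coefficient is a ratio of lagged inner products.
np.random.seed(3)  # arbitrary seed for reproducibility
T = 2000
beta_true = 0.6
x = np.zeros(T)
for t in range(1, T):
    x[t] = beta_true * x[t - 1] + 0.3 * np.random.randn()
beta_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])  # close to 0.6
```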

(6) I am also slowly trying to move to another inference algorithm; would HMC make any sense?

My following “HMC” approach never finishes

```
from edward.models import Empirical

TT = 10  # number of posterior samples to keep
print("setting up empirical variables")
qmu = Empirical(params=tf.Variable(tf.zeros(TT)))
qbeta = [Empirical(params=tf.Variable(tf.zeros(TT))) for i in range(p)]
qx = [Empirical(params=tf.Variable(tf.zeros(TT))) for i in range(T)]
print("constructing inference object")
vdict = {mu: qmu}
vdict.update({b: qb for b, qb in zip(beta, qbeta)})
vdict.update({xt: qxt for xt, qxt in zip(x, qx)})
ddict = {rrXt: rrXt_true for rrXt, rrXt_true in zip(rrX, rrX_true)}
inference_hmc = ed.HMC(vdict, data=ddict)
print("running inference")
inference_hmc.run(step_size=0.5 / T, n_steps=10)  # step size scaled by the number of observations
```