I think that modeling the temperature parameter as a random variable may not be appropriate.
How about setting the temperature parameter to a constant instead?
For example, tau = tf.constant(0.5)
Please see the paper (https://arxiv.org/abs/1611.01144).
In the first experiment, they used a fixed \tau=1.
In the second experiment, they annealed the temperature using the schedule \tau = max(0.5, exp(-rt)), where t is the global training step.
It seems that they never modeled the temperature parameter as a random variable.
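
To make the annealing schedule concrete, here is a minimal sketch in plain Python. The function name annealed_temperature and the default rate of 1e-4 are my own assumptions for illustration and are not taken from your model; only the formula \tau = max(0.5, exp(-rt)) comes from the paper.

```python
import math


def annealed_temperature(step, rate=1e-4, min_tau=0.5):
    """Return tau = max(min_tau, exp(-rate * step)).

    step    : global training step t (non-negative integer)
    rate    : decay rate r (the default here is an assumed value)
    min_tau : lower bound on the temperature (0.5 in the schedule above)
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return max(min_tau, math.exp(-rate * step))


if __name__ == "__main__":
    # Show how tau decays from 1.0 and is then clipped at min_tau.
    for step in (0, 1000, 5000, 10000):
        print(step, annealed_temperature(step))
```

The resulting float could then be fed into the graph at each training step (for example through a placeholder), so that tau stays a deterministic quantity rather than a random variable.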

c) Are there any issues with KLqp inference when using such a model?

When I tried to approximate the Categorical distribution with a OneHot Categorical distribution, NaNs occurred. I am still looking for a good way to handle this.

I'd be happy to continue this discussion with you.