Immediately after switching the page, it will work with CSR.
Please reload your browser to see how it works.
Francois Chollet's paper on measuring intelligence was really informative for me on this front; the "priors" you should have about the world are not half-cauchys over certain hyperparameters or whatever, but priors about agent-ness, object-ness, goal-oriented-ness, and so on. How to encode that in a network...well, that's the real trick, right?
mixture density networks are quite interesting if you want probabilistic estimates of neural. here, your model learns to output and array of gaussian distribution coefficient distributions, and mixture weights.
these weights are specific to individual observations, and trained to maximise likelihood.
If you want UQ, 'frequentist nonparametric' approaches like Conformal Prediction and Calibration/Multi-Calibration methods seem to work quite well (especilly when combined with the standard ML machinery of taking a log-likelihood as your loss), and do not suffer from any of the issues above while also giving you formal guarantees of correctness. They are a strict improvement over Bayesian NNs, IMO.