Immediately after a page transition it works with CSR. Reload your browser to see how it behaves.
Isn't the whole reason for chain-of-thought that the tokens sort of are the reasoning process?
Yes, there is more internal state in the model's hidden layers while it predicts the next token - but that information is gone at the end of that prediction pass. The information that is kept "between one token and the next" is really only the tokens themselves, right? So in that sense, the OP would be wrong.
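To make that concrete, here's a toy sketch (nothing to do with any real model's implementation): an autoregressive decode loop where the only state carried from one step to the next is the token sequence itself, with all "hidden layer" activity living and dying inside each prediction call.

```python
# Toy autoregressive decoding loop (illustrative sketch, not a real model).
# The only state carried between steps is `tokens`: the fake "hidden state"
# exists only inside predict_next() and is discarded after each call.

def predict_next(tokens):
    # Stand-in for a forward pass; internal state vanishes on return.
    hidden_state = hash(tuple(tokens))  # ephemeral internal state
    return hidden_state % 50            # pretend next-token id

def generate(prompt_tokens, n_steps):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        # Nothing from the previous call survives except the tokens.
        tokens.append(predict_next(tokens))
    return tokens

print(generate([1, 2, 3], 4))
```

(A real implementation caches key/value activations per token for speed, but those are a deterministic function of the tokens, so the point stands: the tokens are the carried information.)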
Of course, we don't know what kind of information the model encodes in its specific token choices - i.e. the tokens might not mean to the model what they mean to us.
I have no problem with a system presenting a reasonable argument leading to a production/solution, even if that is materially not what happened in the generation process.
I'd go even further and posit that requiring the "explanation" to be not just congruent with but identical to the production process would probably lead either to incomprehensible justifications or to severely limited production systems.
CoT improves results, sure. And part of that is probably because you are telling the LLM to add more material to the context window, which increases the chance of resolving some syllogism latent in the training data: one inference cycle tells you that "man" has something to do with "mortal" and "Socrates" has something to do with "man"; two cycles put both of those into the context window, letting you get statistically closer to "Socrates" having something to do with "mortal". But given that the training/RLHF for CoT revolves around generating long chains of human-readable "steps", the chain can't really be explanatory of a process that is essentially statistical.
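The two-cycle point can be sketched with a toy example (the associations and the one-hop lookup are entirely hypothetical, not how an LLM stores facts): each "cycle" surfaces one association into the context, and the bridge only becomes reachable once both intermediate facts sit in the context together.

```python
# Toy illustration of the two-step syllogism point: each "cycle" appends
# one association to the context. "Socrates -> mortal" is never a direct
# association; it only falls out after two cycles have surfaced both
# intermediate facts. (Hypothetical associations, purely illustrative.)

ASSOCIATIONS = {
    "Socrates": "man",
    "man": "mortal",
}

def one_cycle(context):
    # Surface the one-hop association for the most recent term, if any.
    last = context[-1]
    if last in ASSOCIATIONS:
        context.append(ASSOCIATIONS[last])
    return context

context = ["Socrates"]
one_cycle(context)  # surfaces "man"
one_cycle(context)  # surfaces "mortal"
print(context)      # -> ['Socrates', 'man', 'mortal']
```

One pass gets you "Socrates -> man"; only with the intermediate token in context does the second pass reach "mortal".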