> we find that for 79% and 68% of LLM next-token distributions on TinyStories and Wikipedia, respectively, their top-1 predictions agree with those provided by our N-gram rulesets
Two prediction methods may have completely different mechanisms yet still agree much of the time, because they are both predicting the same thing.
It seems a fairly large proportion of language can be predicted by a simpler model. But it's the remaining percentage that's the difficult part, which simple `n-gram` models are bad at and transformers are really good at.
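For concreteness, here's a minimal sketch of how that top-1 agreement could be measured against a plain count-based n-gram predictor. This is not the paper's actual ruleset method; the function names and the `llm_top1_preds` input are hypothetical, assumed to hold the LLM's argmax next-token prediction at each position.

```python
from collections import Counter, defaultdict

def build_ngram_counts(token_ids, n=3):
    """Count next-token occurrences for each (n-1)-token context."""
    counts = defaultdict(Counter)
    for i in range(len(token_ids) - n + 1):
        context = tuple(token_ids[i:i + n - 1])
        counts[context][token_ids[i + n - 1]] += 1
    return counts

def ngram_top1(counts, context):
    """Most frequent continuation for a context, or None if unseen."""
    c = counts.get(tuple(context))
    return c.most_common(1)[0][0] if c else None

def top1_agreement(ngram_counts, llm_top1_preds, eval_tokens, n=3):
    """Fraction of positions where the n-gram top-1 matches the LLM top-1.

    llm_top1_preds[i] is assumed (hypothetically) to be the LLM's argmax
    prediction for the token at position i of eval_tokens.
    """
    agree, total = 0, 0
    for i in range(n - 1, len(eval_tokens)):
        pred = ngram_top1(ngram_counts, eval_tokens[i - (n - 1):i])
        if pred is None:
            continue  # context never seen in the counts; skip it
        total += 1
        agree += int(pred == llm_top1_preds[i])
    return agree / total if total else 0.0
```

The interesting cases are exactly the positions this sketch skips or gets wrong: contexts where frequency statistics alone don't determine the next token, which is where the transformer earns its keep.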
https://neurips.cc/virtual/2024/poster/94849
The underlying data has been open-sourced, as discussed on his blog here: https://timothynguyen.org/2024/11/07/open-sourced-my-work-on...