> we find that for 79% and 68% of LLM next-token distributions on TinyStories and Wikipedia, respectively, their top-1 predictions agree with those provided by our N-gram rulesets
Two prediction methods may have completely different mechanisms but still agree much of the time, because they are both predicting the same thing.
It seems a fairly large proportion of language can be predicted by a simpler model. But it's the remaining fraction that's the difficult part, which simple `n-gram` models are bad at and transformers are really good at.
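To make the comparison concrete, here's a minimal sketch (toy corpus and bigram context are my own assumptions, not the paper's setup) of the kind of count-based `n-gram` top-1 prediction the paper checks against the LLM's top-1:

```python
from collections import Counter, defaultdict

# Toy corpus; a bigram model conditions on one preceding token.
corpus = "the cat sat on the mat the cat ate the fish".split()

counts = defaultdict(Counter)
for i in range(len(corpus) - 1):
    counts[corpus[i]][corpus[i + 1]] += 1

def top1(context):
    """Most frequent next token after `context`, or None if unseen."""
    nxt = counts.get(context)
    return nxt.most_common(1)[0][0] if nxt else None

print(top1("the"))  # "cat" follows "the" twice, other successors once
```

Agreement with an LLM would then just be the fraction of positions where this argmax matches the model's argmax, i.e. exactly the 79%/68% numbers quoted above.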
On topic: couldn't one, in theory, re-publish this kind of paper for different kinds of LLMs, since the textual corpora LLMs are trained on are ultimately built, at some level, on human effort and human input, whether writing or typing?