Immediately after switching the page, it will work with CSR.
Please reload your browser to see how it works.

Source:https://github.com/SoraKumo001/next-streaming

⬅️ Understanding Transformers via N-gram Statistics

pona-a 2 hoursReload

I wonder if these N-gram reduced models, augmented with confidence measures, can act as a very fast speculative decoder. Or maybe the sheer number of explicit rules unfolded from the compressed latent representation will make it impractical.

montebicyclelo 3 hoursReload

> The results we obtained in Section 7 imply that, at least on simple datasets like TinyStories and Wikipedia, LLM predictions contain much quantifiable structure insofar that they often can be described in terms of our simple statistical rules

> we find that for 79% and 68% of LLM next-token distributions on TinyStories and Wikipedia, respectively, their top-1 predictions agree with those provided by our N-gram rulesets

Two prediction methods may have completely different mechanisms, but agree sometimes, because they are both predicting the same thing.

Seems a fairly large proportion of language can be predicted by a simpler model.. But it's the remaining percent that's the difficult part; which simple `n-gram` models are bad at, and transformers are really good at.

2 hoursReload

maz1b 6 hoursReload

How does this have 74 points and only one comment?

on topic: couldn't one in theory, re-publish this kind of paper for different kinds of LLMs, as the textual corpus upon which LLMs are built based off ultimately, at some level, human effort and human input whether it be writing, or typing?

10 hoursReload