xLSTM: Extended Long Short-Term Memory

albertzeyer 12 days ago
It seems Sepp Hochreiter has already been talking about this model since Oct 2023: https://github.com/huggingface/transformers/issues/27011

In the scaling law comparison, I wonder if it is reasonable to compare the number of parameters between Llama, Mamba, RWKV, and xLSTM. Isn't compute time more relevant? E.g. in the figure about scaling laws, replace the number of parameters with compute time.

Specifically, the sLSTM still has recurrence (memory mixing) in it, i.e. you cannot fully parallelize the computation over the sequence. So a scaled-up Transformer could still look better when you compare by compute time.
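To make the parallelization point concrete, here is a minimal toy sketch (my own illustration, not the paper's equations or any released xLSTM code; the function names and shapes are hypothetical): a recurrent update with memory mixing has to be evaluated step by step because each state depends on the previous one, whereas a self-attention layer can be evaluated for all positions at once with batched matrix products.

    import numpy as np

    # Toy recurrence with memory mixing (sLSTM-like in spirit only):
    # h_t depends on h_{t-1} via R, so the loop over t cannot be parallelized.
    def recurrent_forward(x, W, R):
        T, _ = x.shape
        h = np.zeros(R.shape[0])
        outputs = []
        for t in range(T):                 # O(T) sequential, dependent steps
            h = np.tanh(W @ x[t] + R @ h)
            outputs.append(h)
        return np.stack(outputs)

    # Toy self-attention: no step-to-step dependency, so all T positions
    # are computed in parallel with matrix products.
    def attention_forward(x, Wq, Wk, Wv):
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = Q @ K.T / np.sqrt(Q.shape[1])
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        A = np.exp(scores)
        A /= A.sum(axis=1, keepdims=True)
        return A @ V

    # Quick check of shapes on random data.
    T, d = 8, 4
    rng = np.random.default_rng(0)
    x = rng.normal(size=(T, d))
    print(recurrent_forward(x, rng.normal(size=(d, d)), rng.normal(size=(d, d))).shape)   # (8, 4)
    print(attention_forward(x, *(rng.normal(size=(d, d)) for _ in range(3))).shape)       # (8, 4)

With the same wall-clock budget, the fully parallel block can exploit the hardware better, which is why comparing models by compute time rather than parameter count can change the picture.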

It seems neither the code nor the model parameters have been released. I wonder if that will follow.


KhoomeiK 12 days ago
For those who don't know, the senior author on this paper (Sepp Hochreiter) was the first author, with Schmidhuber, on the original 1997 paper introducing LSTMs.

WithinReason 12 days ago
I like the color-coded equations; I wish they would become a thing. We have syntax highlighting for programming languages; it's time we had it for math too.

GistNoesis 12 days ago
Can someone explain the economics behind this?

The claim is that this will replace the transformer, a technology powering a good chunk of AI companies.

The paper's authors seem to be either from a public university or from Sepp Hochreiter's private company/lab NXAI (nx-ai.com): https://www.nx-ai.com/en/xlstm

Where is the code? What is the license? How are they earning money? Why publish their secret recipe? Will they not be replicated? How will the rewards be commensurate with the value their algorithm brings? Who will get money from this new technology?


smusamashah 11 days ago
Can someone ELI5 this? Reading the comments, it sounds like it's going to replace the transformers which LLMs are based on? Is it something exponentially better than the current tech at scale?