The claim is that this will replace the transformer, a technology powering a good chunk of AI companies.
The paper's authors seem to be from either a public university or Sepp Hochreiter's private company/lab, nx-ai.com https://www.nx-ai.com/en/xlstm
Where is the code? What is the license? How are they earning money? Why publish their secret recipe? Will they not be replicated? How will the rewards be commensurate with the value their algorithm brings? Who will get money from this new technology?
In the scaling-law comparison, I wonder whether it is reasonable to compare the number of parameters between Llama, Mamba, RWKV, and xLSTM. Isn't compute time more relevant? E.g., in the figure about scaling laws, replace number of parameters with compute time.
Specifically, the sLSTM still has recurrence (memory mixing) in it, i.e. you cannot fully parallelize the computation over the sequence length. So scaling up a Transformer could still look better when you look at compute time.
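To illustrate the point about recurrence: here is a toy sketch (not the paper's actual sLSTM cell; the gating is omitted and the weights are made up) showing why memory mixing forces a sequential time loop. Each step's hidden state feeds into the next through a recurrent matrix, so step t cannot be computed before step t-1, unlike attention, where all positions can be processed in parallel.

```python
import numpy as np

def toy_recurrence(x, W, R):
    """Toy recurrent layer with memory mixing: h_t depends on h_{t-1}
    through the recurrent matrix R, so the loop over t is inherently
    sequential and cannot be parallelized across time steps."""
    T, d = x.shape
    h = np.zeros(d)
    out = []
    for t in range(T):  # sequential: step t needs h from step t-1
        h = np.tanh(x[t] @ W + h @ R)
        out.append(h)
    return np.stack(out)

# Hypothetical small dimensions, just to run the sketch.
rng = np.random.default_rng(0)
T, d = 8, 4
x = rng.standard_normal((T, d))
W = rng.standard_normal((d, d)) * 0.1
R = rng.standard_normal((d, d)) * 0.1  # mixing across hidden dimensions
h_seq = toy_recurrence(x, W, R)
print(h_seq.shape)  # one d-dimensional state per time step
```

The inner `x[t] @ W` terms could all be precomputed in parallel, but the `h @ R` term creates a chain of dependencies of length T, which is exactly what makes wall-clock scaling comparisons against fully parallelizable Transformer blocks tricky.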
It seems neither the code nor the model parameters have been released. I wonder if that will follow.