The claim here seems to be that it runs usefully fast on CPU.
We're not sure how accurate this claim is, because we don't know how fast this model runs on a GPU:
> Absent from the list of supported chips are GPUs [...]
And TFA doesn't really quantify anything, just offers:

> Perhaps more impressively, BitNet b1.58 2B4T is speedier than other models of its size — in some cases, twice the speed — while using a fraction of the memory.
The model they link to is just over 1 GB in size, and there are plenty of existing 1-2 GB models that are quite serviceable even on a mildly modern CPU-only rig.
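For scale, here's a quick back-of-the-envelope sketch (assuming the 2B parameter count implied by the model name) of why a ternary model lands in that range. The actual file is bigger than the ideal packed size, presumably because of embeddings and packing overhead:

```python
# Rough memory math for a 2B-parameter model; the parameter count is
# assumed from the "2B4T" name, everything else is back-of-the-envelope.
params = 2e9

fp16_gb = params * 2 / 1e9            # FP16: 2 bytes per weight -> ~4.0 GB
ternary_gb = params * 1.58 / 8 / 1e9  # ~1.58 bits per weight, ideally packed -> ~0.4 GB

print(f"FP16 weights:    ~{fp16_gb:.1f} GB")
print(f"Ternary, packed: ~{ternary_gb:.2f} GB")
```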
https://github.com/microsoft/BitNet
"...It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption..."
https://arxiv.org/abs/2402.17764
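For the curious, "b1.58" refers to ternary weights: each weight is -1, 0, or +1, which is log2(3) ≈ 1.58 bits of information. The paper ternarizes with an absmean scheme (scale by the mean absolute weight, then round and clip). A minimal numpy sketch of that step, with function names of my own invention rather than anything from the repo:

```python
import numpy as np

def absmean_quantize(W: np.ndarray, eps: float = 1e-8):
    """Ternarize a weight matrix as described in the BitNet b1.58 paper:
    scale by the mean absolute value, then round and clip to {-1, 0, +1}."""
    gamma = np.abs(W).mean()                          # absmean scale factor
    W_q = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return W_q.astype(np.int8), gamma                 # keep gamma to rescale later

# Toy usage: ternarize a random layer and see how much ends up zero.
W = np.random.randn(256, 256) * 0.02
W_q, gamma = absmean_quantize(W)
print(np.unique(W_q))      # [-1  0  1]
print((W_q == 0).mean())   # fraction of weights zeroed out
```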