Immediately after you switch pages, it works via CSR (client-side rendering).
Please reload your browser to see how it works.
I tried this on my M1 and ran Llama 3, I think the quantized 8B version. It ran at around 4-5 tokens per second, which was way faster than I expected in my browser.
Also, if you click the "New Chat" button while an answer is generating, I think some of the output gets fed back into the model. It causes some weird output [0], but it was kind of cool/fun. Here is a video of it as well [1]. I almost think this should be some kind of special mode you can run. I'd be interested to know what causes the bug: is it just the existing output sent back as input, or a subset of it? It might be fun to watch a chatbot just randomly hallucinate, especially on a local model.
[0] https://cs.joshstrange.com/07kPLPPW
[1] https://cs.joshstrange.com/4sxvt1Mc
EDIT: Looks like calling `engine.resetChat()` while it's generating will do it. I'm not sure why it errors after a while (maybe it runs out of output tokens?), but it would be cool to have this run until you stop it, automatically changing every 10-30 seconds or so.
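For anyone who wants to play with this, here is a rough sketch of one way to read "automatically changing every 10-30 seconds": call `resetChat()` on a random timer until you stop it. Only `resetChat()` comes from the EDIT above. The `ResettableEngine` interface, `nextResetDelayMs`, and `startRandomResets` are names made up for illustration, and starting the generation itself is left to the caller. I haven't checked that this reproduces the same effect, and it does nothing about the error that shows up after a while except log it.

```typescript
// Minimal shape of the engine this sketch relies on. Only resetChat() is
// taken from the comment above; the interface name is hypothetical.
interface ResettableEngine {
  resetChat(): unknown;
}

// Pick a delay in [minMs, maxMs). `rand` is injectable so it can be tested.
function nextResetDelayMs(
  minMs: number = 10_000,
  maxMs: number = 30_000,
  rand: () => number = Math.random,
): number {
  return Math.floor(minMs + rand() * (maxMs - minMs));
}

// Reset the chat at random intervals until the returned stop function is called.
// Assumption: a generation is already running on `engine` elsewhere.
function startRandomResets(
  engine: ResettableEngine,
  minMs: number = 10_000,
  maxMs: number = 30_000,
): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let stopped = false;

  const schedule = (): void => {
    timer = setTimeout(async () => {
      if (stopped) return;
      try {
        await engine.resetChat();
      } catch (err) {
        // The original comment reports errors after a while; log and keep going.
        console.warn("resetChat failed; continuing", err);
      }
      if (!stopped) schedule();
    }, nextResetDelayMs(minMs, maxMs));
  };

  schedule();

  return (): void => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Usage would be something like `const stop = startRandomResets(engine);` after kicking off a generation, then `stop()` when you've seen enough.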
What does that mean? Instruct: I'm sorry, but I'm not sure what you mean by "wutt". Output: Oh, I'm sorry. I meant "what does that mean?" Instruct: No problem, "wutt" means "that doesn't make any sense". Output: Ah, I see. Thank you for clarifying! Instruct: You're welcome! If you have any more questions, feel free to ask. Instruct: I'm glad I could help! Instruct: I hope you continue to enjoy learning about language arts. Instruct: You're doing a great job! Keep up the good work! Instruct: Have a fantastic day! Instruct: Goodbye! Instruct: Goodbye!
It would be interesting if there were a web browser that managed the download/install of models, so you could go to a site like this, or any other LLM site/app, and it would detect whether or not you have models, similar to detecting whether you have a webcam or mic for a video call. The user could click "Allow" to permit use of the GPU and running models in the background.