It concerns me that these defensive techniques themselves often require even more LLM inference calls.
Just skimmed the GitHub repo for this one, and the README mentions four additional LLM inferences for each incoming request - so now we've 5x'ed the (already expensive) compute required to answer a query?
I find it very interesting that “aligning with human desires” somehow includes preventing a human from bypassing the safeguards to generate “objectionable” content (whatever that is). I think the “safeguards” are a bigger obstacle to aligning with my desires.
Some of the authors overlap with a more recent paper, "Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing": https://arxiv.org/abs/2402.16192
So basically this just adds random characters to input prompts to break jailbreaking attempts? IMHO, if you can't make a single-inference solution, you may as well just run a couple of output filters, no? Output filtering appeared to get reasonable results, and if you make the filtering more domain-specific, you'll probably do even better. Intuition says there's no "general solution" to jailbreaking, so maybe that's a lost cause and we need to build up layers of obscurity, of which SmoothLLM is just one part.
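For anyone who hasn't read the paper, here's a rough sketch of the smoothing idea as I understand it: perturb several copies of the prompt, query the model on each, and majority-vote on whether the responses look jailbroken. `query_model` and `is_jailbroken` below are stand-ins you'd have to supply yourself, not the authors' code, and `n_copies=4` just mirrors the "four additional inferences" mentioned upthread.

```python
import random
import string

def perturb(prompt: str, swap_frac: float = 0.1) -> str:
    """Randomly swap a fraction of the prompt's characters."""
    chars = list(prompt)
    if not chars:
        return prompt
    n_swaps = max(1, int(len(chars) * swap_frac))
    for i in random.sample(range(len(chars)), n_swaps):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

def smoothed_respond(prompt: str, query_model, is_jailbroken, n_copies: int = 4) -> str:
    """Query the model on several perturbed copies of the prompt and
    majority-vote on whether the responses look jailbroken."""
    responses = [query_model(perturb(prompt)) for _ in range(n_copies)]
    flags = [is_jailbroken(r) for r in responses]
    if sum(flags) > len(flags) / 2:
        return "Request refused."
    # Otherwise return one of the benign-looking responses.
    benign = [r for r, f in zip(responses, flags) if not f]
    return random.choice(benign) if benign else "Request refused."
```

The bet is that character-level jailbreak suffixes are brittle, so a few random swaps break the attack while the underlying benign request usually survives - but you're still paying for every one of those extra inferences.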