You could have designed state of the art positional encoding
majke 1 hour ago
The best explanation of classic positional encoding I have found is in this blog.

rgovostes 7 days ago
Thanks to the author for clarifying something that's been a mystery to me for a few years. The positional encoding scheme in the "Attention Is All You Need" paper is only given half a page and the construction appears to come out of nowhere.

valine 7 days ago
One of the things I really love about RoPE is that it allows for a lot of interesting encoding schemes at inference time without retraining the model. I’ve had a lot of fun playing with different relative positions. You can elicit a lot of interesting behaviors from the model when you use different rotations for keys vs. queries; they don’t always have to match.

For example, exact position doesn’t matter too much when tokens are spaced out. Let’s say you use token position 100 for your query: you can shift all the keys around position 100, and the further back they are in the context, the more freedom you have to play with the value.
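
A minimal sketch of what "different rotations for keys vs. queries" can look like, assuming the usual RoPE pairing of adjacent dimensions and a base of 10000. The names `rope_rotate` and `attention_score` are hypothetical helpers for illustration, not any library's API. The point is only that the query position and the key position are separate inputs, so they can be chosen independently at inference time.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each adjacent pair of `vec` by a position-dependent angle (RoPE-style)."""
    dim = len(vec)
    assert dim % 2 == 0, "dimensions are rotated in pairs, so dim must be even"
    out = []
    for i in range(0, dim, 2):
        # Pair i // 2 uses frequency base ** (-i / dim); the angle grows with position.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def attention_score(query, key, q_pos, k_pos):
    """Dot product of a query rotated to q_pos and a key rotated to k_pos."""
    q = rope_rotate(query, q_pos)
    k = rope_rotate(key, k_pos)
    return sum(a * b for a, b in zip(q, k))

query = [0.5, -1.0, 2.0, 0.25]
key = [1.5, 0.5, -0.75, 1.0]

# Only the offset between the two positions matters for the score...
same_offset_a = attention_score(query, key, q_pos=100, k_pos=90)
same_offset_b = attention_score(query, key, q_pos=10, k_pos=0)

# ...and nothing forces the key position to be an integer or the "true" index.
shifted = attention_score(query, key, q_pos=100, k_pos=37.5)
```

Here `same_offset_a` and `same_offset_b` agree up to floating-point error because both use an offset of 10, while `shifted` simply assigns the key an arbitrary position. Whether a given shift keeps model behavior sensible is an empirical question this sketch does not address.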

espadrine 7 days ago
> Furthermore, by rotating the vector, we have absolutely zero impact on the norm of the vector, which encodes the semantic information of our token.

Doesn’t the angle encode semantic information? Cosine similarity works for embeddings after all.
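
For reference, the quoted property and the cosine-similarity point can both be checked numerically. This is a small 2D sketch with made-up vectors (`rotate2d`, `norm`, and `cosine` are hypothetical helpers): a rotation leaves the norm unchanged, and rotating two vectors by the same angle leaves their cosine similarity unchanged, while rotating only one of them does not.

```python
import math

def rotate2d(v, angle):
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (v[0] * c - v[1] * s, v[0] * s + v[1] * c)

def norm(v):
    return math.hypot(v[0], v[1])

def cosine(a, b):
    return (a[0] * b[0] + a[1] * b[1]) / (norm(a) * norm(b))

a, b = (3.0, 4.0), (1.0, 2.0)
ra, rb = rotate2d(a, 0.7), rotate2d(b, 0.7)

norm_before, norm_after = norm(a), norm(ra)  # equal up to rounding
cos_both_rotated = cosine(ra, rb)            # matches cosine(a, b)
cos_one_rotated = cosine(ra, b)              # differs from cosine(a, b)
```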

elieb44 7 days ago
How about context encoding more generally? Are there techniques to do that? I.e., during training, I want the string "Dubito ergo cogito, cogito ergo sum, sum ergo Deus est." to have René Descartes embedded as the main author, the year 1637 as the date of writing, and "Discours de la méthode" as the global context of writing.

That way, when the model is trained again on another part of the same book, it can learn that both passages came from the same context.