> Unlike existing methods that often depend on multi-stage intermediate pre-training with billions of weakly-supervised text pairs, followed by fine-tuning with a few labeled datasets, our method does not require building complex training pipelines or relying on manually collected datasets… We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across nearly 100 languages.
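Roughly, the recipe they describe amounts to prompting an LLM for task definitions plus (query, positive, negative) triples. The sketch below is just my guess at what that looks like - the prompt wording, model choice, and JSON fields are my assumptions, not the paper's actual templates:

```python
# Hypothetical sketch of prompt-driven synthetic data generation for embedding
# training. Prompt wording, model name, and schema are illustrative guesses.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = """Invent a text retrieval task in {language}.
Return JSON with fields: "task" (one-sentence description),
"query" (a realistic user query), "positive" (a passage that answers it),
"negative" (a related but non-answering passage)."""

def generate_example(language: str = "English") -> dict:
    """Ask an LLM to synthesize one (query, positive, negative) triple."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(language=language)}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```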
It’s true that ultimately there’s a judgement call (what does “distance” mean?), but I think the original post far overcomplicates what’s standard practice today.
A model doesn't need frequency-domain inputs in order to understand frequencies in audio.
A large enough system with sufficient training data would definitely be able to come up with a Fourier transform (or something resembling one), if encoding it helped the loss go down.
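To make that concrete: a discrete Fourier transform is just one fixed matrix applied to the raw samples, so a plain linear layer already has the capacity to represent it. A minimal sketch (window length is illustrative):

```python
# Minimal sketch: a DFT is a single linear map over raw audio samples,
# so a learned linear layer already has the capacity to represent it.
import numpy as np

N = 256  # window length (illustrative)
n = np.arange(N)
k = n.reshape(-1, 1)
# Real and imaginary parts of the DFT basis, stacked as one weight matrix.
W = np.concatenate([np.cos(2 * np.pi * k * n / N),
                    -np.sin(2 * np.pi * k * n / N)], axis=0)  # shape (2N, N)

x = np.random.randn(N)   # one raw waveform frame
y = W @ x                # "frequency features" from a plain matmul
re, im = y[:N], y[N:]

# Sanity check against numpy's FFT: identical up to floating-point error.
assert np.allclose(re + 1j * im, np.fft.fft(x), atol=1e-8)
```

Whether gradient descent actually converges to those weights is a separate question, but representationally there's nothing exotic about it.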
> In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
Today’s transformer-based diffusion models learn representations from raw pixel patches, without even the concept of convolutions.
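For what it's worth, the ViT/DiT-style patchify step is literally a reshape plus a linear projection over raw pixels - rough sketch with made-up sizes, not any particular model's code:

```python
# Minimal sketch: "patchify" is a reshape plus a linear projection over raw
# pixels -- no convolutions involved. All sizes here are illustrative.
import numpy as np

H = W = 256          # image resolution
P = 16               # patch size
C = 3                # RGB channels
D = 768              # embedding width

img = np.random.rand(H, W, C)                       # raw pixels in [0, 1]
patches = (img.reshape(H // P, P, W // P, P, C)     # cut into P x P tiles
              .transpose(0, 2, 1, 3, 4)
              .reshape(-1, P * P * C))              # (num_patches, P*P*C)

proj = np.random.randn(P * P * C, D) * 0.02         # learned in practice
tokens = patches @ proj                             # (num_patches, D) tokens
assert tokens.shape == ((H // P) * (W // P), D)
```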
Ditto for language - as long as the architecture 1) can model long-range dependencies and 2) can be scaled reasonably, whether you pass in tokens, individual characters, or raw ASCII bytes is irrelevant. Character-based models perform just as well as (or better than) token/word-level models at a given parameter count/training corpus size - the main reason they aren’t common (yet) is memory limitations, not anything fundamental.
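Concretely, a byte-level "tokenizer" is nothing more than the raw UTF-8 bytes, with a fixed vocabulary of 256 - sketch below; the longer sequences it points out are exactly the memory cost mentioned above:

```python
# Minimal sketch: byte-level "tokenization" is just raw UTF-8 bytes with a
# fixed vocabulary of 256 -- no merges, no vocab file to learn or ship.
text = "Ditto for language 🙂"

byte_ids = list(text.encode("utf-8"))   # e.g. [68, 105, 116, ..., 240, 159, 153, 130]
assert all(0 <= b < 256 for b in byte_ids)
assert bytes(byte_ids).decode("utf-8") == text   # lossless round trip

# The trade-off is sequence length: byte sequences are several times longer
# than subword-token sequences for the same text.
print(len(byte_ids), "bytes vs roughly", len(text.split()), "words")
```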
For further reading, I’d recommend literature on transformer circuits for learning arithmetic without axioms: https://www.lesswrong.com/posts/CJsxd8ofLjGFxkmAP/explaining...