yunyu 3 days ago

Show HN: Physically accurate black hole simulation using your iPhone camera

yunyu 68 days ago

Score of 6. Started and sold a YC startup

yunyu 289 days ago

Congrats on launching – we just spent quite a bit of time replicating/transforming our primary database into ClickHouse for OLAP use cases, and it would have been way easier if there were a Postgres-native solution. Hoping the managed hosting providers catch on.

yunyu 314 days ago

The above is a common anthropocentric take that has been repeatedly disproven by the last decade of deep learning research: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

Understanding audio with inputs in the frequency domain isn’t required for understanding frequencies in audio.

A large enough system with sufficient training data would definitely be able to come up with a Fourier transform (or something resembling one), if encoding it helped the loss go down.
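To make the Fourier point concrete: the discrete Fourier transform is a linear map, so a single bias-free linear layer with the right complex weights computes it exactly. The sketch below is only an illustration under that framing. The helper names `dft_matrix` and `apply_linear` are made up for this example, and it shows that the transform is representable by such a layer, not that training would necessarily find it.

```python
import cmath
import math

def dft_matrix(n):
    """Return the n x n DFT matrix W with W[k][t] = exp(-2*pi*i*k*t/n)."""
    return [
        [cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]
        for k in range(n)
    ]

def apply_linear(weights, signal):
    """A bias-free linear layer: one dot product per output unit."""
    return [sum(w * x for w, x in zip(row, signal)) for row in weights]

n = 16
freq = 3  # the test signal completes 3 cycles over the window

# Raw time-domain samples, the kind of input a model would see.
signal = [math.cos(2 * math.pi * freq * t / n) for t in range(n)]

# "Layer" whose weights happen to be the DFT matrix.
spectrum = apply_linear(dft_matrix(n), signal)
magnitudes = [abs(c) for c in spectrum]

# Look only at the non-redundant half of the spectrum for a real signal.
peak_bin = max(range(n // 2 + 1), key=lambda k: magnitudes[k])
print(peak_bin)
```

The strongest bin lines up with the frequency of the input cosine, which is all a downstream layer would need to "understand frequencies" from time-domain input.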

> In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.

Today’s diffusion models learn representations from raw pixels, without even the concept of convolutions.

Ditto for language - as long as the architecture 1) is capable of modeling long-range dependencies and 2) can be scaled reasonably, whether you pass in tokens, individual characters, or raw ASCII bytes is irrelevant. Character-based models perform just as well as (or better than) token/word-level models at a given parameter count/training corpus size - the main reason they aren’t common (yet) is memory limitations, not anything fundamental.
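As a rough illustration of the character vs. byte distinction (a sketch only, with an arbitrary example string): both encodings are lossless views of the same text. Bytes give a fixed vocabulary of 256 symbols at the cost of longer sequences, which is where the memory pressure comes from.

```python
# Example string with two non-ASCII characters (i-diaeresis, e-acute).
text = "na\u00efve caf\u00e9"

# Character-level input: one integer (Unicode code point) per character.
char_ids = [ord(c) for c in text]

# Byte-level input: UTF-8 bytes, so every id fits in a 256-entry vocabulary.
byte_ids = list(text.encode("utf-8"))

# Non-ASCII characters expand to multiple bytes, so the byte sequence is longer.
print(len(char_ids), len(byte_ids))

# The byte sequence round-trips back to the original text with no information lost.
roundtrip = bytes(byte_ids).decode("utf-8")
```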

For further reading, I’d recommend literature on transformer circuits for learning arithmetic without axioms: https://www.lesswrong.com/posts/CJsxd8ofLjGFxkmAP/explaining...

yunyu 314 daysReload

I agree with your point at the highest (pretrained model architect) level, but tokenization and encoding things into the frequency domain are decisions that typically aren’t made (or thought of) by the model consumer. They’re also not strictly theoretically necessary and are artifacts of current compute limitations. Btw, E5 != E5 Mistral; the latter achieves SOTA performance without any labeled data - all you need is a prompt to generate synthetic data for your particular similarity metric.

> Unlike existing methods that often depend on multi-stage intermediate pre-training with billions of weakly-supervised text pairs, followed by fine-tuning with a few labeled datasets, our method does not require building complex training pipelines or relying on manually collected datasets… We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across nearly 100 languages.

It’s true that ultimately there’s a judgement call (what does “distance” mean?), but I think the original post far overcomplicates what’s standard practice today.
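A toy example of that judgement call, using made-up 2-D vectors rather than real embeddings: the same query gets a different nearest neighbor depending on whether “distance” means cosine similarity or Euclidean distance.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (higher means more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between u and v (lower means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical vectors chosen to make the two metrics disagree.
query = [1.0, 0.0]
candidates = {
    "same_direction_far": [10.0, 0.0],  # identical direction, much larger norm
    "nearby_rotated": [0.8, 0.6],       # close in space, different direction
}

nearest_by_cosine = max(
    candidates, key=lambda name: cosine_similarity(query, candidates[name])
)
nearest_by_euclidean = min(
    candidates, key=lambda name: euclidean_distance(query, candidates[name])
)
print(nearest_by_cosine, nearest_by_euclidean)
```

Normalizing the vectors to unit length removes this disagreement, which is one reason the choice often ends up being a convention rather than a deep design decision.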
