The CUDA moat is widely underappreciated and misunderstood. Dethroning Nvidia demands more than SOTA hardware.
OpenAI, Meta, Google, AWS, AMD, and others have long failed to eliminate the Nvidia tax.
Without diving into the gory details, the simple proof is that billions were spent on inference last year by some of the most sophisticated technology companies in the world.
They had the talent and the incentive to migrate, but didn't.
In particular, OpenAI spent $4 billion on inference, 33% more than it spent on training, yet still ran on NVDA. Google owns leading chips and leading models, and could offer the tech talent to facilitate migrations, yet it still cannot cross the CUDA moat and convince many inference customers to switch.
People are desperate to quit their NVDA-tine addiction, but they can't for now.
[Edited to include Google, even though Google owns the chips and the models; h/t @onlyrealcuzzo]
For instance, we experimented with AWS Inferentia briefly, but the value prop wasn't sufficient even for ~2022 computer vision models.
The calculus is even worse for SOTA LLMs.
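For context, here is a rough sketch of what that kind of port involved circa 2022, assuming the torch-neuron tracing flow for inf1 instances (the ResNet-50 model is illustrative, not the workload we actually ran):

```python
import torch
import torch_neuron  # AWS Neuron SDK for Inferentia (inf1); registers torch.neuron
from torchvision import models

# On CUDA the model "just runs"; on Inferentia it must be traced and compiled
# ahead of time, and any op the compiler can't handle falls back to CPU or
# forces a model rewrite -- that engineering cost is the migration cost.
model = models.resnet50().eval()
example = torch.zeros(1, 3, 224, 224)

# Compile for the NeuronCore; this can take minutes and may need to be
# repeated for every batch size / input shape you plan to serve.
model_neuron = torch.neuron.trace(model, example_inputs=[example])
model_neuron.save("resnet50_neuron.pt")

# Serving then loads the compiled artifact on an inf1 instance.
compiled = torch.jit.load("resnet50_neuron.pt")
print(compiled(example).shape)
```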
The more you need to eke out performance gains and ship quickly, the more you depend on CUDA and the deeper the moat becomes.
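To make that concrete, here is a hypothetical sketch of a performance-tuned PyTorch serving path; the model and shapes are placeholders. Every knob in it is CUDA-specific, which is how the dependency accumulates:

```python
import torch

# Placeholder model and input, standing in for a real serving workload.
model = torch.nn.Linear(4096, 4096).half().cuda().eval()
x = torch.randn(8, 4096, dtype=torch.float16, device="cuda")

# CUDA-only knobs that appear once you start chasing latency and throughput.
torch.backends.cudnn.benchmark = True           # cuDNN autotuning
torch.backends.cuda.matmul.allow_tf32 = True    # TF32 matmuls on Ampere+

with torch.no_grad():
    # CUDA graphs: warm up on a side stream (as the PyTorch docs recommend),
    # then capture the forward pass once and replay it to skip launch overhead.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            y = model(x)
    torch.cuda.current_stream().wait_stream(s)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        y = model(x)

graph.replay()  # fast, but none of this transfers off Nvidia hardware
```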
Yes, training plays a crucial role: it is where companies get shoehorned into the CUDA ecosystem. But if CUDA were not so intertwined with performance and reliability, customers could, in theory, switch away after training.
With mature models and minimal CUDA dependencies, migration can be justified, but that does not describe most of the LLM inference market, today or in the past.
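For completeness, the "switch after training" path looks roughly like this when it does apply: export the frozen model once to a portable format and serve it on whatever runtime the target hardware supports. A minimal sketch, assuming ONNX export from PyTorch and ONNX Runtime on the serving side; the tiny two-layer model is a placeholder:

```python
import torch
import onnxruntime as ort

# Placeholder "mature" model; in practice this would be a frozen, trained network.
model = torch.nn.Sequential(torch.nn.Linear(768, 768), torch.nn.ReLU()).eval()
example = torch.randn(1, 768)

# One-time export: the resulting .onnx artifact carries no CUDA dependency.
torch.onnx.export(model, example, "model.onnx",
                  input_names=["x"], output_names=["y"])

# Serve on whatever execution provider the target hardware exposes
# (CPU here; ROCm, TensorRT, OpenVINO, etc. where available).
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
(out,) = sess.run(["y"], {"x": example.numpy()})
print(out.shape)
```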