> In summary, we used a 2.4-million-parameter small language model to achieve accuracy comparable to a 100-million-parameter BERT model in text classification.
Neat, but the question is how the scaling laws hold up.
The Wave Network is an innovative ultra-small language model that employs a unique token representation and update method. Compared to BERT base, it reduces video memory usage by 77.34% and training time by 85.62% during wave modulation.
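To make the "wave modulation" idea concrete, here is a minimal sketch of one plausible reading: tokens as complex vectors whose magnitude carries global sentence semantics and whose phase carries token-level information, combined by elementwise complex multiplication (magnitudes multiply, phases add). The function names, shapes, and the exact phase construction here are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def to_wave(embeddings):
    """Map real token embeddings (seq_len, dim) to complex vectors.

    Assumption: the shared magnitude is the norm of the summed
    embeddings (a stand-in for global sentence semantics), and each
    token's phase is derived from its ratio to that global vector.
    """
    G = np.abs(embeddings.sum(axis=0))        # (dim,) global magnitude
    eps = 1e-8                                # avoid division by zero
    alpha = np.arctan2(embeddings, G + eps)   # (seq_len, dim) phases
    return G * np.exp(1j * alpha)             # complex (seq_len, dim)

def wave_modulation(z1, z2):
    """Elementwise complex multiplication: |z1||z2| and phase sum."""
    return z1 * z2

emb = np.random.default_rng(0).normal(size=(4, 8))
z = to_wave(emb)
out = wave_modulation(z, z)
print(out.shape)  # (4, 8)
```

The appeal of this kind of update is that a single elementwise operation mixes both global (magnitude) and token-local (phase) information, which is one hedged explanation for the memory and training-time savings the summary reports.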