Genuinely compact models are hitting benchmarks just a couple months behind the big boys. And eventually we’ll get better at decoupling data from processing - so the model can do a regular-ass search of a regular-ass database and pull details into its context as needed. Ideally while also decoupling that context from the prompt, because apparently these things can have a hundred attention heads, and still nobody thought of having two text input fields.
You’d think they’d have done that by now, and maybe some symbolic AI too.
All focus has been forced onto LLMs and diffusion, even though only diffusion works properly. And those LLMs better iterate on the exact same mechanisms we’ve tweaked for the last six years, because all results will be compared to the state of the art, right the hell now, not a comparable level of development or compute.
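For what it's worth, the "regular-ass search of a regular-ass database" idea from upthread is easy to sketch. This is only a toy under loud assumptions: the `facts` table, its rows, and the `build_db`, `search`, and `build_inputs` names are all made up for illustration, and the "two text input fields" are just two keys in a dict, not anything a real model API exposes. It does a plain keyword lookup in SQLite and keeps the retrieved details apart from the prompt instead of gluing them into one string.

```python
import sqlite3


def build_db():
    # Hypothetical toy database standing in for "a regular database".
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (topic TEXT, detail TEXT)")
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?)",
        [
            ("sqlite", "SQLite stores a whole database in a single file."),
            ("python", "Python ships with the sqlite3 module."),
            ("tea", "Green tea is usually steeped below boiling."),
        ],
    )
    return conn


def search(conn, keyword, limit=3):
    # Plain substring search, no embeddings: the "regular" part.
    rows = conn.execute(
        "SELECT detail FROM facts WHERE detail LIKE ? ORDER BY rowid LIMIT ?",
        (f"%{keyword}%", limit),
    )
    return [row[0] for row in rows]


def build_inputs(conn, prompt, keyword):
    # Assumption: the model would accept context and prompt as separate
    # fields. Here that is modelled as two dict keys and nothing more.
    return {"context": search(conn, keyword), "prompt": prompt}
```

Calling `build_inputs(build_db(), "What does Python ship with?", "sqlite")` gives back the untouched prompt in one field and the matching rows in the other, so the data lives in the database and only the needed details get pulled in per query.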