• dimjim@sh.itjust.works
    English
    arrow-up
    36
    ·
    2 days ago

    Yet, even when heavily compressed, it requires roughly 240GB of memory just to load.

    Ah I’ll just pop it in the ol’ Raspberry Pi then, easy peasy.

    • Nouvellalia@lemmy.world
      English
      arrow-up
      8
      arrow-down
      3
      ·
      2 days ago

      Lol, “runs locally”. I mean, Claude rubs locally too if you’re in the room with the racks.

      Edit: I said what I said. Get some lube and go hang out with Claude’s hot, noisy, 5kw rack. You know what they say “Once you go stack you never go back.”

    • Fluffy Kitty Cat@slrpnk.net
      English
      arrow-up
      2
      ·
      2 days ago

      Basically, they never had any moat to begin with, but no one else seems to know how to fit that much intelligence into less space. It’s possible that it just fundamentally has to take up that much space, which would also imply that future computing gains are going to be more focused on memory than raw computation.

      • mindbleach@sh.itjust.works
        English
        arrow-up
        4
        ·
        2 days ago

        Genuinely compact models are hitting benchmarks just a couple months behind the big boys. And eventually we’ll get better at decoupling data from processing - so the model can do a regular-ass search of a regular-ass database and pull details into its context as needed. Ideally while also decoupling that context from the prompt, because apparently these things can have a hundred attention heads, and still nobody thought of having two text input fields.
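
        For what it's worth, here is a rough sketch of what that decoupling might look like: a plain keyword search over a plain SQLite table, with the retrieved details kept in a "context" field separate from the "prompt" field. Everything here (the `facts` table, `build_db`, `search`, `build_request`, the two-field request shape) is made up for illustration and is not any real model's API.

```python
import sqlite3


def build_db():
    # Hypothetical facts table standing in for "a regular database".
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE facts (topic TEXT, detail TEXT)")
    db.executemany(
        "INSERT INTO facts VALUES (?, ?)",
        [
            ("memory", "The quoted article says the model needs roughly 240GB to load."),
            ("hardware", "A Raspberry Pi has far less memory than that."),
        ],
    )
    return db


def search(db, keyword):
    # Ordinary substring search; no embeddings, no model involved.
    pattern = f"%{keyword}%"
    rows = db.execute(
        "SELECT detail FROM facts WHERE topic LIKE ? OR detail LIKE ? ORDER BY rowid",
        (pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]


def build_request(db, prompt, keyword):
    # Two separate input fields: retrieved context and the user's prompt.
    # (Assumed request shape, purely illustrative.)
    return {"context": search(db, keyword), "prompt": prompt}


request = build_request(build_db(), "Will it run on my Pi?", "Raspberry")
print(request)
```

        The point of the two fields is that the looked-up details never get mashed into the prompt text itself; how a model would actually consume them separately is left open here.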

          • mindbleach@sh.itjust.works
            English
            arrow-up
            2
            ·
            2 days ago

            All focus has been forced onto LLMs and diffusion, even though only diffusion works properly. And those LLMs better iterate on the exact same mechanisms we’ve tweaked for the last six years, because all results will be compared to the state of the art, right the hell now, not a comparable level of development or compute.