The really crazy thing is that this model still performs well at one-bit quantization, which suggests there’s a lot of room left to shrink it. It’s within an order of magnitude of running on consumer hardware, which would be an even more amazing kick in the balls to American AI companies.
Given that memory is the bottleneck, especially at the very low end, it makes me wonder whether a one-bit quantization of an extremely large model would be better, gigabyte for gigabyte of RAM, than a smaller model at higher precision.
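For a rough sense of the trade-off, here is a back-of-the-envelope sketch. It counts only the weights (no activations, KV cache, or quantization overhead such as scales), and the 16 GB budget and the function names are made up for illustration, not taken from any particular model or library.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def params_that_fit(ram_gb: float, bits_per_weight: float) -> float:
    """Approximate number of parameters whose weights fit in ram_gb."""
    return ram_gb * 1e9 * 8 / bits_per_weight


if __name__ == "__main__":
    budget_gb = 16  # hypothetical RAM budget, chosen only for illustration
    for bits in (16, 8, 4, 1):
        billions = params_that_fit(budget_gb, bits) / 1e9
        print(f"{bits:>2}-bit weights: about {billions:.0f}B parameters in {budget_gb} GB")
```

So at a fixed RAM budget, one-bit weights buy roughly 16x the parameter count of 16-bit weights. Whether the bigger one-bit model is actually better is an empirical question this arithmetic can't answer.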
Sucks that people lump all AI into a single category: whatever cloud-hosted subscription the Silicon Valley tech bros are pushing.