Comment by WhitneyLand
5 hours ago
Yeah, and it’s pretty memory efficient with only 8 attention layers, so at int8 in 16GB of RAM you can maybe still get 64k-128k context.
The part I hate though is that I’d bet none of the performance claims are based on int8.
Why do we care about bf16 benchmarks when no one will be running this model at bf16?
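For anyone who wants to sanity-check the context estimate, here is a rough back-of-the-envelope sketch of KV cache size. Only the 8 attention layers come from the comment above; the KV head count, head dimension, and cache dtype are placeholder assumptions, so plug in the real values from the model config before trusting any number.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Approximate KV cache size: one K and one V tensor per attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


if __name__ == "__main__":
    GIB = 1024 ** 3
    N_ATTN_LAYERS = 8   # from the comment
    N_KV_HEADS = 8      # ASSUMPTION: hypothetical, not from the source
    HEAD_DIM = 128      # ASSUMPTION: hypothetical, not from the source

    for context_len in (64 * 1024, 128 * 1024):
        for label, width in (("bf16 cache", 2), ("int8 cache", 1)):
            size = kv_cache_bytes(N_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM, context_len, width)
            print(f"{context_len:>7} tokens, {label}: {size / GIB:.2f} GiB")
```

Whatever is left of the 16GB after the int8 weights and runtime overhead is the budget this has to fit into, which is why the attention layer count matters so much here.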