Comment by sipjca
13 days ago
LocalScore dev here
Llamafile could certainly be released without the GPU binaries included by default and it would slim down the size tremendously.
The extra 70MiB comes from the CUDA binaries: LocalScore's are built with cuBLAS and for more generations of NVIDIA architectures (sm60->sm120), whereas Llamafile's are built with TinyBLAS and for just a few generations in particular.
I think it's possible to randomize weights with a standard set of layers, and that may be an option for the future.