Comment by sipjca

13 days ago

LocalScore dev here

Llamafile could certainly be released without the GPU binaries included by default and it would slim down the size tremendously.

The extra 70MiB comes from the CUDA binaries: LocalScore's are built with cuBLAS and for more generations of NVIDIA architectures (sm60->sm120), whereas Llamafile's are built with TinyBLAS and for just a few generations in particular.

I think it's possible to randomize weights with a standard set of layers, and that may be a possibility for the future.
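
To make the randomized-weights idea concrete, here is a rough sketch and not how LocalScore or Llamafile does anything today. The layer names, shapes, `random_weights` helper, and `scale` value are all hypothetical. It just shows that a fixed layer spec plus a seed is enough to regenerate identical weights on any machine.

```python
import random

# Hypothetical "standard set of layers": name -> (rows, cols).
# These names and shapes are illustrative only, not from LocalScore or Llamafile.
LAYER_SHAPES = {
    "attn_q": (8, 8),
    "attn_k": (8, 8),
    "ffn_up": (8, 32),
    "ffn_down": (32, 8),
}

def random_weights(shapes, seed=0, scale=0.02):
    """Fill every layer with seeded Gaussian noise so results are reproducible."""
    rng = random.Random(seed)  # private generator, does not touch global state
    weights = {}
    for name, (rows, cols) in shapes.items():
        weights[name] = [
            [rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)
        ]
    return weights

if __name__ == "__main__":
    w = random_weights(LAYER_SHAPES, seed=42)
    for name, matrix in w.items():
        print(name, len(matrix), "x", len(matrix[0]))
```

Since the same seed always yields the same values, only the layer spec and the seed would need to be distributed in this sketch, not the weights themselves.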