Comment by jodrellblank
1 year ago
Try one and find out. Look at the Quickstart section of https://github.com/Mozilla-Ocho/llamafile/ — you download a single cross-platform ~3.7GB file and execute it; it starts a local model plus a local webserver, and you can query it.
See it demonstrated in a <7 minute video here: https://www.youtube.com/watch?v=d1Fnfvat6nM
The video explains that you can download the larger models listed on that GitHub page and use them with other command-line parameters. It also shows how to get GPU acceleration on a Windows + NVIDIA setup: install CUDA and MSVC / VS Community edition with the C++ tools, run the llamafile for the first time from the MSVC x64 command prompt so it can build a cuBLAS component, then rerun it normally with the "-ngl 35" parameter to offload 35 layers to the GPU (which fit in the ~3.5GB of GPU memory my card has — it doesn't have much).
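For reference, the quickstart boils down to something like this (the file name here is illustrative — grab the actual download link from the repo's Quickstart section):

```shell
# Hypothetical file name for illustration; use the real download link
# from the llamafile Quickstart section.
chmod +x llava-v1.5-7b-q4.llamafile   # mark the downloaded file executable
./llava-v1.5-7b-q4.llamafile          # starts the model and a local web server

# With an NVIDIA GPU, offload 35 layers to GPU memory:
./llava-v1.5-7b-q4.llamafile -ngl 35
```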
GPU bits have changed! I just noticed in the video description:
"IMPORTANT: This video is obsolete as of December 26, 2023 GPU now works out of the box on Windows. You still need to pass the -ngl 35 flag, but you're no longer required to install CUDA/MSVC."
So that's convenient.