Comment by gardnr
3 years ago
There are a couple of open-source projects that expose llama.cpp and gpt4j models via an OpenAI-compatible API. This is one of them: https://github.com/lhenault/simpleAI
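The appeal of an OpenAI-compatible server is that existing client code only needs a different base URL; the request body stays the same. A minimal sketch of building such a request, assuming a hypothetical local server on port 8080 (the model name and port are illustrative, not from simpleAI's docs):

```python
import json

# Assumed local endpoint; servers like simpleAI mimic the OpenAI REST
# surface, so a client only has to point at a different base URL.
BASE_URL = "http://localhost:8080/v1"  # illustrative, adjust for your server

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# The same payload works against api.openai.com or a local clone.
body = build_chat_request("llama-7b", "Say hello in one word.")
print(json.dumps(body, indent=2))
```

You would POST this to `{BASE_URL}/chat/completions`; any OpenAI SDK or plain HTTP client works unchanged.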
Nowadays falcon-40b is probably more accurate than gpt4j; here's hoping we get llama.cpp support for Falcon builds soon [0]!
[0]: https://github.com/ggerganov/llama.cpp/issues/1602
The ggllm.cpp fork seems to be the leading option for Falcon support for now [1].
It comes with its own variant of the GGML format ("ggcv1"), but there are quants available on HF [2].
Although if you have a GPU, I'd go with the newly released AWQ quantization instead [3]; the performance is better.
(I may or may not have a mild local LLM addiction - and video cards cost more than drugs)
[1] https://github.com/cmp-nct/ggllm.cpp
[2] https://huggingface.co/TheBloke/falcon-7b-instruct-GGML
[3] https://huggingface.co/abhinavkulkarni/tiiuae-falcon-7b-inst...
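A back-of-envelope calculation shows why 4-bit formats like the GGML quants or AWQ matter on consumer GPUs. This sketch only counts the weights themselves (KV cache and runtime overhead are ignored, so real usage is higher); the 7B parameter count matches falcon-7b:

```python
def approx_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

falcon_7b = 7e9  # parameter count of falcon-7b

# fp16 weights barely fit a 16 GB card; 4-bit quants leave plenty of room.
print(f"fp16 : {approx_weight_gib(falcon_7b, 16):.1f} GiB")
print(f"4-bit: {approx_weight_gib(falcon_7b, 4):.1f} GiB")
```

At fp16 the weights alone are ~13 GiB, while a 4-bit quant is ~3.3 GiB, which is the difference between needing a data-center card and running on a mid-range GPU.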