Comment by ssddanbrown

3 years ago

I maintain an open source documentation platform, for which I had received a few queries about AI tooling. I'm not into the AI world of development, and my tech stack & distribution approach aren't great to provide AI friendly tech in my project itself, but connecting to external applications that can consume/combine multiple sources seemed like a good potential approach.

I came across Danswer a few days ago as an option for this, so I spent a day building a connector [1]. I was pleasantly surprised how accurate the output was for something like this. I have a few pages detailing my servers and I could ask things like "Where is x server hosted"? and get a correct response accompanied with a link to the right source page.

Some things to be aware of specifically about Danswer: It only works with OpenAI right now, although the team said that open model support is important as a future focus. Additionally it felt fairly heavy to run and required a 30 minute docker build process but I think they've improved on this now with pre-built images, and I'm not familiar with the usual requirements/weight of this kind of tech. Otherwise, things were easy to start up and play around with, even for an AI noob like me. Both their web and text-upload source connectors worked without issue in my testing.

[1]: https://github.com/danswer-ai/danswer/pull/139

3 comments

ssddanbrown

gardnr 3 years ago

There are a couple open source projects that expose llama.cpp and gpt4j models via a compatible OpenAI API. This is one of them: https://github.com/lhenault/simpleAI

sodality2 3 years ago
Nowadays falcon-40b is probably more accurate than gpt4j, here's to hoping we get llama.cpp support for falcon builds soon [0]!
[0]: https://github.com/ggerganov/llama.cpp/issues/1602
- mikeravkine 3 years ago
  
  The GGLLM fork seems to be the leading falcon winner for now [1]
  It comes with its own variant of the GGML sub format "ggcv1" but there's quants available on HF [2]
  Although if you have a GPU I'd go with the newly released AWQ quantization instead [3] the performance is better.
  (I may or may not have a mild local LLM addiction - and video cards cost more then drugs)
  [1] https://github.com/cmp-nct/ggllm.cpp
  [2] https://huggingface.co/TheBloke/falcon-7b-instruct-GGML
  [3] https://huggingface.co/abhinavkulkarni/tiiuae-falcon-7b-inst...