
Comment by spark_chicken

1 year ago

I have tried it. It is really fast! I know building a real-time voice bot with latency this low is not easy. Which LLM did you use, and how large does the LLM need to be to keep the conversation efficient?

This particular demo uses Llama3 8B. We initially started with 70B, but it was a touch slower and needed much more VRAM. We found 8B good enough for general chit-chat like in this demo. Most real-world use cases will likely have their own fine-tuned models.
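
To put the VRAM difference in perspective, here is a back-of-the-envelope sketch (not from the demo's code) of the memory needed just to hold the weights at fp16, i.e. 2 bytes per parameter. This ignores the KV cache, activations, and framework overhead, so real requirements are higher:

```python
def min_weight_vram_gb(num_params_billions: float, bytes_per_param: int = 2) -> float:
    """Lower bound on VRAM (in GB) for model weights alone.

    fp16/bf16 uses 2 bytes per parameter; KV cache and activations
    add more on top, so treat this as a floor, not an estimate.
    """
    return num_params_billions * bytes_per_param


# Llama3 8B vs 70B at fp16: the weights alone differ by ~124 GB.
print(min_weight_vram_gb(8))   # 16.0 GB
print(min_weight_vram_gb(70))  # 140.0 GB
```

At fp16 the 70B model needs roughly 140 GB for weights alone, versus about 16 GB for 8B, which is why the smaller model is much easier to serve at low latency.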