
Comment by spark_chicken

1 year ago

I have tried it. It is really fast! I know building a real-time voice bot with latency this low is not easy. Which LLM did you use, and how large does the LLM need to be to keep the conversation efficient?

This particular demo uses Llama3 8B. We initially started with 70B, but it was a touch slower and needed much more VRAM. We found 8B good enough for general chit-chat like in this demo. Most real-world use cases will likely have their own fine-tuned models.
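
To put the VRAM difference in perspective, here is a back-of-the-envelope sketch (not from the demo's code) of the memory needed just to hold the weights at fp16, i.e. 2 bytes per parameter. This ignores the KV cache, activations, and framework overhead, so real requirements are higher:

```python
def min_weight_vram_gb(num_params_billions: float, bytes_per_param: int = 2) -> float:
    """Lower bound on VRAM (in GB) for model weights alone.

    fp16/bf16 uses 2 bytes per parameter; KV cache and activations
    add more on top, so treat this as a floor, not an estimate.
    """
    return num_params_billions * bytes_per_param


# Llama3 8B vs 70B at fp16: the weights alone differ by ~124 GB.
print(min_weight_vram_gb(8))   # 16.0 GB
print(min_weight_vram_gb(70))  # 140.0 GB
```

At fp16 the 70B model needs roughly 140 GB for weights alone, versus about 16 GB for 8B, which is why the smaller model is much easier to serve at low latency.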