Comment by makeitmore
1 year ago
This particular demo uses Llama 3 8B. We initially started with 70B, but it was a touch slower and needed much more VRAM. We found 8B good enough for general chit-chat like in this demo. Most real-world use cases will likely have their own fine-tuned models.
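For a rough sense of the VRAM gap between the two sizes, here is a back-of-the-envelope sketch. It only counts the memory needed to hold the weights, using the nominal parameter counts (8B and 70B) and assumed precisions (2 bytes per parameter for fp16, 0.5 for 4-bit quantization). The function name `weight_memory_gb` is made up for illustration, and real usage will be higher once the KV cache, activations, and runtime overhead are included.

```python
# Rough estimate of memory needed just to hold model weights.
# Assumptions (not from the demo itself): nominal parameter counts,
# 1 GB = 1e9 bytes, and no KV cache, activations, or runtime overhead.

def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Return approximate weight memory in GB for a model of the given size."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes-per-GB
    return params_billion * bytes_per_param


if __name__ == "__main__":
    for name, size in [("8B", 8), ("70B", 70)]:
        fp16 = weight_memory_gb(size, 2.0)   # 16-bit weights
        int4 = weight_memory_gb(size, 0.5)   # 4-bit quantized weights
        print(f"{name}: ~{fp16:.0f} GB at fp16, ~{int4:.0f} GB at 4-bit")
```

By this estimate the 70B weights alone take close to nine times the memory of the 8B weights at the same precision, which matches the "much more VRAM" we ran into.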