Comment by c_guru
4 hours ago
Thanks. For voice and video we use a two-layered model (Whisper plus a fine-tuned layer), while for documents, images, and other content users can choose from several models, e.g. CLIP ViT-B/32, all-MiniLM-L6-v2, and multilingual variants. Everything runs fully offline.
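A minimal sketch of what that per-content-type model choice could look like. The routing table and model names below are illustrative assumptions (only Whisper, CLIP ViT-B/32, and all-MiniLM-L6-v2 are mentioned above), not the actual implementation:

```python
# Hypothetical sketch: mapping content types to locally available models.
# Model identifiers are assumptions for illustration; nothing here hits the network.
EMBEDDING_MODELS = {
    "audio": ["whisper-base", "whisper-fine-tuned"],   # two-layered voice/video path
    "image": ["clip-vit-b-32"],
    "text":  ["all-MiniLM-L6-v2", "multilingual-MiniLM"],
}

def models_for(content_type: str) -> list[str]:
    """Return the local models a user can pick for a given content type."""
    # Fall back to the text models for unrecognized content types.
    return EMBEDDING_MODELS.get(content_type, EMBEDDING_MODELS["text"])
```

Since everything runs offline, the actual model weights would have to be bundled or downloaded once ahead of time.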