Comment by johnb231
2 months ago
The latest models are natively multimodal. Audio, video, images, and text are all tokenised and interpreted by the same model.