Comment by almaight
9 days ago
"multi-modal feature extraction → semantic translation → cross-modal feature transfer → precise temporal alignment" is all we need