Comment by empath75
16 hours ago
I am not sure we are at the "efficiency" phase of this.
Even if you just wire this output (or probably multiples running different counterfactuals) into a multimodal LLM that interprets the video and uses it to make decisions, you have something new.
No comments yet
Contribute on Hacker News ↗