Comment by akie
2 hours ago
Did you not see the chatbot they posted online (https://chatjimmy.ai/)? That thing is near-instantaneous; that's all the proof you need that this is real.
And if the hardware is real and functional, as you can independently verify by chatting with that thing, how much more effort would it be to etch more recent models?
The real question, of course, is: what about LARGER models? I'm assuming you can apply some of the existing LLM inference parallelization techniques (tensor or pipeline parallelism) and split the workload over multiple cards, roughly as sketched below. Some of the 32B models are plenty powerful.
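To make that concrete, here's a minimal sketch of the simplest of those techniques, column-wise tensor parallelism, with NumPy arrays standing in for the per-card weight shards. The dimensions and the two-card split are made up for illustration; nothing here is specific to the hardware in the article.

```python
import numpy as np

# Column-wise tensor parallelism, Megatron-LM style: split one linear
# layer's weight matrix across "cards", let each card compute its slice
# of the output from the same input, then gather the slices.

d_model, d_ff, n_cards = 1024, 4096, 2   # illustrative sizes, not real ones

x = np.random.randn(1, d_model)          # one token's activations
W = np.random.randn(d_model, d_ff)       # the full weight matrix

# Each card holds d_ff / n_cards output columns of W.
shards = np.split(W, n_cards, axis=1)

# Each card multiplies the same input by its own shard, independently.
partials = [x @ W_shard for W_shard in shards]

# The "all-gather" step: concatenate the partial outputs.
y = np.concatenate(partials, axis=1)

# Identical to the single-card result.
assert np.allclose(y, x @ W)
```

The point is that the per-card math is unchanged; only the weight storage and a gather step move across cards, which is why splitting a 32B model over several of these seems at least plausible in principle.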
It's a proof of concept, and a convincing one.