Comment by refulgentis
14 days ago
"That’s dynamically decided during training and not set before, right?"
^ Right. I can't recall which one off the top of my head, but a recent paper showed that if you try to dictate this sort of thing, performance falls off a cliff (I presume there's some layer of base knowledge $X that each expert needs).