Comment by echoangle

1 year ago

Is it public (or even known by the developers) how the experts are split up? Is it by topic, so physics questions go to one and biology goes to another one? Or just by language, so every English question is handled by one expert? That’s dynamically decided during training and not set before, right?

3 comments

echoangle

ianbutler 1 year ago

This is a common misunderstanding. Experts are learned via gating networks during training that routes dynamically per parameter. You might have an expert on the word "apple" in one layer for a slightly lossy example.

Queries are then also dynamically routed.

refulgentis 1 year ago

"That’s dynamically decided during training and not set before, right?"

^ right. I can't recall off the top of my head, but there was a recent paper that showed if you tried dictating this sort of thing the perf fell off a cliff (I presume there's some layer of base knowledge $X that each expert needs)

sshh12 1 year ago

It can be either but typically it's "learned" without a defined mapping (which guessing is the case here). Although some experts may end up heavily correlating with certain domains.