Comment by verdverm

4 days ago

This sounds like MoE and maybe a bit of chain-of-thought. Curious what someone with more domain expertise thinks about this.

If they can test against Llama 70B and Mistral 7B, they ought to compare against Mixtral 8x7B imho.

I'm not an expert, but MoE models tend to hold up better under continual learning because they're less prone to catastrophic forgetting: each example only updates the few experts it routes to, so the rest of the network stays largely untouched.
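
To make that concrete, here's a toy top-2 gated MoE layer in PyTorch (my own sketch, nothing from the paper; the class name and layer sizes are made up): each token flows through only two experts, so a gradient step touches only those experts' weights.

```python
# Toy sketch of a top-k gated MoE layer (illustrative only, not the paper's code).
# Each token is routed to k experts; a training step only updates those experts,
# which is the usual intuition for why MoE forgets less under continual training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router producing per-expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.gate(x)                             # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)          # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]                       # expert chosen by each token in this slot
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():                            # only routed experts see these tokens
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# quick smoke test
layer = TopKMoE(d_model=64, d_ff=256)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Top-2 routing over 8 experts mirrors what Mixtral does; softmaxing only over the selected experts' scores is the usual trick to keep the mixture weights summing to one.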