Comment by verdverm
4 days ago
This sounds like MoE and maybe a bit of chain-of-thought. Curious what someone with more domain expertise thinks about this
If they can test against Llama 70B and Mistral 7B, they ought to compare against Mixtral 8x7B too, imho.
I'm not an expert, but my understanding is that MoE models tend to do better at continual learning because they're less prone to catastrophic forgetting.