Comment by imjonse
6 months ago
Wonder why no Llama-3.1-8B based variant if the new training method has such good results. UPDATE: didn't work well https://x.com/mattshumer_/status/1831775436420083753?t=flm41...
He said it didn't improve as much
https://x.com/mattshumer_/status/1831775436420083753
It's answered on Twitter. Not much improvement over other similar models at that size.
Imagine if that were the reason a big corporation decided not to investigate a similar technique further :)