Comment by imjonse

1 year ago

I wonder why there's no Llama-3.1-8B based variant if the new training method gets such good results. UPDATE: it didn't work well https://x.com/mattshumer_/status/1831775436420083753?t=flm41...

It's answered on Twitter: not much improvement over other similar models at that size.

Imagine if that were the reason big corporations decided not to investigate similar techniques further :)