Comment by imjonse

6 months ago

I wonder why there's no Llama-3.1-8B-based variant if the new training method gives such good results. UPDATE: it didn't work well: https://x.com/mattshumer_/status/1831775436420083753?t=flm41...

Imagine if that were the reason big corporations didn't investigate similar techniques any further :)