Comment by imjonse

1 year ago

I wonder why there's no Llama-3.1-8B based variant if the new training method gets such good results. UPDATE: it didn't work well https://x.com/mattshumer_/status/1831775436420083753?t=flm41...

It's answered on Twitter: not much improvement over other similar models at that size.

Imagine if that were the reason big corporations decided not to investigate similar techniques further :)