
Comment by xrd

1 day ago

This is why I asked this question yesterday:

"Ask HN: Why don't programming language foundations offer "smol" models?"

https://news.ycombinator.com/item?id=45840078

If I could run smol single-language models myself, I would not have to worry.

> I wonder why I can't find a model that only does Python and is good only at that

I don't think it's that easy. The times I've trained my own tiny models on just one language (programming or otherwise), they've gotten worse results than the models I trained with all the languages I had at hand chucked in, even when evaluating on just a single language.

It also seems fairly intuitive to me that it works that way: programming in different (mainstream) languages is more similar than it is different (especially when 90% of all source code is Algol-like), so it makes sense that there's a lot of cross-learning across languages.
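
To make the comparison concrete, here's a minimal sketch of what the two setups look like at the data level, assuming a local tree of source files (the paths and extension map are hypothetical placeholders); the single-language corpus is just a strict subset of the mixed one:

    # Sketch: single-language vs. mixed-language fine-tuning corpora
    # built from the same source tree. Paths and the extension map are
    # hypothetical placeholders.
    from pathlib import Path

    EXTENSIONS = {".py": "python", ".js": "javascript", ".go": "go", ".rs": "rust"}

    def collect_corpus(root, only_language=None):
        """Return source-file contents, optionally restricted to one language."""
        samples = []
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            lang = EXTENSIONS.get(path.suffix)
            if lang is None:
                continue
            if only_language is not None and lang != only_language:
                continue
            samples.append(path.read_text(errors="ignore"))
        return samples

    python_only = collect_corpus("corpus/", only_language="python")  # the "smol" setup
    mixed = collect_corpus("corpus/")                                # everything chucked in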

The answer to why most convenient solutions don't exist is money. There's no money in that.

  • And/or the lower-parameter models are straight up less effective than the giants? Why is anyone paying for Sonnet and Opus if Mixtral could do what they do?

  • But, for example, Zig as a language has prominent corporate support. And, Mitchell Hashimoto is incredibly active and a billionaire. It feels like this would be a rational way to expand the usage of a language.

Because a smol model that any of these nonprofits could feasibly afford to train would be useless for actual code generation.

Hell, even the huge foundational models are still useless in most scenarios.

Have you even tried Qwen3-Coder-30B-A3B?

  • Qwen3 Coder 30B A3B is shockingly capable for its parameter count, but I wouldn't overlook how much weight the words "for its parameter count" are carrying here.

  • I haven't. I will. (A minimal sketch of running it locally follows below.)

    I wonder if you could ablate everything except for a specific language.
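
For anyone wanting to try it, here's a minimal sketch of running Qwen3-Coder-30B-A3B locally with Hugging Face transformers. The hub id, prompt, and generation settings are assumptions; check the model card for the exact repo name and recommended parameters, and note that it still needs substantial RAM/VRAM even though only ~3B parameters are active per token:

    # Sketch only: trying Qwen3-Coder-30B-A3B via transformers.
    # The model id below is assumed, not confirmed in this thread.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"  # assumed hub name

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype="auto"
    )

    messages = [{"role": "user",
                 "content": "Write a Python function that parses ISO 8601 timestamps."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=512)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))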