Comment by hogehoge51
16 hours ago
What's the practical benefit of fine-tune training on a local repo, versus putting a summary of the local information in the context? I.e., every team has their own style and preferences for coding patterns that could be generalized, but I imagine a large-scale model has seen them all, so they could just be described in the context. Or are there specific domain-level patterns that would never be seen outside an org, so they're difficult for a model to infer without fresh tuning?
I work on the biggest codebase in the world. We have a fine-tuned model on our codebase. I've not been impressed with it. It does not produce better code than the non-tuned model.
Maybe there are certain problems it excels at, but probably 99% of what I throw at it can be gleaned from the context/nearby code anyway, like you said. Even if I'm using some in-house library (which is pretty much all of our code), the models are good enough to dig into that library and read the headers if they need to.
Maybe it can help with speed, if it needs to do less research before it can start coding.
How many lines of code are there in the biggest codebase in the world?
Fine-tuning coder models is not nearly as effective as intelligently managing the context with frontier models (opus, gpt-5.2-codex).
I don't think it's even a question. A 32B model will not compete with SotA for years to come (if ever). The idea behind this release is to fine-tune on your codebase and compare against non-fine-tuned open models from the same class (or one class higher). So if you need local processing without access to SotA (security, compliance, whatever), then this is an interesting avenue for you, and the cost is fairly low. They are releasing the method so you can do this on your own codebase / docs / processes.
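For concreteness, here's a minimal sketch of what "fine-tune on your codebase" can look like with a small open model. This is not the recipe from the release being discussed; the base model name (Qwen/Qwen2.5-Coder-7B), the repo path, and the LoRA hyperparameters are all illustrative assumptions, using the Hugging Face transformers/peft/datasets stack.

```python
# Hedged sketch: LoRA fine-tuning of a small open coder model on a local repo.
# Base model, repo path, and hyperparameters are assumptions, not the release's recipe.
from pathlib import Path

from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "Qwen/Qwen2.5-Coder-7B"   # hypothetical base model
REPO_ROOT = Path("path/to/your/repo")  # hypothetical local repo

# 1. Collect source files from the repo as raw training text.
texts = [
    p.read_text(errors="ignore")
    for p in REPO_ROOT.rglob("*.py")  # one language kept for brevity
]
dataset = Dataset.from_dict({"text": texts})

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# 2. Wrap the base model with a small LoRA adapter so only a few million
#    parameters are trained, which is what keeps the cost low.
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# 3. Standard causal-LM training pass over the repo text.
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="repo-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("repo-lora")  # adapter is loaded on top of the base model later
```

The resulting LoRA adapter is small relative to the base weights and is loaded on top of the base model at inference time, which is part of why this kind of repo-specific tuning stays fairly cheap.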
Is this how you say "I work at Google" without explicitly saying that?
Prove it's the biggest codebase in the world. No way do you know that for sure!
"Hey Claude, please scaffold me the biggest codebase in the world"