Comment by RoyTyrell

5 days ago

Let me preface by saying I'm not skeptical about your answer or think you're full of crap. Can you give me an example or two about a single task that you fine-tune for? Just trying to familiarize myself with more AI engineering tasks.

Yep!

So my use case currently is admittedly very specific. My company uses LLMs to automate hardware design, which is a skill that most LLMs are very poor at due to the dearth of training data.

For tasks that involve generating code or other non-natural-language output, we’ve found that fine-tuning on the right dataset can lift performance quickly and decisively.

An example task is taking in potentially syntactically incorrect HDL (Hardware Description Language) code and fixing the syntax errors. Fine-tuning significantly boosted the model’s ability to correct them.
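A training set for that kind of task might be assembled like the sketch below, using the common chat-style JSONL fine-tuning format. The Verilog snippets, system prompt, and error types are hypothetical illustrations, not the commenter's actual data.

```python
import json

# Sketch: build chat-style fine-tuning records for an HDL syntax-repair
# task. The Verilog snippets below are hypothetical examples.
SYSTEM = "You fix syntax errors in Verilog. Return only the corrected code."

pairs = [
    # (broken HDL, repaired HDL) -- missing semicolon after the port list
    ("module adder(input a, input b, output y)\n  assign y = a & b;\nendmodule",
     "module adder(input a, input b, output y);\n  assign y = a & b;\nendmodule"),
    # wrong closing keyword for an always block
    ("always @(posedge clk) begin\n  q <= d;\nendmodule",
     "always @(posedge clk) begin\n  q <= d;\nend"),
]

def to_record(broken, fixed):
    """One JSONL line in the chat fine-tuning format."""
    return {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": broken},
        {"role": "assistant", "content": fixed},
    ]}

# Each element is a self-contained JSON object, one per line of train.jsonl.
lines = [json.dumps(to_record(b, f)) for b, f in pairs]
print(len(lines))
```

The key point is that every record pairs a concrete broken input with its exact repaired output, which is the shape of supervision base models rarely see for niche languages like HDL.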

I used fine-tuning back in the day because GPT-3.5 struggled with determining whether two sentences were equivalent. This was for grading language-learning drills. It was a single skill for a specific task, and I had lots of example data from thousands of spaced-repetition quiz sessions. The base model struggled with the vague concept of “close enough” equivalence. Since then, the state of the art has advanced to the point that I don’t need it anymore. I could probably still fine-tune to save some money, but I’m pretty happy with GPT-4.1.
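Grading history like that converts naturally into labeled training pairs. A minimal sketch, assuming hypothetical sentences and labels (the actual quiz data isn't shown in the comment):

```python
import json

# Sketch: turn spaced-repetition grading history into fine-tuning
# examples for "close enough" sentence equivalence. All sentences and
# labels here are hypothetical.
graded = [
    ("Where is the train station?", "Where's the station for trains?", "equivalent"),
    ("I already ate breakfast.", "I will eat breakfast.", "not equivalent"),
]

def to_record(expected, answer, label):
    """One chat-format record: the target answer, the student's answer,
    and the human grader's verdict as the assistant turn."""
    prompt = (f"Expected: {expected}\nStudent: {answer}\n"
              "Are these close enough to count as the same answer?")
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": label},
    ]}

records = [to_record(e, a, l) for e, a, l in graded]
print(json.dumps(records[0]))
```

Because the output is a short fixed label rather than free text, even a modest number of examples can teach the fuzzy decision boundary the base model was missing.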

Any classification task. For example, in search ranking: does a document contain the answer to this question?
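That yes/no signal slots directly into a reranking step. A self-contained sketch: `contains_answer_prob` stands in for a real (possibly fine-tuned) classifier call, and here it is just a toy word-overlap heuristic so the example runs on its own; the question and documents are hypothetical.

```python
# Sketch: use a yes/no "does this document answer the question?"
# classifier as a reranking signal for search results.

def contains_answer_prob(question: str, doc: str) -> float:
    """Stand-in for a model call: fraction of question words found
    in the document. A real system would call a classifier here."""
    q_words = {w.lower().strip("?") for w in question.split()}
    d_words = {w.lower().strip(".,") for w in doc.split()}
    return len(q_words & d_words) / len(q_words)

def rerank(question, docs):
    """Order candidate documents by the classifier's yes-probability."""
    return sorted(docs, key=lambda d: contains_answer_prob(question, d),
                  reverse=True)

question = "What year was the transistor invented?"
docs = [
    "Vacuum tubes dominated early electronics.",
    "The transistor was invented in 1947 at Bell Labs.",
]
ranked = rerank(question, docs)
print(ranked[0])  # → "The transistor was invented in 1947 at Bell Labs."
```

The design point is that the classifier only has to make a binary judgment per (question, document) pair, which is exactly the kind of narrow, label-rich task fine-tuning handles well.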