Comment by shekhar101
4 days ago
Can someone (or OP) point me to a recipe for fine-tuning a model like this for natural language tasks like complicated NER or similar workflows? I tried fine-tuning Gemma3 270M when it came out last week without any success. A lot of tutorials are geared towards chat applications and role-playing, but I feel this model could be great for use cases like mine, where I'm trying to clean up and extract data from PDFs with entity identification and such.
If you're really just doing traditional NER (identifying non-overlapping spans of tokens that refer to named entities), then you're probably better off using encoder-only (e.g. https://huggingface.co/dslim/bert-large-NER) or encoder-decoder (e.g. https://huggingface.co/dbmdz/t5-base-conll03-english) models. These models aren't making headlines anymore because they're not decoder-only, but for established NLP tasks like this that don't involve generation, I think there's still a place for them, and I'd assume that at equal parameter counts they significantly outperform decoder-only models at NER, depending on the nature of the dataset. A quick sketch of how you might try the first model is below.
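For example, a minimal sketch of running that encoder-only model through the Hugging Face transformers token-classification pipeline (the example sentence is made up; everything else is the standard pipeline API):

    from transformers import pipeline

    # aggregation_strategy="simple" merges subword pieces back into
    # whole-entity spans with a single label and score.
    ner = pipeline(
        "token-classification",
        model="dslim/bert-large-NER",
        aggregation_strategy="simple",
    )

    text = "Angela Merkel visited the Google offices in Zurich last May."
    for ent in ner(text):
        print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))

That gets you clean (entity type, span, confidence) tuples without any generation or prompt engineering.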
This recipe uses the gemma-llm Python library, which runs on JAX under the hood: https://gemma-llm.readthedocs.io/en/latest/colab_finetuning....
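If the JAX route is a hurdle, a rough transformers-based sketch of the same kind of token-classification fine-tune looks like this. To be clear, this is not what the linked colab does; the base model, dataset, and hyperparameters here are placeholder assumptions you'd swap for your own PDF-derived data:

    from datasets import load_dataset
    from transformers import (
        AutoModelForTokenClassification,
        AutoTokenizer,
        DataCollatorForTokenClassification,
        Trainer,
        TrainingArguments,
    )

    # Placeholder dataset and base model, purely for illustration.
    dataset = load_dataset("conll2003")
    label_names = dataset["train"].features["ner_tags"].feature.names

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=len(label_names)
    )

    def tokenize_and_align(batch):
        # Tokenize pre-split words, then map word-level NER tags onto
        # subword tokens; special tokens and continuation pieces get -100
        # so the loss ignores them.
        enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
        labels = []
        for i, tags in enumerate(batch["ner_tags"]):
            prev, row = None, []
            for w in enc.word_ids(batch_index=i):
                row.append(-100 if w is None or w == prev else tags[w])
                prev = w
            labels.append(row)
        enc["labels"] = labels
        return enc

    tokenized = dataset.map(tokenize_and_align, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments("ner-finetune", learning_rate=2e-5,
                               per_device_train_batch_size=16, num_train_epochs=3),
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorForTokenClassification(tokenizer),
    )
    trainer.train()

The label-alignment trick is the same one the standard transformers token-classification tutorials use, so it carries over directly to custom entity labels from PDF extraction.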
Have you tried this one, by any chance?
https://huggingface.co/dslim/bert-base-NER
Just wondering if it’s worth testing and what it would be most useful for.