Comment by zacmps
17 days ago
I've found the highest-accuracy solution is to OCR with one of the dedicated models, then feed that text and the original image into an LLM with a prompt like:
"Correct errors in this OCR transcription".
How does it behave if the body of text is offensive, or if it's talking about a recipe for purifying UF-6 gas at home? Will it stop what it is doing and enter lecturing mode?
I'm not asking to be cynical, but in my limited experience, using LLMs for any task that may operate on offensive or unknown input gets them triggered into all sorts of unpredictable moral judgements and dragged into generating output I didn't want at all.
If I ask this black box to give me JSON output containing keywords for a given text, and that text happens to be offensive, it refuses to do so.
How does one tackle that?
We use the Azure models, and there isn't really an issue with safety filters for enterprise customers. The one time we did have an issue, Microsoft changed the safety measures for us. Of course, the safety filters we're likely to trip are around the sort of engineering that could be interpreted as weapons manufacturing, not anything "political" as such. Basically the safety guard rails seem to be added on top of all these models, which means they can also be removed without impacting the model. I could be wrong on that, but it seems that way.
There are many settings for changing the safety level in Gemini API calls: https://ai.google.dev/gemini-api/docs/safety-settings
This is for anyone coming across this link later. In their latest SDKs, if you want to completely switch off their safety settings, the flag to use is 'OFF' and not 'BLOCK_NONE' as mentioned in the docs in the link above.
The Gemini docs don't reflect that change yet. https://discuss.ai.google.dev/t/safety-settings-2025-update-...
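For anyone who wants it spelled out, here's a minimal sketch of turning the filters off, assuming the newer google-genai Python SDK (the model name and the exact set of harm categories are illustrative):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="...")  # or rely on GOOGLE_API_KEY in the environment

    # Per the comment above, the latest SDKs use 'OFF' to fully disable a
    # category's filter, rather than 'BLOCK_NONE' as the older docs say.
    safety_settings = [
        types.SafetySetting(category=c, threshold="OFF")
        for c in (
            "HARM_CATEGORY_HARASSMENT",
            "HARM_CATEGORY_HATE_SPEECH",
            "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "HARM_CATEGORY_DANGEROUS_CONTENT",
        )
    ]

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Correct errors in this OCR transcription: ...",
        config=types.GenerateContentConfig(safety_settings=safety_settings),
    )
    print(response.text)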
Try setting the safety params to none and see if that makes any difference.
It's not something I've needed to deal with personally.
We have run into added content filters in Azure OpenAI on a different application, but we just put in a request to tune them down for us.
This is what we do today. Have you tried it against Gemini 2.0?