Comment by fngjdflmdflg
17 days ago
Just tried this and it did not appear to work for me. Prompt:
>Please provide me strict bounding boxes that encompasses the following text in the attached image? I'm trying to draw a rectangle around the text.
> - Use the top-left coordinate system
>this input document is 1080 x 1236 px. return the bounding boxes as integers
https://github.com/google-gemini/cookbook/blob/a916686f95f43...
They say there's no magic prompt, but I'd start with their default, since post-training for tasks like this usually relies on a specific format, and sticking to that format tends to improve performance.
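For what it's worth, here's a rough sketch of the post-processing you'd need if you go with the default format rather than asking for pixel integers. It assumes (my understanding of the Gemini docs, so check the linked notebook) that boxes come back as `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 grid. `to_pixel_box` is just a name I made up, not something from the cookbook.

```python
def to_pixel_box(box_2d, width, height, scale=1000):
    """Convert a [ymin, xmin, ymax, xmax] box on a 0..scale grid
    to (left, top, right, bottom) in pixels, top-left origin.

    Assumption: the model returns y before x and normalizes to 0-1000.
    """
    ymin, xmin, ymax, xmax = box_2d
    # Multiply before dividing to keep the float arithmetic exact for typical inputs.
    left = round(xmin * width / scale)
    top = round(ymin * height / scale)
    right = round(xmax * width / scale)
    bottom = round(ymax * height / scale)
    return left, top, right, bottom


# Example with the 1080 x 1236 px image from the prompt above.
print(to_pixel_box([250, 100, 500, 900], 1080, 1236))  # (108, 309, 972, 618)
```

The point is that you let the model answer in whatever grid it was trained on and do the scaling to your image size yourself.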
"Might" being the operative word, particularly with models that have weaker prompt adherence. There are a few other prompt-massaging tricks beyond the scope of an HN comment; the decimal issue is just one optimization.