Comment by lyu07282

14 days ago

Anyone know how the image encoding works exactly?

    <|image_start|><|patch|>...<|patch|><|tile_x_separator|><|patch|>...<|patch|><|tile_y_separator|><|patch|>...<|patch|><|image|><|patch|>...<|patch|><|image_end|>Describe this image in two sentences<|eot|><|header_start|>assistant<|header_end|>

Is "..." here raw 4 bytes RGBA as an integer or how does this work with the tokenizer?