cedws 8 hours ago
Given how primitive that image is, what's the point of even having an image model at this size?

simonw 6 hours ago
This isn't an image model. It's a text model, but text models can output SVG so you can challenge them to generate a challenging image and see how well they do.

cedws 5 hours ago
> Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs.
But I understood your point, Simon asked it to output SVG (text) instead of a raster image so it's more difficult.

simonw 5 hours ago
It can handle image and audio inputs, but it cannot produce those as outputs - it's purely a text output model.