0x1ceb00da 1 year ago
When I asked advanced voice mode, it said that it receives input as audio and generates text as output.

mbrock 1 year ago
It is mistaken because it has no particular insight into its own implementation. In fact the whole point is that it directly consumes and produces audio tokens with no text. That's why it's able to sing, make noises, do accents, and so on.