I wonder if we should make the UI a more common interface (e.g. "the call is ringing") to avoid this confusion?
It's a normal mp4 video that's looping initially (the "welcome message") and then as soon as you send the bot a message, we connect you to a GPU and the call becomes interactive. Connecting to the GPU takes about 10s.
Makes sense. The init should be about 10s. But, after that, it should be real time. TBH, this is probably a common confusion. So thanks for calling it out.
I wonder if we should make the UI a more common interface (e.g. "the call is ringing") to avoid this confusion?
It's a normal mp4 video that's looping initially (the "welcome message") and then as soon as you send the bot a message, we connect you to a GPU and the call becomes interactive. Connecting to the GPU takes about 10s.
Makes sense. The init should be about 10s. But, after that, it should be real time. TBH, this is probably a common confusion. So thanks for calling it out.