OpenAI announces ChatGPT will soon ‘see, hear, and speak’
The new features, which include speech recognition and text-to-speech capabilities, will roll out over the next two weeks.
ChatGPT will soon offer new features that let users interact with it through images and voice, according to an announcement from OpenAI on Sept. 25.
OpenAI announced that users will be able to interact with ChatGPT by voice, enabling a more personalized user experience. The company said that this feature is powered by a text-to-speech model that can generate audio from text and a short sample of speech, with voices created in collaboration with professional voice actors. It said that the feature also relies on Whisper, its open-source speech recognition system, to transcribe spoken input.
The voice features are expected to support a wider range of uses, such as reading bedtime stories, creating recipes, composing speeches, reciting poems, explaining common phrases, or even resolving “dinner table debates.”
OpenAI added that users will soon be able to provide images to ChatGPT (or select certain parts of images) for interpretation and response.
OpenAI acknowledges risks
OpenAI acknowledged the risk of fraud and impersonation and said that, accordingly, it is limiting voice features to its voice chat platform. It emphasized that it uses professional voice actors, not user voices, for output audio. OpenAI added that certain other groups are permitted to use voice capabilities for other purposes; Spotify, for example, is translating participating podcasts into new languages in each host’s original voice.
The company noted that image recognition carries privacy risks and said that, in response, it has limited ChatGPT’s ability to make statements about people. It noted that ChatGPT “is not always accurate” but said that general descriptions of images can be useful, citing its earlier work with Be My Eyes, an app for blind and low-vision people.
OpenAI said that it will introduce voice and image features to ChatGPT Plus and Enterprise users over the next two weeks. It said that voice features will be available on iOS and Android on an opt-in basis, and that image features will be available on all platforms.