OpenAI updates ChatGPT to let AI tool 'see, hear and speak'

September 25, 2023 at 7:48 PM EDT

OpenAI is updating ChatGPT to allow the artificial intelligence-powered chatbot to hear and respond to voice prompts, generate speech from text and analyze images for users.

OpenAI is updating the capabilities of ChatGPT to allow the artificial intelligence (AI) tool to "see, hear, and speak" in the latest upgrades to the viral chatbot.

OpenAI is rolling out updates that will allow ChatGPT to understand spoken prompts and respond in a back-and-forth conversation with the user using the chatbot’s new voice. The chatbot will also be able to respond to image prompts. The changes bring ChatGPT’s capabilities closer to those of Apple’s Siri, Google’s Lens and voice assistant, and Amazon’s Alexa.

"Voice and image give you more ways to use ChatGPT in your life," OpenAI said in the announcement. "Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow-up questions for a step-by-step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with you."

ChatGPT’s new voice capability is powered by a text-to-speech model capable of generating human-like audio from text and a few seconds of sample speech.

The company also worked with professional voice actors to create the voices and uses Whisper, its open-source speech recognition system, to transcribe spoken words into text.

The company noted that the new voice technology poses some risks, such as the potential for fraud or impersonation.

"The new voice technology — capable of crafting realistic synthetic voices from just a few seconds of real speech — opens doors to many creative and accessibility-focused applications," OpenAI said in the announcement. "However, these new capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud."

It added that, for image analysis, the company has "taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy."

OpenAI went on to note, "Vision-based models also present new challenges, ranging from hallucinations about people to relying on the model’s interpretation of images in high-stakes domains."

The company said it tested the model with "red teamers for risk in domains such as extremism and scientific proficiency, and a diverse set of alpha testers."

OpenAI said it will roll out voice and image capabilities to users of the Plus and Enterprise versions of ChatGPT over the next two weeks.

Reuters contributed to this report.