OpenAI’s ChatGPT Now Talks, Listens, Sees Images

Announced on September 25th, the new multimodal capabilities let ChatGPT users hold natural voice conversations with the AI assistant, which can now understand spoken questions, reply in one of five voices, and interpret images that users share.

According to OpenAI’s blog post, this more human-like interface opens up novel ways to interact with ChatGPT. For example, users could snap a photo of a landmark while traveling and discuss it with ChatGPT, or take pictures of groceries to get step-by-step cooking instructions.
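For developers curious what image-plus-text prompting looks like in code, the sketch below is a minimal illustration only and is not part of the announcement: the `gpt-4o` model name, the placeholder image URL, and the `image_url` content format are assumptions drawn from OpenAI's current Chat Completions API documentation.

```python
# Sketch: asking a question about a photo via OpenAI's Chat Completions API.
# Assumptions: a vision-capable model name ("gpt-4o") and a placeholder image URL.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What landmark is this, and what should I know before visiting?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/landmark.jpg"},  # placeholder
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

In the ChatGPT app itself, none of this plumbing is visible; users simply attach a photo and ask their question.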

The speech and vision upgrades will first roll out to Plus and Enterprise customers on mobile over the next two weeks, with expanded access for developers and other users planned soon after.

ChatGPT’s makeover comes right after OpenAI launched DALL-E 3, its most advanced AI image generator to date. DALL-E 3 uses natural-language understanding to help users refine images and is integrated with ChatGPT, which can draft and rework image prompts on their behalf.
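In API terms, DALL-E 3 is exposed through OpenAI's Images endpoint. The snippet below is a minimal sketch: the `dall-e-3` model name and parameters follow OpenAI's Images API documentation, and the prompt text is invented; in the ChatGPT integration described above, ChatGPT writes the prompt for you.

```python
# Sketch: generating an image with DALL-E 3 through OpenAI's Images API.
# The prompt is an invented example; model availability is assumed from OpenAI's docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor illustration of a farmers market stall piled with autumn vegetables",
    size="1024x1024",
    n=1,  # DALL-E 3 generates one image per request
)

print(result.data[0].url)  # URL of the generated image
```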

In related AI developments, OpenAI rival Anthropic recently announced a partnership under which Amazon will invest up to $4 billion in the company, giving Anthropic access to AWS cloud services and custom hardware. In return, Anthropic will make its Claude models available to customers through Amazon Bedrock, AWS’s managed service for foundation models.

As conversational AI agents become more capable and human-like, many observers expect mainstream adoption to follow quickly. The rapid pace of innovation continues to heat up the generative AI space.

#ArtificialIntelligence #ChatGPT
