Amazon has introduced a new voice AI model, Nova Sonic, designed to revolutionize real-time speech interactions by sensing and adapting to human emotions. The model represents a major leap in the tech giant’s pursuit of more human-like artificial intelligence, entering the competitive landscape alongside OpenAI’s GPT-4o and Google’s Gemini.
Developed by Amazon’s Artificial General Intelligence (AGI) team, Nova Sonic integrates speech recognition, language understanding, and speech generation into one unified system. This allows it to maintain conversational flow, detect tone and emotion, and respond with natural, context-aware dialogue. For instance, an excited user might hear an upbeat reply, while someone expressing frustration would receive a calm and composed response.

Unlike earlier voice systems that chained separate components for speech recognition, language understanding, and speech generation, Nova Sonic’s all-in-one architecture allows for smoother, more dynamic exchanges. Because it keeps conversational cues such as intonation, pace, and emotional nuance, it can respond more naturally. It can also take actions during a dialogue, such as checking schedules or retrieving real-time data, without disrupting the conversation.
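To make the contrast concrete, here is a toy sketch of why a text-only handoff between components can lose tone. It is not Amazon's implementation: the `Utterance` fields, the `choose_style` rule, and both reply functions are hypothetical names invented for illustration, and the style rule simply mirrors the excited/frustrated example given earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    """Hypothetical container for what a speaker said and how they said it."""
    text: str
    pace: str      # e.g. "fast" or "slow"
    emotion: str   # e.g. "frustrated" or "excited"


def choose_style(emotion: Optional[str]) -> str:
    # Toy rule: calm replies for frustration, upbeat replies for excitement.
    return {"frustrated": "calm", "excited": "upbeat"}.get(emotion, "neutral")


def cascaded_reply_style(utterance: Utterance) -> str:
    # In a chained design, the recognizer hands over a transcript only,
    # so the next stage has no emotional cue to work with.
    transcript = utterance.text
    emotion_seen_downstream = None  # tone was dropped along with the audio
    assert isinstance(transcript, str)
    return choose_style(emotion_seen_downstream)


def unified_reply_style(utterance: Utterance) -> str:
    # In a unified design, the same system sees the words and the tone together.
    return choose_style(utterance.emotion)


caller = Utterance(text="My order still hasn't arrived.", pace="fast", emotion="frustrated")
print(cascaded_reply_style(caller))
print(unified_reply_style(caller))
```

In this sketch the chained path falls back to a neutral style because only the transcript survives the handoff, while the unified path can pick a calm one.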
Nova Sonic is already being integrated into Amazon products, including the updated Alexa+ voice assistant, and will be made available to developers via Amazon Bedrock, the company’s platform for foundation model access. A new streaming API supports real-time applications. It currently works in English, with multiple voices and accents, and support for more languages is underway.
In benchmark tests, Amazon claims, Nova Sonic responds in just over one second, faster than OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash, while costing nearly 80% less than GPT-4o in real-time use.
The model is currently being piloted by companies like ASAPP for customer service, Education First for language learning, and Stats Perform for sports insights. Designed for seamless integration with business systems, Nova Sonic can access and utilize real-time data, enabling functions such as reservations, account checks, or dynamic suggestions based on user input.
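The kind of integration described above usually follows a tool-calling pattern: the model asks for a named action, the application runs it against a business system, and the result goes back into the conversation. The sketch below shows only that dispatch step, under stated assumptions. It does not use the Bedrock API, and the request shape, the `TOOLS` registry, and both handler functions are hypothetical, with hard-coded return values standing in for real systems.

```python
from typing import Any, Callable, Dict


def check_account(account_id: str) -> Dict[str, Any]:
    # Hypothetical account lookup; a real handler would query a business system.
    return {"account_id": account_id, "status": "active"}


def book_reservation(name: str, party_size: int) -> Dict[str, Any]:
    # Hypothetical reservation handler with a hard-coded confirmation.
    return {"name": name, "party_size": party_size, "confirmed": True}


# Registry mapping tool names the model may request to application functions.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "check_account": check_account,
    "book_reservation": book_reservation,
}


def handle_tool_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Run the requested tool and wrap the outcome for the model."""
    name = request.get("tool")
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return {"result": handler(**request.get("arguments", {}))}
    except TypeError as exc:
        # Missing or unexpected arguments are reported instead of crashing the session.
        return {"error": str(exc)}


request = {"tool": "book_reservation", "arguments": {"name": "Dana", "party_size": 4}}
print(handle_tool_request(request))
```

Returning errors as data rather than raising keeps the conversation going, so the assistant can ask the user to clarify instead of dropping the call.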
Nova Sonic joins Amazon’s Nova suite of AI models, introduced at AWS re:Invent, which includes capabilities in text, image, and video generation. It follows the recent debut of Nova Act, Amazon’s agent for automating web-based tasks.
According to Rohit Prasad, Amazon’s SVP of AGI and former Alexa chief scientist, Nova Sonic marks a significant step toward the company’s long-term goal: developing AI that understands and responds across modalities in the most natural, human-like manner.
“This is where machine and human intelligence begin to merge,” said Prasad. “Nova Sonic is a pivotal advancement toward realizing true artificial general intelligence.”