Alexa+: Is Command & Conversation

How Amazon Will Orchestrate Expert Execution Across Alexa's 600M Ecosystem

Mar 24, 2025

Rohit Prasad called Alexa powered by LLMs "a super agent integrated with thousands and thousands of devices and services." That was well over a year ago. After extensive development, Amazon officially unveiled Alexa+ on February 26, 2025, with a gradual rollout reportedly set to begin in the last week of March. With Alexa+, Amazon aims to transform Alexa from a simple command-and-control device into a more capable and intelligent system. The new version leverages generative AI to offer more natural conversations, contextual awareness, and the ability to perform complex, multi-step tasks across various services and devices.

Will Amazon’s ambition pay off?

The gap between ambition and reality forms the central tension in every generative AI strategy. For Alexa+ to succeed, Amazon will need to reliably maintain high accuracy in general knowledge, instruction following, memory, intent understanding, and a conversational style that is more than walls of text spoken aloud. These are benchmarks that have historically eluded GenAI voice assistants, including ChatGPT at scale. Alexa is among the largest device ecosystems in the world and needs to keep working across every use case for which it is currently deployed. That requirement is likely why the time between the Alexa+ idea and its deployment has stretched on so long. A similar challenge likely explains why Apple is delaying its Siri update. Google has committed to replacing Google Assistant on mobile devices, though for non-mobile devices the rollout is far from guaranteed.

The Evolution of Alexa+

Amazon's journey to create Alexa+ involved several technical breakthroughs and architectural shifts. Here are some key aspects of the new system:

Advanced Language Models: Alexa+ uses a combination of Amazon's Nova models and Anthropic's Claude, accessed through Amazon Bedrock, to power its generative capabilities.
Contextual Awareness: The system maintains conversation history without repetitive wake words, allowing for more natural interactions.
Multi-step Task Automation: Alexa+ can navigate services from Uber to Ticketmaster, orchestrating complex tasks across thousands of services simultaneously.
Visual Intelligence: The assistant can process documents and interpret environments, enhancing its ability to understand and respond to user needs.
Natural Language Understanding: Alexa+ handles colloquialisms and incomplete requests more effectively, making interactions feel more human-like.
API Orchestration: A new "coordination layer" transforms natural language into structured API calls across various services.
Personalization: Alexa+ learns user preferences and can apply this knowledge to provide more tailored assistance.

The Anatomy of Alexa+

Alexa+ isn't merely a rebranded version of the familiar voice assistant. At its core lies a fundamental architectural shift—from the traditional command-response system to a generative engine powered by Amazon's Nova models and Anthropic's Claude. This hybrid approach aims to bestow capabilities far beyond basic queries and smart home controls:

Contextual awareness that maintains conversation history without repetitive wake words
Multi-step task automation that navigates services from Uber to Ticketmaster
Visual intelligence that processes documents and interprets environments
Natural language understanding that handles colloquialisms and incomplete requests

These features operate through a newly constructed API orchestration system. Unlike previous iterations where each service required custom integration, Alexa+ creates what Amazon describes as a "coordination layer" that transforms natural language into structured API calls across thousands of services simultaneously.

We rearchitected the skills category kits to incorporate generative AI-powered experiences while maintaining the developer interfaces and implementations. For customers, this means many of the services they know and love, like their favorite music provider or their smart home devices, are available and enhanced on Alexa+. Customers can now speak more naturally, saying things like, “Alexa, who won best new artist last month?” to find and play their music, or say “Alexa, it's dark in here,” to brighten a room. - Amazon Developer blog
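To make the "coordination layer" idea more concrete, here is a minimal sketch of its last step: taking a structured call that a language model has already emitted as JSON and validating it against a registry of known services before anything is executed. Everything here is an assumption for illustration: the registry, the service and action names (`smart_home`, `set_brightness`, and so on), and the `to_api_call` helper are hypothetical, the model itself is not shown, and Amazon has not published how its layer actually works.

```python
import json
from dataclasses import dataclass
from typing import Any

# Hypothetical registry: service -> action -> required parameter names.
# Illustrative only; Amazon has not published the schemas its coordination layer uses.
SERVICE_REGISTRY: dict[str, dict[str, set[str]]] = {
    "rideshare": {"request_ride": {"pickup", "destination"}},
    "smart_home": {"set_brightness": {"room", "level"}},
    "tickets": {"search_events": {"artist", "city"}},
}


@dataclass(frozen=True)
class ApiCall:
    service: str
    action: str
    params: dict[str, Any]


def to_api_call(model_output: str) -> ApiCall:
    """Parse a model-emitted JSON call and validate it against the registry."""
    data = json.loads(model_output)
    service, action = data["service"], data["action"]
    params = data.get("params", {})
    actions = SERVICE_REGISTRY.get(service, {})
    if action not in actions:
        raise ValueError(f"unknown action: {service}.{action}")
    missing = actions[action] - set(params)
    if missing:
        raise ValueError(f"missing parameters for {service}.{action}: {sorted(missing)}")
    return ApiCall(service, action, params)


# A model might translate "Alexa, it's dark in here" into a call like this one.
call = to_api_call(
    '{"service": "smart_home", "action": "set_brightness",'
    ' "params": {"room": "living room", "level": 80}}'
)
print(call)
```

Checking the call against a registry before executing it means a malformed or hallucinated request fails loudly instead of reaching a real service, which matters when one layer fronts thousands of integrations.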

The Device Intelligence Nexus

Panos Panay, in a recent demonstration, showcased Alexa+'s improved device control capabilities. In a seemingly casual interaction, he requested music in various ways, with Alexa responding naturally while controlling multiple smart home devices. What appears simple actually demonstrates complex orchestration of context, user preferences, and device states, far beyond the rigid command structures of earlier voice assistants.

These demonstrations reflect Amazon's recognition that device intelligence represents a critical battleground. With the proliferation of smart home technology, users increasingly expect ambient intelligence that anticipates needs rather than merely responding to explicit commands.

How It Works

Amazon has not released an official architecture at this time, but based on descriptions from several podcasts and other available information, BKTBB has pieced together part of it.

Expert Handoff:
- Grounding Expert (e.g., local information)
- Conversational Expert
- Task-specific Experts (e.g., smart home control, reservations, music playback)
- API/Service Integration Experts
Action Execution:
- Make API/MCP calls to selected services
- Trigger device controls if necessary
Response Generation:
- Compile results from various experts
- Use LLM to create a natural language response
Multi-modal Output:
- Generate spoken response
- Display visual information on Echo Show devices if available
Follow-up:
- Maintain context for potential follow-up questions
- Prepare for additional actions (e.g., sending invitations, setting reminders)
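The handoff, execution, response, and follow-up steps above can be sketched as a small loop. This is a toy under heavy assumptions: keyword matching stands in for the LLM orchestrator, the expert functions are hypothetical stubs that return canned text instead of calling services or devices, and the grounding and API/service integration experts are omitted. It only shows how shared context and history could thread through the stages.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical expert signature: request text plus shared session context in,
# short past-tense summary of what was done out. Real experts would call
# services (API/MCP) or send device directives instead of returning canned text.
Expert = Callable[[str, dict], str]


def smart_home_expert(request: str, context: dict) -> str:
    context["last_device"] = "living room lights"
    return "turned on the living room lights"


def music_expert(request: str, context: dict) -> str:
    context["last_device"] = "kitchen speaker"
    return "started jazz on the kitchen speaker"


def conversational_expert(request: str, context: dict) -> str:
    return "noted your request"


# Keyword routing stands in for the LLM-based orchestrator, whose design is not public.
ROUTES: dict[str, Expert] = {
    "light": smart_home_expert,
    "dark": smart_home_expert,
    "play": music_expert,
    "music": music_expert,
}


@dataclass
class Session:
    history: list = field(default_factory=list)  # (request, reply) pairs
    context: dict = field(default_factory=dict)  # state shared across experts and turns

    def handle(self, request: str) -> str:
        # Expert handoff: select every expert whose trigger appears in the request.
        text = request.lower()
        experts: list[Expert] = []
        for keyword, expert in ROUTES.items():
            if keyword in text and expert not in experts:
                experts.append(expert)
        if not experts:
            experts = [conversational_expert]
        # Action execution: each selected expert runs and may update shared context.
        results = [expert(request, self.context) for expert in experts]
        # Response generation: an LLM would phrase this; here the results are joined.
        reply = "OK, I " + " and ".join(results) + "."
        # Follow-up: keep the turn so a later request can refer back to it.
        self.history.append((request, reply))
        return reply


session = Session()
print(session.handle("It's dark in here, and play some jazz"))
```

One request fans out to two experts here, and the shared `context` records which device was touched last, which is the kind of state a follow-up like "turn it down" would need.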

This multi-model, expert-based approach is meant to let Alexa+ handle complex, multi-step tasks while maintaining accuracy and speed. The system appears designed to be flexible and non-deterministic, adapting to the specific needs of each user interaction.
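One common way a multi-model system trades speed against intelligence is tiered escalation: send every request to a small, fast model first and escalate to a larger model only when the small one is unsure. The sketch below assumes that pattern; it is not confirmed as Amazon's design. Both model functions are hypothetical stubs (they do not call Amazon Bedrock), and the word-count confidence heuristic and the 0.7 threshold are arbitrary placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelReply:
    text: str
    confidence: float  # self-reported or estimated, 0.0 to 1.0


# Hypothetical stand-ins for a small, fast model and a larger reasoning model.
def small_model(request: str) -> ModelReply:
    # Placeholder heuristic: treat short requests as simple ones.
    simple = len(request.split()) <= 8
    return ModelReply(f"[small] {request}", 0.9 if simple else 0.4)


def large_model(request: str) -> ModelReply:
    return ModelReply(f"[large] {request}", 0.95)


def answer(request: str,
           small: Callable[[str], ModelReply] = small_model,
           large: Callable[[str], ModelReply] = large_model,
           threshold: float = 0.7) -> ModelReply:
    """Try the cheap model first; escalate only when it is unsure."""
    reply = small(request)
    if reply.confidence >= threshold:
        return reply
    return large(request)


print(answer("turn off the kitchen lights").text)
print(answer("plan a dinner for six on Friday, book a table, and invite my sister").text)
```

The appeal of this arrangement is that most everyday commands never pay the latency or cost of the large model; the hard part in practice is producing a confidence signal that is actually trustworthy.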

Performance and Response Time

A crucial aspect of Alexa+'s success will be its performance in real-world scenarios. In the demo, Panay was able to use directional cues to move the music from one side of the room to the other, as well as call out devices by name. The devices responded in real time and performed flawlessly. He discusses this moment in depth around the 16-minute mark of his interview with Alex Kantrowitz.
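Resolving a cue like "move the music to the right" requires knowing where devices sit, not just what they are called. A minimal sketch, assuming each speaker has a known position along the room (a hypothetical `x` coordinate; Amazon has not said how Alexa+ models room layout): an explicit device name wins, otherwise "left" or "right" picks the outermost speaker other than the one currently playing.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    name: str
    x: float  # hypothetical position along the room, metres from the left wall


def resolve_target(cue: str, speakers: list[Speaker], current: str) -> Speaker:
    """Pick a speaker from a spoken cue: a device name, or 'left'/'right'."""
    cue = cue.lower()
    # A device called out by name takes priority over any direction.
    for s in speakers:
        if s.name.lower() in cue:
            return s
    # Otherwise move away from the current speaker, toward the requested side.
    others = [s for s in speakers if s.name != current] or speakers
    if "left" in cue:
        return min(others, key=lambda s: s.x)
    if "right" in cue:
        return max(others, key=lambda s: s.x)
    raise ValueError(f"could not resolve a device from {cue!r}")


room = [Speaker("Echo Studio", 0.5), Speaker("Echo Dot", 4.0), Speaker("Soundbar", 2.0)]
print(resolve_target("move the music to the right", room, current="Echo Studio").name)
print(resolve_target("play it on the soundbar", room, current="Echo Studio").name)
```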

While we do not have benchmarks for Alexa+, we do have an understanding of its underlying models, which range in size from Nova Micro up to Claude 3.7. Panay let slip that a breakthrough came about four months ago, which fits both the timeline in which the Nova models came into being and the internal testing of Claude 3.7's reasoning.

Here we see a breakdown of the smallest Nova model, suggesting that it could be fast enough and smart enough to power many Alexa+ experts, and possibly even the orchestrator model, with the ability to hand off to Claude when more intelligence is needed.

Model comparison of Nova Micro by Artificial Analysis.

Summary

Output Speed (tokens/s): Nova Micro has a median output speed of 333 tokens per second on Amazon.
Latency (TTFT): Nova Micro has a latency of 0.31 seconds on Amazon.
Blended Price ($/M tokens): Nova Micro has a price of $0.06 per 1M tokens on Amazon (blended 3:1 input to output).
Input Token Price: Nova Micro has an input token price of $0.04 per 1M tokens on Amazon.
Output Token Price: Nova Micro has an output token price of $0.14 per 1M tokens on Amazon.
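To see what these figures could mean for a single voice turn, the arithmetic below estimates response time as time to first token plus output tokens divided by output speed, and cost from the per-token prices. The 50-token reply and 1,500-token prompt are hypothetical sizes chosen for illustration, and real Alexa+ turns would also include speech recognition, expert calls, and text-to-speech, none of which are counted here.

```python
# Nova Micro figures as listed by Artificial Analysis (Amazon endpoint).
TTFT_S = 0.31              # time to first token, seconds
OUTPUT_TOKENS_PER_S = 333  # median output speed
INPUT_PRICE = 0.04         # USD per 1M input tokens (as listed, rounded)
OUTPUT_PRICE = 0.14        # USD per 1M output tokens


def response_time(output_tokens: int) -> float:
    """Rough generation time: wait for the first token, then stream the rest."""
    return TTFT_S + output_tokens / OUTPUT_TOKENS_PER_S


def blended_price(input_price: float, output_price: float, ratio: int = 3) -> float:
    """Blend prices assuming `ratio` input tokens per output token (3:1 by default)."""
    return (ratio * input_price + output_price) / (ratio + 1)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at the listed per-1M-token prices."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


# Hypothetical turn: 1,500 prompt tokens (context, tools) and a 50-token spoken reply.
print(f"{response_time(50):.2f} s")
print(f"${blended_price(INPUT_PRICE, OUTPUT_PRICE):.4f} per 1M tokens blended")
print(f"${request_cost(1500, 50):.8f} per request")
```

A 50-token reply works out to roughly 0.46 seconds of generation. Note that blending the rounded $0.04 input price 3:1 gives about $0.065, slightly above the listed $0.06; Amazon's published Nova Micro input rate of $0.035 per 1M tokens blends to about $0.061, which matches the listing once rounded.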

The Market Reality: Dominance Meets Competition

Amazon approaches this transition from a position of strength. Amazon reports that there are over 600 million Alexa devices. This figure likely includes every device with Alexa built in, and it updates the 300/125M numbers reported in 2023.

However, this dominance faces challenges from multiple directions:

Market Dynamics: While Alexa leads among virtual personal assistants, overall smart speaker market growth has been slowing.
Innovation: The advent of generative AI assistants like ChatGPT's Advanced Voice Mode, Perplexity and others introduces advanced conversational capabilities, reshaping user expectations.
Cross-Platform Competition: Alexa's dominance in smart speakers contrasts with the ubiquity of smartphone-based AI assistants, presenting a challenge in user accessibility.

Amazon is addressing this by launching a new mobile app and browser-based experience for Alexa+. Device competition will also be fierce: Meta is making a push for more smart glasses, and its new Llama 4 model will reportedly feature voice and will likely be optimized for smart glasses. OpenAI, for its part, just updated its voice models and continues to dominate the subscriber base for LLM chatbots.

What’s Old is New

We've been here before. Siri debuted in 2011, followed by Alexa in 2014, and each promised conversational intelligence that would transform our relationship with technology. The reality proved more mundane: systems that could handle basic commands but stumbled with complexity, context, or natural conversation.

This history makes many industry observers skeptical. Former Alexa team members candidly acknowledge that even with generative AI, the system "hallucinates, gets things wrong," making it "hard to build an experience when you're connecting to many different devices." In an interview with Mark Gurman, Panay expressed that he wants “perfection in every single product that we ship.”

The Future of Voice Intelligence

Amazon's $19.99 monthly subscription (free for Prime members) signals confidence in Alexa+'s value proposition. Yet questions remain about whether these technological advances will translate into meaningful user experiences.

The challenges aren't merely technical. They involve complex human factors—how users formulate requests, their expectations for response quality, and their tolerance for occasional failures. As we've seen with previous voice assistants, technical capability alone doesn't guarantee adoption. The system must create genuine utility while maintaining consistency and reliability.

The integration of Experts through context management represents perhaps the most promising aspect of Alexa+. By maintaining coherent understanding across interactions, Alexa+ could potentially overcome the fragmented experience that characterized earlier voice assistants. Yet this promise remains largely theoretical until proven in widespread real-world usage.

As we watch Alexa+ roll out to the larger Echo Show devices in the coming weeks, with broader availability to follow, we're witnessing more than just another product launch. We're observing Amazon's effort to redefine voice assistants for the generative AI era—navigating between ambitious promises and pragmatic realities, between the limitations of today's technology and the potential of tomorrow's.

Breaking the Black Box