How SpeechTranslate Works

Four steps between a caller speaking Mandarin and an agent hearing English. Under 300 milliseconds. Fully bidirectional. No app to install. Now with Speech-to-Speech AI.

Step 01

Customer Speaks in Their Language

The caller speaks naturally in their native language. SpeechTranslate captures the audio stream in real time using WebRTC and processes it through advanced speech recognition powered by AWS Transcribe, Azure Speech, or Google Speech-to-Text — automatically selecting the best provider for each language.
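The "best provider per language" routing can be pictured as a small preference table consulted against currently healthy providers. A minimal sketch only: the preference orders and the `pick_stt_provider` helper are illustrative assumptions, not SpeechTranslate's actual routing logic.

```python
# Hypothetical per-language STT provider routing (illustrative data,
# not SpeechTranslate's real preference table).
PREFERRED_STT = {
    "zh-CN": ["google", "azure", "aws"],
    "es-US": ["aws", "google", "azure"],
    "hi-IN": ["azure", "google", "aws"],
}
DEFAULT_ORDER = ["aws", "google", "azure"]

def pick_stt_provider(language: str, healthy: set[str]) -> str:
    """Return the first preferred provider that is currently healthy."""
    for provider in PREFERRED_STT.get(language, DEFAULT_ORDER):
        if provider in healthy:
            return provider
    raise RuntimeError("no healthy STT provider available")
```

For a Mandarin caller with Google temporarily unhealthy, the table falls through to the next preference rather than failing the call.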

Step 02

Language Auto-Detected

SpeechTranslate's multi-provider detection system identifies the caller's language within seconds — voting across AWS Transcribe, Google Cloud, and Azure Speech for high-confidence results. If the caller switches languages mid-call, the system detects and adapts automatically.

Step 03

AI Translates in Real Time

SpeechTranslate offers two translation pathways: Speech-to-Speech mode uses Amazon Nova Sonic V2 or Google Gemini Live to translate audio directly — no text intermediary, lower latency. Pipeline mode uses neural STT, machine translation, and TTS with multi-provider failover. Both modes support custom glossaries for domain-specific terminology (medical, legal, financial).
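The glossary idea is simple to picture: force known domain terms to fixed translations before generic machine translation touches them. In practice glossaries are passed to the MT provider's own glossary facility; the sketch below only shows the term-matching concept, with a made-up medical entry.

```python
import re

def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Pin domain terms to their required translations.
    Illustrative only: real pipelines hand the glossary to the MT
    provider rather than rewriting the source text like this."""
    # longest terms first, so "blood pressure cuff" wins over "blood pressure"
    for source in sorted(glossary, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(source)}\b", glossary[source],
                      text, flags=re.IGNORECASE)
    return text
```

This is why a medical deployment never sees "cuff" translated as a shirt cuff: the glossary entry wins before the general-purpose model gets a say.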

Step 04

Agent Hears Translated Speech

In Pipeline mode, the translated text is converted to natural-sounding speech using text-to-speech (TTS) synthesis; in Speech-to-Speech mode, translated audio streams back directly with no text intermediary. Either way, the agent hears the translation in English within 200-300 milliseconds. The process works bidirectionally: the agent's English responses are simultaneously translated back into the caller's language.
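The sub-300ms claim is easiest to read as a per-stage latency budget. The stage figures below are illustrative assumptions for the pipeline path, not measured SpeechTranslate numbers; the point is that the stages have to sum under the target.

```python
# Hypothetical stage budget in milliseconds (illustrative, not measured).
BUDGET_MS = {
    "audio_capture": 20,    # WebRTC capture and framing
    "speech_to_text": 120,  # streaming STT partial results
    "translation": 60,      # machine translation
    "text_to_speech": 70,   # TTS synthesis
    "playback": 20,         # gapless playback buffering
}

def total_latency(budget: dict[str, int]) -> int:
    """End-to-end latency is the sum of the stage budgets."""
    return sum(budget.values())
```

Under these assumptions the total is 290 ms, which shows why Speech-to-Speech mode (collapsing the middle three stages into one model call) buys latency headroom.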

Powered by Amazon Bedrock & AgentCore

AI Agent Assist — Your Agent's Real-Time Copilot

While SpeechTranslate handles the language barrier, Agent Assist gives your agents an AI copilot that listens to the live transcript and provides instant answers, suggested actions, and automated workflows — powered by Amazon Bedrock Agents and AgentCore with retrieval-augmented generation (RAG). Use Prompt Studio to define when the AI chatbot triggers based on customer utterances and what intents and entities to extract — no code changes required.

Live Transcript Classification

Automatically classifies caller intent from the live transcript — routing questions, complaints, and requests to the right action without the agent manually tagging anything.

Knowledge Base Retrieval (RAG)

Pulls answers from your company's documents, FAQs, and policies in real time. Supports S3 vector stores, DynamoDB, and Aurora Serverless as knowledge sources.

Action Groups & Tool Use

The AI assistant can trigger real actions — look up an order, create a ticket in Zendesk, update a record in Salesforce — directly from the conversation context.

Multi-Turn Session Memory

Maintains full conversation context across turns. The AI assistant remembers what was discussed earlier in the call, not just the last message.

Prompt Studio — Intent Triggers & Entity Extraction

Define when the AI chatbot activates based on customer utterances — configure classification prompts, intents to detect, and entities to extract from the conversation. When a customer says something matching your triggers, Prompt Studio sends the right context to Bedrock Agents or AgentCore for an intelligent response. Supports draft/publish workflow.

Bring Your Own Knowledge

Agent Assist connects to your existing data — wherever it lives. Upload documents to S3, connect a DynamoDB table, or query Aurora Serverless directly. The Bedrock knowledge base indexes your content into a vector store and retrieves the most relevant answers in real time during every call. Choose your runtime: Bedrock Agents for fully managed orchestration, or AgentCore for advanced streaming traces and custom tool integrations via MCP gateway.

S3 Vector Store

PDFs, Word docs, CSVs, HTML, Markdown, and other major document formats — drop files into S3 and they're automatically chunked, embedded, and searchable.

DynamoDB

Connect live operational data — product catalogs, customer records, policy tables. Agent Assist queries your DynamoDB tables in real time during calls.

Aurora Serverless

For structured data at scale — connect Aurora Serverless PostgreSQL or MySQL as a knowledge source with full SQL query support via Bedrock action groups.
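Whichever source the content lives in, retrieval reduces to the same operation: rank stored chunks by similarity to the query embedding and return the best matches. A toy in-memory version of that ranking, assuming two-dimensional embeddings for readability (the Bedrock knowledge base does this over real embedding vectors at scale):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec: list[float],
          docs: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """docs: (doc_id, embedding) pairs. Returns the k best-matching ids."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

The retrieved chunks, not the whole corpus, are what get stuffed into the model's context, which is what keeps RAG answers grounded in your documents.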

Integrates With Your Stack

Agent Assist uses Bedrock action groups to connect to any external system via REST APIs. Out-of-the-box support for major CRM and helpdesk platforms — plus any custom API.

Salesforce

CRM lookup, case creation, contact updates

Zendesk

Ticket creation, knowledge base search

Zoho

CRM, Desk, and custom module integration

ServiceNow

Incident management, CMDB queries

HubSpot

Contact records, deal pipeline, tickets

Custom APIs

Any REST API via Bedrock action groups

Cross-Call Intelligence

ConnectIQ v2 — See What Your Calls Are Really Telling You

Most contact centers review 2-5% of calls manually. ConnectIQ v2 scores 100% of them automatically, monitors translation quality in real time, tracks provider health, and detects cross-lingual patterns across your entire call volume — in all 66 supported languages. It extends Amazon Connect Contact Lens with automated QA, Translation Quality Index (TQI), pipeline health monitoring, cost quantification, and AI-powered root cause analysis.

100%

100% Automated QA Scoring

of calls scored

Every single call gets scored — not just the 2-5% that traditional QA teams can review. ConnectIQ evaluates agent performance, compliance, and customer sentiment across all calls automatically.

5

Cross-Call Pattern Detection

pattern detectors

ConnectIQ doesn't just analyze individual calls — it detects patterns across your entire call volume. Repeated complaints about a product, emerging billing issues, or training gaps that affect multiple agents are surfaced before they escalate.

$

Cost Quantification

cost per issue

Every detected issue is tagged with an estimated cost impact. When ConnectIQ flags that 12% of calls about "returns" result in repeat contacts, it also tells you that's costing $47K/month — making prioritization decisions instant.
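The cost tag on each issue is straightforward arithmetic over call volume, the issue's repeat rate, and per-call handling cost. A minimal sketch with invented figures (the numbers below are not the $47K example from the page, just an illustration of the same calculation):

```python
def monthly_cost_of_repeats(monthly_calls: int, repeat_rate: float,
                            cost_per_call: float) -> float:
    """Estimated monthly cost of repeat contacts for one issue.
    All inputs are illustrative assumptions."""
    return monthly_calls * repeat_rate * cost_per_call

# e.g. 10,000 returns-related calls, 12% repeat rate, $8.50 per call
estimate = monthly_cost_of_repeats(10_000, 0.12, 8.50)
```

Attaching a dollar figure like this to every pattern is what turns a list of detected issues into a ranked to-do list.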

AI

AI Root Cause Analysis

root cause

When a pattern is detected, ConnectIQ doesn't just show you the data — it generates a root cause analysis explaining why it's happening, which agents and teams are affected, and what specific actions to take.

TQI

Translation Quality Index (TQI)

per-call score

Every call receives a quality score computed from latency, confidence, language detection accuracy, and translation fidelity — measured via async back-translation scoring. Track quality trends per language pair, per provider, hourly, and daily.
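A score blended from those four signals can be sketched as a weighted sum of normalized components. The weights and the latency normalization below are illustrative assumptions, not the actual TQI formula:

```python
def tqi(latency_ms: float, asr_confidence: float,
        lang_detect_acc: float, fidelity: float) -> float:
    """Hypothetical Translation Quality Index on a 0-100 scale.
    Weights and normalization are illustrative, not the real formula."""
    # map latency to 0..1: instant -> 1.0, one full second -> 0.0
    latency_score = max(0.0, 1.0 - latency_ms / 1000.0)
    weights = {"latency": 0.2, "asr": 0.3, "detect": 0.2, "fidelity": 0.3}
    score = (weights["latency"] * latency_score
             + weights["asr"] * asr_confidence
             + weights["detect"] * lang_detect_acc
             + weights["fidelity"] * fidelity)
    return round(100 * score, 1)
```

Because every component is per-call, the same formula rolls up naturally into the per-language-pair, per-provider, hourly, and daily trend views.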

5m

Pipeline Health Monitoring

health checks

Real-time health metrics for every translation provider, aggregated every 5 minutes. When a provider degrades, ConnectIQ detects it automatically and alerts your team — before customers notice.
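Five-minute aggregation is a matter of bucketing per-request outcomes by time window and provider. A minimal sketch of that bucketing (the event shape is an assumption for illustration):

```python
from collections import defaultdict

def aggregate_health(events: list[tuple[int, str, bool]],
                     window_s: int = 300) -> dict:
    """events: (timestamp_s, provider, ok) tuples.
    Returns {(window_start, provider): success_rate} per 5-minute bucket."""
    buckets = defaultdict(lambda: [0, 0])  # [successes, total]
    for ts, provider, ok in events:
        key = (ts - ts % window_s, provider)
        buckets[key][0] += int(ok)
        buckets[key][1] += 1
    return {k: ok / total for k, (ok, total) in buckets.items()}
```

A success rate dropping between consecutive windows for one provider is exactly the degradation signal that would page the team before the failover kicks in.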

66+

Cross-Lingual Pattern Detection

languages correlated

Correlates complaint patterns across languages — a Spanish complaint call and a Mandarin complaint call about the same product issue are linked together. Something monolingual QA teams simply cannot do.

Automated Action Triggers

auto-actions

Detected patterns automatically trigger configurable actions: webhook notifications, email alerts, Amazon Connect tasks, or SQS messages — turning intelligence into immediate operational response.
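Pattern-to-action wiring reduces to a routing table from pattern types to action channels. The sketch below records which channels would fire rather than actually sending anything; the pattern types and routes are invented for illustration.

```python
def dispatch(pattern: dict, routes: dict[str, list[str]]) -> list[str]:
    """Route a detected pattern to its configured action channels.
    Returns the channel:pattern pairs that would fire (illustrative:
    a real dispatcher would send the webhook, email, task, or SQS
    message instead of recording it)."""
    return [f"{channel}:{pattern['type']}"
            for channel in routes.get(pattern["type"], [])]

# Invented routing configuration for illustration.
ROUTES = {
    "repeat_complaint": ["webhook", "email"],
    "provider_degraded": ["sqs"],
}
```

Unknown pattern types fall through to no action, so adding a new detector never fires actions until someone explicitly routes it.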

Why ConnectIQ Is Different

Not Just Analytics — Intelligence

Traditional analytics dashboards show you charts. ConnectIQ generates actionable intelligence: root cause explanations, cost impact estimates, and specific recommendations for each detected issue.

Cross-Call, Not Per-Call

Individual call analysis misses systemic problems. ConnectIQ correlates patterns across thousands of calls to surface issues that no single call review would catch — like a product defect causing a 40% spike in returns-related calls across three regions.

Translation Quality + Cross-Lingual Intelligence

ConnectIQ v2 monitors translation quality per call via TQI scoring and correlates patterns across all 66 languages — a Spanish complaint and a Mandarin complaint about the same issue get linked together, with per-provider health tracking ensuring consistent quality.

Under the Hood

Built on Enterprise-Grade AWS Infrastructure

SpeechTranslate combines multiple speech and translation providers with a custom real-time audio pipeline to deliver the fastest, most reliable speech-to-speech translation available.

Dual-Mode: Speech-to-Speech + Pipeline

Choose Speech-to-Speech via Nova Sonic V2 or Gemini Live for direct audio translation, or the multi-provider STT+MT+TTS pipeline with automatic failover across AWS, Azure, and Google.

Sub-300ms Latency

AudioWorklet-based gapless playback pipeline with WebRTC transport delivers near-instant translation with no perceptible delay.

Custom Glossary Support

Define domain-specific terminology for healthcare, legal, financial, or any industry to ensure critical terms are translated correctly.

Bidirectional Translation

Both sides of the conversation are translated simultaneously. The caller and agent each hear the other in their own language.

Amazon Connect Integration

Native PSTN integration via Amazon Connect — works with real phone calls, not just browser-to-browser. Agents use the familiar CCP softphone.

Automatic Language Detection

Multi-provider voting system across AWS Transcribe, Google Cloud, and Azure Speech identifies the caller's language from the first seconds of speech — with configurable confidence thresholds and mid-call language switching when the caller changes languages.

Mid-Call Language Switching

Callers aren't limited to one language per call. SpeechTranslate detects language changes in real time and switches translation seamlessly — no agent action required.

Ready to Translate Your First Call?

See how SpeechTranslate, Agent Assist, and ConnectIQ work together for your specific use case.