AI Chatbot with Voice & WhatsApp

An AWS Bedrock-powered AI chatbot integrated with the WhatsApp Business API. It supports both text and voice note interactions, using speech-to-text (STT) for incoming voice notes and text-to-speech (TTS) for outgoing voice replies.

Overview

This system brings AI-powered conversation directly into WhatsApp, the messaging channel users already live in. Incoming text messages are sent to a foundation model on AWS Bedrock, and the model's answer is returned as a text reply. Incoming voice notes are transcribed by a speech-to-text pipeline and passed to the same model; the response is then synthesized into an audio file and sent back as a voice reply. The result is a fully conversational AI assistant reachable through a phone number.
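The two message paths described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes the webhook payload follows the WhatsApp Cloud API layout (`entry` > `changes` > `value` > `messages`), and the `transcribe`, `generate`, and `synthesize` callables are hypothetical stand-ins for the speech-to-text, AWS Bedrock, and text-to-speech stages.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Reply:
    to: str                        # sender's phone number
    text: str                      # model response as text
    audio: Optional[bytes] = None  # synthesized audio (voice path only)


def iter_messages(payload: dict) -> Iterator[dict]:
    """Yield message objects from a webhook payload (assumed Cloud API layout)."""
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            yield from change.get("value", {}).get("messages", [])


def handle_message(
    message: dict,
    transcribe: Callable[[str], str],   # hypothetical: media id -> transcript
    generate: Callable[[str], str],     # hypothetical: prompt -> model answer
    synthesize: Callable[[str], bytes], # hypothetical: answer -> audio bytes
) -> Optional[Reply]:
    """Route one incoming message down the text path or the voice path."""
    sender = message["from"]
    if message["type"] == "text":
        return Reply(to=sender, text=generate(message["text"]["body"]))
    if message["type"] == "audio":
        transcript = transcribe(message["audio"]["id"])
        answer = generate(transcript)
        return Reply(to=sender, text=answer, audio=synthesize(answer))
    return None  # other message types (images, stickers, ...) are ignored here
```

Passing the three stages in as callables keeps the routing logic testable without network access; real implementations of each stage would wrap the respective service clients.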

Technology

AWS Bedrock, WhatsApp Business API, Python, Speech-to-Text, Text-to-Speech

Key Features

  • Natural language conversation powered by AWS Bedrock foundation models
  • Incoming voice note transcription via speech-to-text pipeline
  • AI-generated responses synthesized to audio and sent as voice replies
  • Webhook-based WhatsApp Business API integration for real-time message handling
  • Conversation context management across multi-turn sessions
  • Graceful fallback to text reply when audio synthesis is unavailable
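The fallback behaviour in the last feature could look something like the sketch below. It is only an illustration under stated assumptions: `synthesize` is a hypothetical callable wrapping whatever text-to-speech service is in use, and "unavailable" is taken to mean either an exception or an empty result.

```python
import logging
from typing import Callable, Tuple, Union

logger = logging.getLogger(__name__)


def build_reply(
    answer: str,
    synthesize: Callable[[str], bytes],  # hypothetical TTS wrapper
) -> Tuple[str, Union[str, bytes]]:
    """Return ("audio", audio_bytes) if synthesis works, else ("text", answer)."""
    try:
        audio = synthesize(answer)
    except Exception:
        # Any synthesis failure degrades to a plain text reply
        # instead of dropping the response altogether.
        logger.warning("Audio synthesis failed; falling back to text", exc_info=True)
        return ("text", answer)
    if not audio:
        # An empty payload is treated the same as a failure.
        return ("text", answer)
    return ("audio", audio)
```

The caller can then branch on the first element of the tuple to decide whether to send a voice message or a text message.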

Technical Challenges

1. WhatsApp voice notes use Opus/OGG encoding, while most TTS services produce MP3 or WAV output. Bridging these formats required a server-side audio transcoding step to keep the round-trip within WhatsApp's delivery time expectations.

2. AWS Bedrock model inference latency needed to fit within the user-perceived response window for a messaging app, requiring careful prompt design and streaming response handling to avoid timeouts.

3. Maintaining coherent multi-turn conversation context across stateless webhook invocations required a session store keyed on the sender's phone number with appropriate expiry logic.
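As an illustration of the third challenge, a session store keyed on the sender's phone number with expiry might be sketched as below. This is an assumption-laden example, not the project's implementation: the `SessionStore` class, the 30-minute TTL, and the 20-turn cap are all hypothetical, and the in-memory dictionary stands in for whatever external store a deployment would need so that context survives across separate webhook invocations.

```python
import time
from typing import Callable, Dict, List, Tuple


class SessionStore:
    """In-memory conversation history keyed on phone number, with idle expiry."""

    def __init__(
        self,
        ttl_seconds: float = 1800.0,  # hypothetical: expire after 30 idle minutes
        max_turns: int = 20,          # hypothetical: cap on stored turns
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._max_turns = max_turns
        self._clock = clock
        # phone number -> (time of last activity, list of turns)
        self._sessions: Dict[str, Tuple[float, List[dict]]] = {}

    def history(self, phone: str) -> List[dict]:
        """Return a copy of the stored turns, or [] if none or expired."""
        entry = self._sessions.get(phone)
        if entry is None:
            return []
        last_seen, turns = entry
        if self._clock() - last_seen > self._ttl:
            del self._sessions[phone]  # idle too long: start a fresh session
            return []
        return list(turns)

    def append(self, phone: str, role: str, text: str) -> None:
        """Add a turn, keep only the most recent ones, and refresh the expiry."""
        turns = self.history(phone)
        turns.append({"role": role, "text": text})
        self._sessions[phone] = (self._clock(), turns[-self._max_turns:])
```

Capping the number of stored turns also bounds the prompt size sent to the model on each request, which helps with the latency concern raised in the second challenge.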

Outcome

Delivered a production chatbot accessible to any user with a WhatsApp account — no app install, no onboarding — supporting both typed and spoken interaction. The voice capability made the assistant significantly more accessible for users who prefer or need audio-first communication.
