SignalCast — Real-Time Voice AI & 3D Avatar Translation Engine
A real-time voice AI system that extracts, transcribes, and translates spoken audio from video content — then drives a 3D animated avatar to perform the translation live, in sync with the video.
7 core modules built
Real-time audio-to-avatar sync
Multilingual audio transcription support
Live 3D avatar rendering
The Problem
Most online video is inaccessible to 1.5 billion people with hearing loss
Over 1.5 billion people worldwide live with some degree of hearing loss, yet most online video content remains inaccessible to them. Existing solutions such as closed captions are static, often imprecise, and fail to convey the nuance of spoken language. There was no real-time, context-aware system that could translate video audio into dynamic sign language, handle multiple spoken languages, and run directly inside a browser.
The Voice AI Pipeline
Video URL → live 3D sign language
User pastes a video URL: the browser extension validates the link, detects the video format and metadata, and starts the processing pipeline.
Audio is extracted and preprocessed: background noise is filtered and audio quality is optimised using OpenCV pipelines to maximise transcription accuracy.
Speech-to-text via Groq Cloud API: multilingual audio is transcribed with high accuracy in real time, and NLTK cleans and normalises the text output.
Context-aware sign language mapping: the system analyses sentence context using NLP rather than mapping word by word, producing natural, semantically accurate sign sequences.
3D avatar rendered via Three.js + Unity: a rigged, real-time animated character performs the sign language gestures, synced frame by frame with the video timeline.
Live overlay inside the video frame: the avatar is rendered directly within the video player rather than in a separate window, creating a seamless, immersive viewing experience.
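To make the text-cleaning and sign-mapping steps above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not SignalCast's actual implementation: the gloss dictionary, the `Segment` type, and the function names are all hypothetical, the regex tokeniser is a standard-library stand-in for the NLTK cleaning step, and greedy longest-phrase matching is only one simple way to avoid pure word-by-word lookup. Real sign languages have their own grammar, which this toy mapping does not model.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase-level gloss dictionary (assumption, not from the project).
# Multi-word keys carry context that a word-by-word lookup would lose.
GLOSSES = {
    ("good", "morning"): "GOOD-MORNING",
    ("thank", "you"): "THANK-YOU",
    ("how", "are", "you"): "HOW-YOU",
    ("good",): "GOOD",
    ("morning",): "MORNING",
    ("you",): "YOU",
}
MAX_PHRASE = max(len(key) for key in GLOSSES)


@dataclass(frozen=True)
class Segment:
    """One transcribed span of speech with its position in the video (seconds)."""
    text: str
    start: float
    end: float


def normalise(text: str) -> list[str]:
    """Lowercase and drop punctuation; a stand-in for the NLTK cleaning step."""
    return re.findall(r"[a-z']+", text.lower())


def to_glosses(tokens: list[str]) -> list[str]:
    """Greedy longest-phrase match; unknown words fall back to fingerspelling."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so "good morning" beats "good".
        for n in range(min(MAX_PHRASE, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GLOSSES:
                out.append(GLOSSES[key])
                i += n
                break
        else:
            out.append("FS:" + tokens[i].upper())
            i += 1
    return out


def segment_to_cues(seg: Segment) -> list[tuple[float, str]]:
    """Spread a segment's glosses evenly across its time span in the video."""
    glosses = to_glosses(normalise(seg.text))
    if not glosses:
        return []
    step = (seg.end - seg.start) / len(glosses)
    return [(seg.start + i * step, gloss) for i, gloss in enumerate(glosses)]
```

Calling `segment_to_cues(Segment("Good morning, thank you!", 2.0, 4.0))` yields timestamped cues that a renderer could consume. Spacing glosses evenly is a simplification; a production system would more likely use word-level timestamps from the transcription service, where available.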
Key Features
What's inside
Tech Stack
Built with
Frontend: React JS 18.2, HTML5, CSS3
3D Rendering: Three.js 0.159, Unity 2022.3
Voice AI: Groq Cloud API 4.2 (speech-to-text)
NLP: NLTK 3.8 (text processing & context)
Preprocessing: OpenCV 4.8 (audio optimisation)
Backend: Python 3.13
Design & Tooling: Figma (design), PyCharm (IDE)
Engineering Challenge
Why this was technically hard
Most voice AI projects stop at transcription. SignalCast goes three layers deeper, into context-aware sign mapping, real-time 3D avatar animation, and in-player video sync, and each layer introduces significant engineering complexity.
The hardest part was real-time buffering: sign language gestures must stay in sync with the video even when transcription lags behind playback. Solving it required building a custom frame-sync engine between the Python backend and the Three.js/Unity 3D renderer.
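The sync problem can be sketched in a few lines of Python. This is a hedged illustration of one possible approach, not the project's frame-sync engine: the `Cue` and `SyncBuffer` names, the `max_lateness` threshold, and the policy of dropping stale gestures are all assumptions made for the example. The idea is to queue gesture cues by video timestamp and, on each rendered frame, release the ones that are due while discarding any that arrived too late to be worth playing.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Cue:
    """A gesture to start at a given video time (seconds). Hypothetical type."""
    video_time: float
    gesture: str = field(compare=False)


class SyncBuffer:
    """Orders cues by video time and releases them as the playhead advances."""

    def __init__(self, max_lateness: float = 0.5) -> None:
        # Cues older than this (in seconds) when released are dropped, not played.
        self.max_lateness = max_lateness
        self._heap: list[Cue] = []

    def push(self, cue: Cue) -> None:
        """Add a cue; transcription results may arrive out of order or late."""
        heapq.heappush(self._heap, cue)

    def due(self, playhead: float) -> list[Cue]:
        """Return cues to start on this frame, skipping ones that are too stale."""
        ready: list[Cue] = []
        while self._heap and self._heap[0].video_time <= playhead:
            cue = heapq.heappop(self._heap)
            if playhead - cue.video_time <= self.max_lateness:
                ready.append(cue)
        return ready


buf = SyncBuffer(max_lateness=0.5)
for cue in (Cue(1.0, "A"), Cue(3.0, "C"), Cue(2.0, "B")):
    buf.push(cue)

first = [c.gesture for c in buf.due(1.1)]    # "A" is due and on time
buf.push(Cue(1.2, "LATE"))                   # transcription lagged behind playback
second = [c.gesture for c in buf.due(2.05)]  # "LATE" is dropped, "B" plays
```

Dropping stale cues keeps the avatar aligned with the playhead at the cost of skipping content. An alternative design would speed up or compress late gestures instead, which preserves meaning but risks unnatural motion.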
The Outcome
A genuinely novel voice AI system
SignalCast delivered a working real-time voice AI system that converts a video's spoken audio into live 3D sign language, rendered directly inside the browser and in sync with the video. It takes voice AI beyond transcription into real-time motion generation and avatar animation.