Voice AI · Real-time Audio Processing · 3D Avatar Animation · Browser Extension · Accessibility Tech

SignalCast — Real-Time Voice AI & 3D Avatar Translation Engine

A real-time voice AI system that extracts, transcribes, and translates spoken audio from video content — then drives a 3D animated avatar to perform the translation live, in sync with the video.

7 core modules built

Real-time audio-to-avatar sync

Multi-language audio transcription support

Live 3D avatar rendering

The Problem

Online video fails many of the 1.5 billion people with hearing loss

Over 1.5 billion people worldwide live with some degree of hearing loss, yet much online video content remains hard for them to access. Existing solutions such as closed captions are static, often imprecise, and convey little of the nuance of spoken language. We found no real-time, context-aware system that translates video audio into dynamic sign language across multiple languages and runs directly inside a browser.

“There is no complete, globally applicable solution that provides real-time, context-aware sign language translation for video content across multiple languages.”

The Voice AI Pipeline

Video URL → live 3D sign language

🎬 Video URL → 🔊 Audio Extract → 📝 Speech-to-Text → 🧠 NLP Context → 🤟 Sign Mapping → 🧍 3D Avatar → 📺 Live Overlay
1. User pastes a video URL
The browser extension validates the link, detects the video format and metadata, and starts the processing pipeline.

2. Audio is extracted and preprocessed
Background noise is filtered and audio quality is optimised using OpenCV pipelines to maximise transcription accuracy.

3. Speech-to-text via Groq Cloud API
Multilingual audio is transcribed with high accuracy in real time; NLTK then cleans and normalises the text output.

4. Context-aware sign language mapping
The system analyses sentence context using NLP rather than mapping word by word, producing natural and semantically accurate sign sequences.

5. 3D avatar rendered via Three.js + Unity
A real-time animated character performs the sign language gestures, rigged and synced frame by frame with the video timeline.

6. Live overlay inside the video frame
The avatar is rendered directly within the video player, not in a separate window, creating a seamless, immersive viewing experience.
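To make the text-side steps above more concrete, here is a minimal Python sketch of link validation, transcript normalisation, and phrase-level sign mapping. It is an illustration under stated assumptions, not SignalCast's actual code: the function names, the `PHRASE_GLOSSES` table, and the `FS:` fingerspelling prefix are all hypothetical, a plain regular expression stands in for NLTK so the example has no dependencies, and greedy longest-phrase matching is only a simple stand-in for the context analysis described in step 4.

```python
import re
from urllib.parse import urlparse

# Hypothetical phrase-to-gloss table. A real sign vocabulary would be far
# larger and specific to one sign language.
PHRASE_GLOSSES = {
    ("thank", "you"): "THANK-YOU",
    ("how", "are", "you"): "HOW-YOU",
    ("good", "morning"): "GOOD-MORNING",
    ("good",): "GOOD",
    ("morning",): "MORNING",
    ("you",): "YOU",
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_GLOSSES)


def is_valid_video_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def normalise(text: str) -> list[str]:
    """Lowercase the transcript and keep only word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def map_to_glosses(tokens: list[str]) -> list[str]:
    """Match the longest known phrase first; fingerspell unknown words."""
    glosses = []
    i = 0
    while i < len(tokens):
        longest = min(MAX_PHRASE_LEN, len(tokens) - i)
        for n in range(longest, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASE_GLOSSES:
                glosses.append(PHRASE_GLOSSES[key])
                i += n
                break
        else:
            # No phrase of any length matched at this position.
            glosses.append("FS:" + tokens[i].upper())
            i += 1
    return glosses
```

With this table, `map_to_glosses(normalise("Good morning, how are you?"))` yields two phrase-level signs rather than five word-level ones, which is the difference between phrase-aware and word-by-word mapping that step 4 describes.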

Key Features

What's inside

Real-time audio extraction from video URLs
Multilingual speech-to-text transcription
Context-aware NLP sign language mapping
Live 3D avatar animation (Unity + Three.js)
In-video overlay rendering (browser-native)
Real-time video-avatar sync engine
User dashboard with history & controls
Social login + account management

Tech Stack

Built with

Frontend: React JS 18.2, HTML5, CSS3

3D Rendering: Three.js 0.159, Unity 2022.3

Voice AI: Groq Cloud API 4.2 (speech-to-text)

NLP: NLTK 3.8 (text processing & context)

Vision / ML: OpenCV 4.8 (audio optimisation)

Backend: Python 3.13

Design: Figma, PyCharm

Engineering Challenge

Why this was technically hard

Most voice AI projects stop at transcription. SignalCast goes three layers deeper (context-aware sign mapping, 3D avatar animation, and in-video overlay), and each layer adds significant engineering complexity.

The hardest part wasn't speech-to-text — it was synchronising a live 3D avatar with real-time buffered audio inside a browser extension, while maintaining context-aware (not word-for-word) sign language accuracy across multiple languages simultaneously.

Solving the real-time buffering challenge — where sign language gestures must stay in sync with the video even as transcription lag occurs — required building a custom frame-sync engine between the Python backend and the Three.js/Unity 3D renderer.
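One common way to keep gestures aligned despite transcription lag is to drive the avatar from the video clock rather than from the moment each cue arrives. The sketch below illustrates that idea only; it is not the project's frame-sync engine, and `GestureCue`, `SyncScheduler`, and their fields are hypothetical names. It assumes each transcribed segment carries a start time on the video timeline and that the renderer can start an animation clip at an arbitrary offset.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GestureCue:
    start: float     # seconds on the video timeline where the sign begins
    duration: float  # length of the animation clip in seconds
    gloss: str       # hypothetical sign identifier


class SyncScheduler:
    """Keeps cues sorted by start time and answers: what should play now?"""

    def __init__(self) -> None:
        self._starts: list[float] = []
        self._cues: list[GestureCue] = []

    def add_cue(self, cue: GestureCue) -> None:
        # Transcription results may arrive late or out of order,
        # so each cue is inserted at its sorted position.
        i = bisect_right(self._starts, cue.start)
        self._starts.insert(i, cue.start)
        self._cues.insert(i, cue)

    def cue_at(self, video_time: float) -> Optional[tuple[GestureCue, float]]:
        """Return (cue, offset into its clip) for this video time, or None."""
        i = bisect_right(self._starts, video_time) - 1
        if i < 0:
            return None  # playback has not reached the first cue yet
        cue = self._cues[i]
        offset = video_time - cue.start
        if offset >= cue.duration:
            return None  # clip finished; the avatar idles until the next cue
        return cue, offset
```

On every rendered frame the overlay would call `cue_at` with the player's current time. A cue that arrives late is then joined mid-clip at the returned offset instead of starting from zero, so the avatar catches up with the video rather than drifting behind it, and seeking or pausing the video needs no special handling.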

The Outcome

A genuinely novel voice AI system

SignalCast delivered a working real-time voice AI system that converts a video's spoken audio into live 3D sign language, rendered directly inside the browser and in sync with the video. It is a novel approach to voice AI that goes beyond transcription into real-time motion generation and avatar animation.

This project showcases our capability to build end-to-end voice AI pipelines — from audio extraction and multilingual speech processing to NLP context analysis and real-time 3D rendering. The same pipeline architecture applies directly to voice agents, AI avatars, real-time translation tools, and multimedia automation systems.
