On-premise transcription and translation across 12 Indian languages for classified briefings — fully air-gapped, zero external API calls.
Overview
A government intelligence unit needed to transcribe and translate audio across 12 Indian languages — including dialects — without any data leaving their secure facility.
The Challenge
Commercial ASR services were prohibited. Open-source models handled Indian dialects poorly, especially regional varieties of Hindi, Tamil, and Telugu. The system also had to cope with both clear audio and low-quality field recordings.
The Solution
We fine-tuned Whisper on 200 hours of audio per language, with a focus on regional dialects. A custom preprocessing pipeline improved accuracy on low-SNR (noisy) recordings by 18%, and a locally deployed multilingual LLM handled translation. Every component ran on air-gapped servers.
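The flow described above (preprocess, transcribe, translate, all on local hardware) can be sketched as follows. This is a minimal, hypothetical illustration, not the deployed system. The stage callables (`denoise`, `transcribe`, `translate`), the `OfflinePipeline` class, the idea of routing clips by measured SNR, and the 10 dB threshold are all assumptions. The source does not say how noisy clips were detected or what the preprocessing does. In a real deployment, each stage would wrap a model loaded from local disk (the fine-tuned Whisper checkpoint and the local LLM), so no call leaves the machine.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise).

    Power is the mean of squared samples; `noise` is assumed to be a
    noise-only segment (e.g. leading silence) from the same recording.
    """
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


@dataclass
class OfflinePipeline:
    # Hypothetical stage callables. Each would wrap a model loaded from
    # local disk (fine-tuned Whisper, local multilingual LLM); none of
    # them may make a network call.
    denoise: Callable[[List[float]], List[float]]
    transcribe: Callable[[List[float], str], str]
    translate: Callable[[str, str], str]
    low_snr_threshold_db: float = 10.0  # assumed cutoff, not from the source

    def run(
        self,
        audio: List[float],
        noise_sample: List[float],
        language: str,
        target: str = "en",
    ) -> Tuple[str, str, bool]:
        """Return (transcript, translation, whether preprocessing ran)."""
        # Only route noisy clips through the extra preprocessing stage.
        needs_cleanup = snr_db(audio, noise_sample) < self.low_snr_threshold_db
        if needs_cleanup:
            audio = self.denoise(audio)
        transcript = self.transcribe(audio, language)
        return transcript, self.translate(transcript, target), needs_cleanup


if __name__ == "__main__":
    # Stub stages standing in for the real local models.
    pipeline = OfflinePipeline(
        denoise=lambda a: [0.5 * x for x in a],
        transcribe=lambda a, lang: f"[{lang}] {len(a)} samples",
        translate=lambda text, tgt: f"[{tgt}] {text}",
    )
    # SNR here is 10*log10(1 / 0.25) ≈ 6 dB, below the threshold, so denoise runs.
    print(pipeline.run([1.0, -1.0], [0.5, -0.5], language="ta"))
    # → ('[ta] 2 samples', '[en] [ta] 2 samples', True)
```

Gating the extra stage on a measured SNR, rather than always applying it, is one way to leave clean recordings untouched, since aggressive noise suppression can also distort clean speech. Whether the actual system used such a gate is not stated.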
