Defence & AI · 2024 · Classified Government Client, India

Project SCRIBE

Multilingual AI Transcription & Translation

On-premise transcription and translation across 12 Indian languages for classified briefings — fully air-gapped, zero external API calls.

At a glance

91.4% WER improvement vs base Whisper
12 Indian languages, including dialects
8 min transcription time for a 2-hour briefing
0 external API calls, fully air-gapped

Built with

Whisper ASR · Multilingual NLP · PyTorch · FastAPI · 12 Languages · Air-gapped

Overview

A government intelligence unit needed to transcribe and translate audio across 12 Indian languages — including dialects — without any data leaving their secure facility.

The Challenge

Commercial ASR was prohibited. Open-source models handled Indian dialects poorly, especially regional Hindi, Tamil, and Telugu — and the system had to cope with both clear audio and low-quality field recordings.

How it fits together

Architecture

Audio Ingest (field + clear) → Preprocess (+18% low-SNR) → Fine-tuned ASR (12 languages) → Translation (local LLM)
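The four stages in the diagram can be read as a simple chain of functions. What follows is a minimal sketch of that shape, not the deployed code: the `Segment` type, the stage function names, and the placeholder bodies are all hypothetical, and the real ASR and translation calls are replaced by stubs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Segment:
    """One unit of audio moving through the pipeline (hypothetical type)."""
    audio_id: str
    language: str
    samples: List[float]
    transcript: str = ""
    translation: str = ""
    trace: List[str] = field(default_factory=list)


Stage = Callable[[Segment], Segment]


def preprocess(seg: Segment) -> Segment:
    # Placeholder step: peak-normalise so quiet field recordings use the
    # full amplitude range. The project's actual preprocessing is not
    # described in the case study.
    peak = max((abs(s) for s in seg.samples), default=0.0)
    if peak > 0:
        seg.samples = [s / peak for s in seg.samples]
    seg.trace.append("preprocess")
    return seg


def transcribe(seg: Segment) -> Segment:
    # Stub standing in for the fine-tuned ASR model.
    seg.transcript = f"<{seg.language} transcript of {seg.audio_id}>"
    seg.trace.append("asr")
    return seg


def translate(seg: Segment) -> Segment:
    # Stub standing in for the locally hosted translation model.
    seg.translation = f"<translation of {seg.audio_id}>"
    seg.trace.append("translate")
    return seg


def run_pipeline(seg: Segment, stages: List[Stage]) -> Segment:
    """Apply each stage in order, passing the segment along."""
    for stage in stages:
        seg = stage(seg)
    return seg


PIPELINE: List[Stage] = [preprocess, transcribe, translate]

if __name__ == "__main__":
    result = run_pipeline(Segment("briefing-001", "ta", [0.1, -0.25, 0.05]), PIPELINE)
    print(result.trace)
```

Keeping each stage as a plain function with the same signature means a stage can be swapped or tested in isolation without touching the others.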

The Solution

We fine-tuned Whisper on 200 hours of audio per language, with a focus on regional dialects. A custom preprocessing pipeline improved low-SNR accuracy by 18%, and a locally deployed multilingual LLM handled translation, all on air-gapped servers.

Results

Word-error-rate improvement: +91.4%
Low-SNR accuracy gain: +18%
Turnaround vs manual: 2 hrs → 8 min
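To put the turnaround figure in throughput terms, the short calculation below takes 2 hours of audio processed in 8 minutes and works out the speed-up and the real-time factor. The function names are illustrative, and the 8-minute figure is treated as an upper bound on processing time.

```python
def speedup(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of compute."""
    return audio_seconds / processing_seconds


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    audio = 2 * 60 * 60   # a 2-hour briefing, in seconds
    processing = 8 * 60   # 8 minutes, in seconds
    print(speedup(audio, processing))                      # 15.0
    print(round(real_time_factor(audio, processing), 3))   # 0.067
```

In other words, the system runs at roughly 15 times real time or better on the hardware used.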

The Outcome

91.4% average word-error-rate improvement over base Whisper across 12 languages, a 2-hour briefing transcribed in under 8 minutes, and zero external API calls.
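For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below computes it and a relative improvement between two systems. It assumes the headline figure is a relative WER reduction, which the case study does not state explicitly, and the sentences are toy examples, not project data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(base_wer: float, tuned_wer: float) -> float:
    """Fraction of the baseline error removed by the tuned system."""
    if base_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (base_wer - tuned_wer) / base_wer


if __name__ == "__main__":
    reference = "please send the report by noon"
    base = word_error_rate(reference, "please sent report by moon")
    tuned = word_error_rate(reference, "please send the report by moon")
    print(base)                                          # 0.5
    print(round(relative_wer_reduction(base, tuned), 3)) # 0.667
```

Under this reading, a 91.4% improvement would mean the fine-tuned model makes 8.6% as many word errors as the baseline on the same test set.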


Highlights

Whisper fine-tuned on 200 hrs per language
91.4% WER improvement across 12 Indian languages
Fully air-gapped — no external API ever called

From brief to production

Delivery timeline

Months 1–2: Dataset & labelling (200 hrs / language)
Months 3–4: Fine-tuning (dialect-focused training)
Month 5: Translation layer (local multilingual LLM)
Handover: Air-gapped deploy (secure facility install)
