Indic ASR Quantized

Why Indic ASR Quantized?
High-quality ASR models for Indic languages are typically large and computationally expensive, requiring powerful GPUs for real-time inference. This makes them inaccessible for many users and applications.
The problem: Existing Indic ASR models are too heavy for CPU-only devices, limiting their practical use in real-world applications.
The solution: I quantized the AI4Bharat IndicConformer model to INT8, achieving a 4x reduction in model size and a significant speedup in inference time, at the cost of some accuracy.
What's Unique About This?
I performed the quantization myself. This isn't just a wrapper around an existing quantized model; I carried out the INT8 post-training quantization of the AI4Bharat IndicConformer model end to end, which required careful calibration, evaluation, and validation.
Architecture
Quantization Process
- Model Selection: AI4Bharat IndicConformer-600M multilingual model
- Calibration: Generated a calibration dataset for static post-training quantization
- Quantization: Applied INT8 quantization using ONNX Runtime tools
- Evaluation: Benchmarked WER (Word Error Rate) and CER (Character Error Rate)
- Packaging: Created a Python package for easy distribution and use
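At its core, static INT8 quantization maps each float tensor onto 8-bit integers through a scale factor estimated from calibration data; in practice the heavy lifting is done by ONNX Runtime's quantization tooling. The sketch below shows only the underlying symmetric mapping, with hypothetical `quantize_int8`/`dequantize` helpers (not the project's actual code):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    # Round each weight to the nearest integer step and clamp to the INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the INT8 representation."""
    return [qi * scale for qi in q]

# The gap between the original floats and dequantize(...) output is the
# quantization error that calibration tries to keep small.
q, scale = quantize_int8([0.5, -1.0, 0.25])
recovered = dequantize(q, scale)
```

This is why calibration matters: the scale is derived from observed value ranges, and a poorly chosen range either clips large activations or wastes INT8 resolution on values that never occur.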
Supported Languages
Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu
Problems Faced
- Quantization accuracy loss: Balancing model size reduction with accuracy preservation required careful calibration and iterative testing.
- Framework compatibility: Ensuring ONNX Runtime compatibility across different hardware configurations (CPU-only vs GPU).
- Evaluation complexity: Benchmarking ASR models across 22 languages with varying phonetic characteristics and data availability.
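The CPU-vs-GPU compatibility concern largely reduces to selecting ONNX Runtime execution providers at session creation time. A minimal sketch with a hypothetical `pick_providers` helper (the provider names are standard ONNX Runtime identifiers; the helper itself is an illustration, not part of this package):

```python
def pick_providers(available):
    """Prefer the CUDA execution provider when present; always fall back to CPU."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    # Guarantee at least the CPU provider so the session can be created anywhere.
    return chosen or ["CPUExecutionProvider"]
```

A session would then be created with something along the lines of `ort.InferenceSession(model_path, providers=pick_providers(ort.get_available_providers()))`, so the same code path runs on both CPU-only and GPU machines.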
Learnings
While building Indic ASR Quantized, I learned:
- How to perform post-training quantization on large speech models
- Using ONNX Runtime tools for model optimization and inference
- Benchmarking ASR models with WER and CER metrics
- Creating reproducible experiments with Jupyter notebooks
- Building Python packages for distributing optimized models
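WER and CER are both normalized edit-distance metrics, computed over words and characters respectively. A minimal sketch, with hypothetical `wer`/`cer` helpers built on a standard Levenshtein distance (real evaluations usually also normalize text and average over a corpus):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via the classic single-row dynamic program."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            # Minimum of deletion, insertion, and (mis)match.
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[-1]

def wer(ref, hyp):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)

# e.g. wer("hello world", "hello word") == 0.5 (one substituted word out of two)
```

CER is often the more informative metric for Indic scripts, where a single conjunct or matra error penalizes a whole word under WER.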
Future Vision
This project is archived, and I don't plan to work on it further.