Indic ASR Quantized

Why Indic ASR Quantized?
High-quality ASR models for Indic languages are typically large and computationally expensive, requiring powerful GPUs for real-time inference. This makes them inaccessible for many users and applications.
The problem: Existing Indic ASR models are too heavy for CPU-only devices, limiting their practical use in real-world applications.
The solution: I quantized the AI4Bharat IndicConformer model to INT8, achieving a 4x reduction in model size and a significant speedup in inference time, at the cost of some accuracy.
What's Unique About This?
I performed the quantization myself. This isn't just a wrapper around an existing quantized model; I carried out the INT8 post-training quantization of the AI4Bharat IndicConformer model end to end, which required careful calibration, evaluation, and validation.
Architecture
Quantization Process
- Model Selection: AI4Bharat IndicConformer-600M multilingual model
- Calibration: Generated a calibration dataset for static post-training quantization
- Quantization: Applied INT8 quantization using ONNX Runtime tools
- Evaluation: Benchmarked WER (Word Error Rate) and CER (Character Error Rate)
- Packaging: Created a Python package for easy distribution and use
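At its core, static INT8 quantization maps each float tensor onto 8-bit integers through a scale factor estimated from calibration data; in practice the heavy lifting is done by ONNX Runtime's quantization tooling. The sketch below shows only the underlying symmetric mapping, with hypothetical `quantize_int8`/`dequantize` helpers (not the project's actual code):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    # Round each weight to the nearest integer step and clamp to the INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the INT8 representation."""
    return [qi * scale for qi in q]

# The gap between the original floats and dequantize(...) output is the
# quantization error that calibration tries to keep small.
q, scale = quantize_int8([0.5, -1.0, 0.25])
recovered = dequantize(q, scale)
```

This is why calibration matters: the scale is derived from observed value ranges, and a poorly chosen range either clips large activations or wastes INT8 resolution on values that never occur.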
Supported Languages
Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu
Problems Faced
- Quantization accuracy loss: Balancing model size reduction with accuracy preservation required careful calibration and iterative testing.
- Framework compatibility: Ensuring ONNX Runtime compatibility across different hardware configurations (CPU-only vs GPU).
- Evaluation complexity: Benchmarking ASR models across 22 languages with varying phonetic characteristics and data availability.
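The CPU-vs-GPU compatibility concern largely reduces to selecting ONNX Runtime execution providers at session creation time. A minimal sketch with a hypothetical `pick_providers` helper (the provider names are standard ONNX Runtime identifiers; the helper itself is an illustration, not part of this package):

```python
def pick_providers(available):
    """Prefer the CUDA execution provider when present; always fall back to CPU."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    # Guarantee at least the CPU provider so the session can be created anywhere.
    return chosen or ["CPUExecutionProvider"]
```

A session would then be created with something along the lines of `ort.InferenceSession(model_path, providers=pick_providers(ort.get_available_providers()))`, so the same code path runs on both CPU-only and GPU machines.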
Learnings
While building Indic ASR Quantized, I learned:
- How to perform post-training quantization on large speech models
- Using ONNX Runtime tools for model optimization and inference
- Benchmarking ASR models with WER and CER metrics
- Creating reproducible experiments with Jupyter notebooks
- Building Python packages for distributing optimized models
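WER and CER are both normalized edit-distance metrics, computed over words and characters respectively. A minimal sketch, with hypothetical `wer`/`cer` helpers built on a standard Levenshtein distance (real evaluations usually also normalize text and average over a corpus):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via the classic single-row dynamic program."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            # Minimum of deletion, insertion, and (mis)match.
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[-1]

def wer(ref, hyp):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)

# e.g. wer("hello world", "hello word") == 0.5 (one substituted word out of two)
```

CER is often the more informative metric for Indic scripts, where a single conjunct or matra error penalizes a whole word under WER.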
Future Vision
This project is archived, and I don't plan to work on it further.