Hi guys…
I’m reaching out to the community for help. We’re working on a project that involves transcribing and analyzing call recordings, most of which are in a mix of Telugu and English. So far, we’ve tried Google Speech-to-Text and whisper-large-v3, but we’re seeing a Word Error Rate (WER) of over 40%, which is just too high for our needs.
We need an ASR system that can handle code-switching between Telugu and English efficiently. Accuracy is our top priority, as these transcriptions will be used for detailed analysis later on. Has anyone found a solution that works well in similar scenarios? Any suggestions or insights would be greatly appreciated.
Check out the models from AI4Bharat. They offer pretrained Conformer and wav2vec2 models for Indic languages. You can load them straight from Hugging Face and fine-tune them on your own call data.
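A minimal inference sketch with the Hugging Face `transformers` ASR pipeline, assuming a wav2vec2-style CTC checkpoint; the model id below is a placeholder, so swap in whichever AI4Bharat Telugu checkpoint you pick from the Hub:

```python
# Minimal sketch: transcribe a call recording with a wav2vec2-style CTC model
# from the Hugging Face Hub. The model id is a placeholder, not a specific
# AI4Bharat checkpoint; replace it with the one you choose.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-org/indicwav2vec-telugu",  # placeholder model id
    chunk_length_s=30,                      # split long call audio into chunks
)

result = asr("call_recording.wav")
print(result["text"])
```

From there you can fine-tune the same checkpoint on a few hours of your own transcribed calls, which usually helps more with code-switching than any off-the-shelf model.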
I’ve been in a similar situation where accurate transcription was crucial for analyzing multilingual call recordings, so I understand the challenge of code-switching between Telugu and English. In my experience, general-purpose systems like Google Speech-to-Text and Whisper often struggle here, which matches the WER you’re seeing. Specialized ASR systems, or models custom-trained on diverse multilingual datasets, made a significant difference for me. It’s worth evaluating Microsoft Azure’s Speech Service, which has better support for multilingual and code-switched audio, as well as providers like iSpeech or Deepgram that offer solutions tailored to your domain. Benchmarking a few options on your own recordings, and possibly investing in custom model training, may be necessary to reach the accuracy your analysis requires.
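If you try the Azure route, a rough sketch with the Python Speech SDK and automatic language identification between Telugu and English might look like this; the key, region, and file path are placeholders, and you should confirm the current language-ID behavior against the Azure docs:

```python
# Rough sketch: Azure Speech SDK with automatic language identification
# between Telugu and English. Key, region, and file path are placeholders.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="YOUR_SPEECH_KEY",  # placeholder
    region="YOUR_REGION",            # placeholder, e.g. "centralindia"
)
audio_config = speechsdk.audio.AudioConfig(filename="call_recording.wav")

# Let the service pick between Telugu (India) and English (India).
auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["te-IN", "en-IN"]
)

recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    auto_detect_source_language_config=auto_detect,
    audio_config=audio_config,
)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
```

Note that `recognize_once()` only handles a single utterance; for full call recordings you’d switch to the SDK’s continuous recognition mode.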