Abstract
In this work, we build on our recent efforts to develop a neural speech recognition (NSR) system for Quranic recitations that is accessible to people of any age, gender, or expertise level. In prior work, we introduced the Quran recitations by females and males (QRFAM) dataset, a sizable benchmark dataset of audio recordings made by male and female reciters from various age groups and competence levels. Using various subsets of the QRFAM dataset for training, validation, and testing, we also built several baseline NSR systems based on Mozilla’s DeepSpeech model, and we have presented our efforts to optimize and enhance these baseline models. In this study, we extend that work by adopting Whisper, a well-known speech recognition model, and we describe the effect of this choice on recognition accuracy, expressed as the word error rate (WER), in comparison with DeepSpeech.
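For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a model's output, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and any text normalization that a real evaluation of Quranic text might apply (for example, handling of diacritics) is deliberately left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which is an assumption, not the paper's method.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("b" -> "x") and one deletion ("d") over 4 reference words.
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Under this definition a lower WER is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.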
| Original language | English |
|---|---|
| Article number | 9521 |
| Journal | Applied Sciences (Switzerland) |
| Volume | 15 |
| Issue number | 17 |
| DOIs | |
| State | Published - Sep 2025 |
Keywords
- DeepSpeech
- QRFAM dataset
- automatic speech recognition
- large audio models
Title
Enhanced Neural Speech Recognition of Quranic Recitations via a Large Audio Model †