Kerala has a 96% literacy rate, one of the highest in Asia. It also has one of the world's most active Gulf diaspora communities: millions of Malayalam speakers work abroad and keep deep ties with home through voice calls, voice notes, and, increasingly, voice interfaces.

And yet time-aligned Malayalam speech data, the kind needed to build accurate subtitling, transcription, and accessibility tools, has been almost entirely absent from the open ecosystem. Until now.

A community-built Malayalam Time-Aligned Speech Corpus is now openly available. Time alignment isn't just a technical nicety: it's what makes a corpus useful for real-world applications like accessibility tech, media transcription, and language-learning tools.

Built by the community. Open for everyone. That's the model.

#Malayalam #OpenData #SpeechCorpus #Accessibility #NLP https://lnkd.in/e2xtyrqg