Post by Karya


Just a few months ago, Bhili, a tribal language spoken by millions across western and central India, had little presence in India’s AI ecosystem. Today, Bhili linguistic datasets created with the Bhil community of Nandurbar are being made openly available on AIKosh, unlocking a vital language resource for researchers, developers, startups, and public institutions building inclusive AI systems. These datasets capture authentic local speech, knowledge, and cultural context.

This is what digital public infrastructure for AI looks like in practice. By open-sourcing high-quality language datasets as digital public goods, we enable innovation far beyond a single use case: powering new AI applications, advancing language technologies, and making public services more accessible in languages that have historically remained underrepresented in the digital world.

We are grateful to the District Administration of Nandurbar (Collector Office Nandurbar), the Government of Maharashtra (GoM), and BHASHINI (Digital India BHASHINI Division) for championing this vision of linguistic inclusion. We also thank AI4Bharat and EkStep Foundation for their support in enabling Bhili language access on MahaVISTAAR, bringing digital agricultural services closer to millions of farmers in Maharashtra.

From community-created language assets to open digital public goods, this journey demonstrates what is possible when governments, technology partners, and local communities come together to ensure that no language is left behind in the age of AI.
