The final 20 AI curation units will be established over the next few months, officials told ET.
“The government is focusing on diverse datasets including health statistics, agricultural surveys, geospatial maps, and other data in key areas such as logistics, demographics and environment,” an official at the electronics and information technology ministry (MeitY) said. “Finding and filtering these often fragmented data and making them available for AI modelling that can provide unique applications has been a key aim.”
Although the initiative to set up the curation units began nearly two years ago, it made slow progress because of opposition from certain ministries that already have strong systems for data collection and management, officials said.
But the successful application of datasets uploaded to AIKosh has now won those ministries over, they added.
“Identifying the silos where data resided in various ministries, and strengthening data management capacities also took time,” the official quoted above said.
The Centre has mandated comprehensive AI readiness across ministries, with the push to improve dataset discoverability coming from the top echelons of the government.
The 30 curation units established so far have led to an uptick in the number of public sector datasets uploaded to AIKosh, which now hosts more than 9,500 datasets and 273 sectoral models.
AIKosh is a government-backed initiative under the IndiaAI Mission to provide Indian startups, researchers, and developers with seamless access to high-quality, anonymised datasets for AI model training and innovation.
Launched in March 2025, it centralises data from government agencies, academia, and private sources.
As of December 2025, AIKosh had recorded over 38.5 million visits, 11,000 registered users and 26,000 downloads, officials said. They said it is the biggest contributor to the rise in indigenous AI tools and use cases.
Rising demand for larger datasets that meet strict ethical and consent-based standards has sharpened the focus on government datasets.
“There is also an accelerating trend of state government departments hosting their datasets on the platform, and further utilising the existing datasets for building AI applications,” the official said.
Government data is also more culturally representative, and helps reduce bias and reliance on foreign or synthetic sources, he added.
(Originally published on Mar 23, 2026)