The Daily Delhi

Delhi battles millions of duplicate digital records plaguing city databases

As civic databases and government portals buckle under millions of redundant digital records, Delhi's archivists and urban planners are scrambling to clean up a mess that other megacities solved years ago.

By Delhi News Desk · Published 5 July 2026, 12:09 am

Photo: Ranjeet Chauhan / Pexels

Delhi's municipal and heritage databases contain what are estimated to be tens of thousands of duplicate photographic records: images catalogued twice, sometimes three times, under different file names. The duplicates clog the digital infrastructure that city planners rely on for everything from Yamuna River cleanup assessments to Old Delhi redevelopment approvals. The problem has quietly frustrated archivists at the Delhi Urban Art Commission and data managers at the Municipal Corporation of Delhi for at least three years, according to publicly available audit summaries released in 2025.

The timing matters. Delhi Metro Rail Corporation is midway through its Phase 4 expansion, a project that relies on geo-tagged photographic surveys of affected corridors, including Janakpuri West to Krishna Park Extension. When duplicate images sit in the same survey database, they inflate cost estimates, distort before-and-after comparisons, and slow down the environmental clearance process. The Air Quality Early Warning System, maintained jointly by the India Meteorological Department and the System of Air Quality and Weather Forecasting And Research, faces a parallel issue: redundant satellite image files submitted by multiple monitoring stations near Anand Vihar and Punjabi Bagh push storage costs higher and make algorithmic trend analysis less reliable.

What Other Cities Have Done

Mumbai moved first among Indian metros. The Brihanmumbai Municipal Corporation began a structured de-duplication drive in 2023, deploying perceptual hashing (a technique that compares images by visual fingerprint rather than file name) across its heritage documentation archive. The BMC reported cutting redundant files by roughly 34 percent within 18 months, freeing significant server capacity in its Mantralaya-adjacent data centre. The Kolkata Municipal Corporation has a smaller digitised archive and a less acute version of the same problem, having migrated its property survey photographs to a centralised cloud server in 2024.
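For readers curious what a "visual fingerprint" looks like in practice, the following is a minimal sketch of one common perceptual hash, the difference hash. It is illustrative only and is not the BMC's software, whose internals the article's sources do not describe. It assumes each image has already been decoded into rows of grayscale brightness values from 0 to 255; the function names are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 brightness values (assumed input format)


def shrink(pixels: Gray, width: int, height: int) -> List[List[float]]:
    """Downscale by averaging the source pixels that fall in each target cell."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Gray, size: int = 8) -> int:
    """Difference hash: one bit per comparison of horizontally adjacent cells."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests a likely duplicate."""
    return bin(a ^ b).count("1")
```

Because the hash records only whether each region is brighter than its neighbour, a uniformly brightened or renamed copy of a photograph yields the same fingerprint, while an unrelated image differs in many bits. A real pipeline would pick a distance threshold by testing on its own archive, and would more likely rely on a maintained library than on hand-rolled code like this.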

Globally, the gap between Delhi and cities that have tackled this is striking. Seoul Metropolitan Government completed a full photographic de-duplication of its urban planning records in 2022, using open-source tools built on the ImageHash Python library. In London, Ordnance Survey, Britain's national mapping agency, working alongside the Greater London Authority, mandated hash-based duplicate checks as a condition of any new geospatial data contract from January 2024. Both cities cite lower storage bills, faster Freedom of Information request processing, and cleaner inputs for AI-assisted urban modelling as the tangible results.

Delhi has no equivalent mandate yet. The Delhi government's Information Technology Department has circulated internal guidance recommending de-duplication best practices, but no formal policy binding city agencies to a timeline has been made public. The result is a patchwork: the Delhi Development Authority has taken informal steps to audit its land-use photography archive in Dwarka and Rohini, while the Archaeological Survey of India's New Delhi Circle — which manages digitised records for protected monuments including Humayun's Tomb and Safdarjung's Tomb — operates its own separate system with no cross-agency co-ordination.

What Needs to Happen Next

Technical experts familiar with civic data systems point to three practical steps that have worked elsewhere. First, adopt a hash-based duplicate detection standard across all agencies that submit photographic records to shared databases — a step that costs relatively little in compute terms and can be implemented incrementally. Second, assign a single nodal officer within the Delhi government's IT Department to co-ordinate de-duplication timelines across the MCD, DDA, DMRC, and pollution monitoring bodies. Third, tie any new vendor contract for digital archiving to a clause requiring de-duplication compliance before data handover, mirroring the London model.
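The first step, an incremental hash-based check at the point of submission, can be sketched roughly as follows. This is an assumption-laden illustration rather than any agency's actual system: it uses a SHA-256 digest of the file contents, which catches byte-identical images saved under different file names but not re-encoded or resized copies, and the class name and record IDs are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DuplicateRegistry:
    """Shared index mapping a content digest to the first record that carried it."""
    seen: Dict[str, str] = field(default_factory=dict)
    duplicates: List[Tuple[str, str]] = field(default_factory=list)

    def submit(self, record_id: str, content: bytes) -> Optional[str]:
        """Register a file; return the ID of an earlier identical record, if any."""
        digest = hashlib.sha256(content).hexdigest()
        original = self.seen.get(digest)
        if original is None:
            # First time these exact bytes have been seen: accept the record.
            self.seen[digest] = record_id
            return None
        # Same bytes under another name: log the pair for review.
        self.duplicates.append((record_id, original))
        return original


# Hypothetical usage: two agencies submit the same file under different names.
registry = DuplicateRegistry()
registry.submit("MCD/img_001.jpg", b"example image bytes")
earlier = registry.submit("DDA/survey_17.jpg", b"example image bytes")
print(earlier)
```

Since each new file is checked against the index as it arrives, agencies could adopt such a check one at a time without reprocessing the whole archive first, which is what makes the approach incremental. The same check could serve as the compliance gate in a vendor handover clause.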

The Phase 4 Metro expansion offers a practical forcing moment. DMRC's photographic survey contracts for the Lajpat Nagar to Saket G-Block corridor and the Inderlok to Indraprastha line are expected to generate hundreds of thousands of new images through 2027. Building clean data habits into those contracts now, before the archive grows larger, is cheaper than cleaning up after the fact — a lesson Mumbai learned the hard way between 2021 and 2023. Delhi has the advantage of learning from what others got wrong. Whether its agencies co-ordinate fast enough to use that advantage is a different question entirely.

About this article

Published by The Daily Delhi

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
