Delhi's municipal and state government databases are carrying a quietly growing problem: duplicate images embedded across digital record systems, from voter ID files managed by the CEO Delhi office to property documentation held by the Municipal Corporation of Delhi. The issue has reached a point where administrators can no longer defer the structural decisions about how, and whether, to clean house.
The timing matters because Delhi is midway through several large digitisation drives. The Delhi Metro Rail Corporation is digitising station asset records for the Phase 4 corridor from Janakpuri West to RK Ashram Marg. The Delhi Development Authority has been uploading land-use maps and heritage structure photographs for Old Delhi wards covering areas around Chandni Chowk and Ballimaran. Both efforts are generating fresh image data at volume, and if duplicate files are not systematically flagged before the new batches merge with legacy archives, every merged batch will add to the backlog that has to be cleaned later.
Why Duplicates Accumulate — and What They Cost
The mechanics are straightforward. Government departments have historically received image submissions through multiple channels — scanned paper forms, mobile uploads, and contractor-supplied batches — with no single deduplication checkpoint. A photograph of a Yamuna floodplain encroachment, for instance, might enter the Delhi Jal Board's system through a field officer's phone, again through a contractor's weekly data dump, and a third time when a supervisor re-submits for approval. Each copy is stored, indexed, and backed up separately.
Storage costs alone are not trivial. Government cloud contracts in India typically price archival storage at roughly ₹2 to ₹4 per gigabyte per month, depending on the tier and vendor, according to publicly available cloud service schedules. A single department carrying tens of thousands of unresolved duplicates can accumulate unnecessary expenditure of lakhs of rupees annually, money that could otherwise fund the maintenance of systems like the Integrated Command and Control Centre at Kashmere Gate, which handles real-time city monitoring data.
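How quickly the bill grows depends on file size and on how many times each copy is replicated. The sketch below works through the arithmetic with the per-gigabyte rate quoted above. The duplicate count, the average file size, and the number of stored copies are hypothetical inputs chosen for illustration, not figures from any department.

```python
def annual_duplicate_cost(duplicate_files, avg_size_mb, rate_per_gb_month, copies=1):
    """Estimate yearly spend (in rupees) on storing redundant image files.

    All inputs are illustrative assumptions:
      duplicate_files   - number of redundant files held
      avg_size_mb       - average size of one file, in megabytes
      rate_per_gb_month - storage price in rupees per GB per month
      copies            - how many times each file is stored (primary, backups)
    """
    total_gb = duplicate_files * avg_size_mb / 1000  # decimal GB, as billed
    monthly = total_gb * copies * rate_per_gb_month
    return monthly * 12


# Hypothetical department: 80,000 duplicate scans of 25 MB each,
# held as a primary copy plus two backups, at the mid-range rate of ₹3.
cost = annual_duplicate_cost(80_000, 25, 3.0, copies=3)
print(cost)  # 216000.0, i.e. about ₹2.16 lakh a year
```

With smaller files or no backup copies the same count of duplicates costs far less, so any real estimate would need measured sizes and the actual replication policy.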
Beyond cost, duplicates create search and retrieval errors. When a Delhi Pollution Control Committee officer queries the image archive for a specific site on the Anand Vihar industrial belt, one of the capital's most-monitored air quality zones, duplicate records can return misleading results: they may suggest a site was inspected twice when it was visited once, or lead an officer to dismiss two genuine visits as copies of a single one.
The Decisions Ahead
Three choices now sit in front of the relevant agencies, and each carries a different risk profile.
The first is a full retrospective audit: scanning every existing image archive with automated hash-matching tools that identify identical or near-identical files. The Delhi e-Governance Society, which sits under the IT Department, has the technical infrastructure to lead this, but a city-wide audit would likely require a dedicated six-to-nine-month window and a budget allocation that has not yet appeared in any publicly circulated plan for the 2026-27 financial year.
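For readers unfamiliar with hash-matching, the following is a minimal sketch of the exact-duplicate half of such an audit, using only Python's standard library. It is not a description of any tool the agencies use. The function names and the directory layout are assumptions, and finding near-identical images (resized or recompressed copies) would need a perceptual-hashing step that this sketch does not attempt.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` whose bytes are identical.

    Returns a list of groups; each group is a list of two or more paths
    that share the same digest. Files with unique content are omitted.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("archive/")` on a hypothetical archive folder would return the clusters of byte-identical files, which a reviewer could then reconcile before anything is deleted.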
The second option is a forward-only fix: accept the legacy mess, draw a line at a specific date, and enforce deduplication strictly on all new uploads from that point. This is faster and cheaper in the short term, but it leaves the contaminated historical archive in place, meaning that any project requiring retrospective analysis — Yamuna cleanup litigation evidence, Old Delhi heritage surveys, pollution trend mapping — continues to pull from dirty data.
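In practice, a forward-only rule means a single checkpoint that every new upload passes through, whichever channel it arrives by. The sketch below illustrates the idea with an in-memory register of content hashes. The `UploadGate` class is hypothetical, and a production system would keep the register in a shared database rather than in memory.

```python
import hashlib


class UploadGate:
    """Hypothetical checkpoint that rejects byte-identical re-uploads."""

    def __init__(self):
        self._seen = set()  # SHA-256 digests of everything accepted so far

    def accept(self, data: bytes) -> bool:
        """Return True and record the upload if its content is new.

        Return False, storing nothing, if identical bytes were already
        accepted through any channel.
        """
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


gate = UploadGate()
gate.accept(b"field officer photo")   # True: first submission
gate.accept(b"field officer photo")   # False: contractor resends same file
```

Because the gate only knows what it has seen since the cut-off date, it cannot catch a new upload that duplicates something already sitting in the legacy archive unless that archive's hashes are loaded into the register first.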
The third path is phased remediation by department priority. Under this model, high-stakes archives — voter rolls managed by the CEO Delhi office, DMRC Phase 4 asset files, and Yamuna authority records — get cleaned first, with lower-priority systems addressed over a rolling two-year cycle.
Officials at the Delhi Secretariat on IP Estate Road have not publicly confirmed which approach will be adopted. The Delhi Legislative Assembly's Public Accounts Committee is scheduled to review IT infrastructure spending in the coming session, which gives legislators an opportunity to press for a formal policy commitment. Without one, individual departments will continue making ad hoc decisions, and the window to fix the problem cleanly — before Phase 4 data migration completes — is narrowing fast.