Delhi's ambitious push to digitise civic records hit a visible snag this week after archivists working under the Delhi Urban Shelter Improvement Board flagged that tens of thousands of scanned property and heritage photographs stored on government servers are near-identical duplicates. The duplicates are inflating storage costs and slowing the retrieval systems that planning approvals across the city depend on.
The problem matters now because three of Delhi's largest ongoing digitisation efforts are all feeding into overlapping cloud storage pools: the Delhi Metro Rail Corporation's Phase 4 corridor documentation project, the Municipal Corporation of Delhi's property tax record digitisation, and the Archaeological Survey of India's Old Delhi heritage asset inventory centred on Chandni Chowk and Shahjahanabad. Some archivists say duplicate images can account for a disproportionate share of stored data when multiple departments scan the same physical document without checking it against a central repository.
The National Informatics Centre, which manages backend infrastructure for many Delhi government digital projects, confirmed this week that a deduplication audit was underway, though it provided no timeline for completion. The issue first surfaced publicly when the MCD's Zonal Office in Civil Lines reported processing delays on heritage building clearance files, some of which had been scanned and uploaded as many as four separate times by different departments over the past 18 months.
Where the Problem Is Being Felt
Field teams at the Delhi State Archives on Rajpur Road in Civil Lines have been manually flagging duplicate image sets since late June — painstaking work that pulls staff away from active cataloguing. The archives hold records dating back to the Mughal period, and the current digitisation phase, which began in January 2025, was meant to make approximately 1.2 million documents publicly searchable by the end of this financial year. That target is now under strain.
At the DMRC's documentation unit in Shastri Park, engineers working on Phase 4's Janakpuri West to Krishna Park Extension corridor have encountered a related bottleneck. Geo-tagged site photographs submitted by contractors for progress verification were found to be duplicated across at least three internal servers, requiring an unplanned reconciliation exercise that project managers say added roughly two weeks to the verification cycle for the stretch between Mayapuri and Punjabi Bagh West.
The MCD launched its citywide property record digitisation scheme in April 2024 with a stated budget of Rs 47 crore. Storage overruns caused in part by unmanaged duplicates have prompted internal reviews, according to documents reviewed by this correspondent, though the MCD has not issued a public statement detailing the financial impact.
Why Deduplication Is Harder Than It Sounds
Removing or consolidating duplicate images in government archives is not straightforward. Unlike consumer photo libraries, civic document images are often legally referenced by their original file name and upload timestamp. Deleting a file, even one that is byte-for-byte identical to another, can invalidate a legal citation or audit trail. The National Informatics Centre's standard protocol requires human sign-off before any deduplication tool removes a file, which means automated tools can identify likely duplicates but cannot clear them on their own.
The Delhi government has been in discussions with IIT Delhi's Bharti School of Telecommunication Technology and Management to explore AI-assisted deduplication tools that flag duplicates rather than delete them. The aim is to cut storage waste without breaking the legal references attached to original files. No formal contract has been announced.
For residents and small businesses waiting on digitised property records — particularly in densely documented areas like Karol Bagh and Lajpat Nagar where property disputes are common — the practical effect is slower turnaround at MCD service counters. The MCD's online property record portal, accessible at mcdonline.nic.in, has shown intermittent slowdowns this week that the corporation attributed to backend maintenance.
The NIC audit is expected to produce an interim report by the end of July. Until deduplication protocols are standardised across Delhi's civic departments, archivists say the problem will keep compounding — every new scanning drive adds fresh layers to an already cluttered digital heap.