At least 34 percent of image files held across Delhi government digital repositories are estimated to be duplicates — identical or near-identical copies stored multiple times, often across departments that have never coordinated a shared filing standard. That figure, drawn from an audit methodology applied by the National Informatics Centre's Delhi State Unit as part of its ongoing Digital India compliance review cycle, represents hundreds of terabytes of redundant data spread across servers in Indraprastha Estate, the Delhi Secretariat in ITO, and off-site backup nodes in Dwarka Sector 10.
The issue matters now because budgets are tight and scrutiny is high. The Delhi government's IT budget for the 2025-26 financial year allocated roughly ₹420 crore to digital infrastructure maintenance and upgrades, a figure that departmental procurement officers say is already stretched by the parallel demands of the Phase 4 Delhi Metro documentation drive, the Yamuna Action Plan's GIS mapping programme, and the rollout of the Unified Traffic Management System on the Outer Ring Road corridor. Every gigabyte of redundant image data is a gigabyte that costs money to store, back up, and secure.
Where the Duplication Piles Up
The problem concentrates most visibly in two specific operations. First, the Delhi Metro Rail Corporation's civil documentation wing, which photographs construction progress at each of the 65 stations under the Phase 4 expansion from Janakpuri West to RK Ashram Marg, generates daily image batches that field engineers upload without a central deduplication protocol. Multiple teams working on the same station — say, the Sarojini Nagar underground section or the elevated stretch near Tughlaqabad — independently upload overlapping shots. DMRC's own technology division acknowledged the absence of a mandatory deduplication step in its tender documents for the Phase 4 data management contract issued in March 2025.
Second, the MCD Heritage Conservation Cell, which has been digitising records of 1,200-plus listed structures across Old Delhi — from the lanes off Chandni Chowk to the havelis clustered near Ballimaran — has been working with images supplied by multiple consultants hired at different points since 2019. Those consultants used different camera specifications and naming conventions, and the resulting archive contains thousands of duplicated images of the same facades, doorways, and inscriptions, identified as distinct files because their metadata differs even when the visual content is the same. The Cell operates out of the Civic Centre on Minto Road and has flagged the problem in successive internal progress reports, though a remediation timeline has not been made public.
What Deduplication Actually Costs — and Saves
Deduplication is not a novel technical challenge. Standard software tools, including open-source options such as dupeGuru and commercial enterprise platforms used by state governments in Tamil Nadu and Gujarat, can identify redundant image files using perceptual hashing, which compares what an image looks like rather than its filename or metadata. That is why metadata differences alone do not defeat them. For a mid-sized government repository holding around 50 terabytes of image data, a full deduplication pass typically takes 72 to 96 hours on commodity server hardware and can recover between 20 and 40 percent of consumed storage, according to benchmarks published by the Centre for Development of Advanced Computing in Pune in a January 2026 technical note on government cloud optimisation.
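To illustrate why a perceptual hash ignores filenames and metadata, here is a minimal sketch of one common variant, the difference hash (dHash). It is not the method used by dupeGuru or by any named platform. It assumes each image has already been decoded into a grid of grayscale values from 0 to 255, and the function names and the threshold of 5 bits are illustrative choices.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def resize_nearest(pixels: Grid, width: int, height: int) -> Grid:
    """Shrink a grid to width x height by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    # hash_size + 1 columns give hash_size comparisons per row.
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag two images as likely duplicates (threshold is an assumption)."""
    return hamming(dhash(a), dhash(b)) <= threshold


# A synthetic 32x32 image and a uniformly brightened copy of it.
base = [[(x * x + 3 * y) % 97 for x in range(32)] for y in range(32)]
brighter = [[value + 10 for value in row] for row in base]
print(is_probable_duplicate(base, brighter))
```

Because the hash records only whether each pixel is brighter than its neighbour, a copy with a different filename, different metadata, or a uniform brightness shift produces the same bits. A production tool would add image decoding and a tuned threshold.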
At current storage pricing for the AWS Mumbai region, which several Delhi government departments use through the MeitY empanelment framework, 50 terabytes costs approximately ₹1.1 lakh per month. A 30 percent reduction would save around ₹33,000 a month per repository, a figure that multiplies across dozens of departmental stores.
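The arithmetic behind that estimate can be checked with a short sketch. The monthly cost of ₹1.1 lakh (₹110,000) and the 20 to 40 percent recovery range come from the figures above. The function name is illustrative, and the annual figure is simply the monthly saving times twelve, assuming prices stay flat.

```python
def monthly_saving(monthly_cost_rupees: int, recovered_percent: int) -> int:
    """Rupees saved per month if recovered_percent of storage is freed."""
    return monthly_cost_rupees * recovered_percent // 100


MONTHLY_COST = 110_000  # Rs 1.1 lakh for 50 TB, as quoted in the text

for percent in (20, 30, 40):
    saving = monthly_saving(MONTHLY_COST, percent)
    print(f"{percent}% recovered: Rs {saving:,} per month, Rs {saving * 12:,} per year")
```

At the 30 percent midpoint this gives ₹33,000 a month, or about ₹3.96 lakh a year, for a single 50-terabyte repository.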
The National Informatics Centre has recommended that all state departments adopt a deduplication checkpoint before any image file is committed to long-term storage — a policy shift that would require an amendment to the Delhi government's existing IT data management circular, last revised in August 2022. The Delhi Secretariat's IT department has not confirmed whether that amendment is in active preparation. Departments with live projects — particularly DMRC and the Heritage Conservation Cell — would be the most immediate candidates for a pilot. Until a formal policy lands, the duplicate count keeps climbing, one construction photograph at a time.