At least 40 percent of image files stored across Delhi's major civic digital repositories are exact or near-exact duplicates, according to an internal audit framework applied by the National Informatics Centre's Delhi regional office during a review of government data infrastructure completed in the first quarter of 2026. The finding has forced a reckoning inside several departments that digitised records rapidly during the post-pandemic push but never built deduplication protocols into their workflows.
Why does this matter right now? The Delhi Metro Rail Corporation is midway through its Phase 4 expansion, which runs 65.1 kilometres across corridors including the Janakpuri West–R.K. Ashram Marg stretch and the Aerocity–Tughlakabad line. Every site inspection, every structural photograph, every progress report uploaded to the DMRC's project management portal generates image sets. Without automated duplicate-image replacement procedures, the same scene, shot from marginally different angles or saved at different compression settings, gets catalogued as a distinct file each time. The storage bill for those extra copies recurs every month and grows as uploads accumulate.
The Scale of the Problem Across Delhi's Data Systems
The Municipal Corporation of Delhi's heritage documentation programme, which has been digitising buildings in Shahjahanabad (the walled city area around Chandni Chowk and Lal Qila) since 2021, is among the worst affected. Field teams using smartphones and DSLRs uploaded images to a centralised server managed through the Delhi Urban Art Commission. Preliminary figures from the NIC review suggest that the Shahjahanabad archive alone holds somewhere between 2.2 lakh and 2.6 lakh image files, with duplicates or near-duplicates accounting for a conservatively estimated 90,000 of those entries. Each high-resolution heritage photograph averages roughly 8 megabytes. Do the arithmetic: 90,000 files at 8 megabytes each comes to roughly 720 gigabytes of redundant data sitting on servers that cost taxpayer money to run and back up.
The Delhi government's Rozgar Bazaar portal, the employment exchange platform launched under the AAP administration to connect job-seekers with employers, faces a different but related version of the same problem. Employers and applicants both upload profile and credential photographs. Without hash-based duplicate detection at the point of upload, the same scanned document frequently appears multiple times under different user sessions. The portal's backend team, operating out of offices near ITO on Vikas Marg, has been working with a vendor since March 2026 to implement perceptual hashing, a technique that compares compact image fingerprints rather than pixel-by-pixel data.
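To make the idea of an image fingerprint concrete, here is a minimal sketch of one common perceptual hash, the average hash. It is an illustration only, not the vendor's or the portal's implementation, which the review does not describe. It assumes each image has already been decoded into a grid of grayscale values (0 to 255) at least 8 pixels wide and tall; the names `average_hash` and `hamming_distance`, and the sample grids, are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values 0-255, one inner list per row


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint of a grayscale image.

    The image is shrunk to size x size cells by block averaging; each cell
    contributes one bit: 1 if brighter than the mean of all cells, else 0.
    Assumes the image is at least `size` pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(size):
        y0, y1 = row * height // size, (row + 1) * height // size
        for col in range(size):
            x0, x1 = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical sample data: a 16x16 horizontal gradient, a copy with small
# brightness noise (standing in for recompression), and an inverted image.
original = [[x * 16 for x in range(16)] for y in range(16)]
recompressed = [[x * 16 + (x + y) % 3 for x in range(16)] for y in range(16)]
inverted = [[255 - x * 16 for x in range(16)] for y in range(16)]

near = hamming_distance(average_hash(original), average_hash(recompressed))
far = hamming_distance(average_hash(original), average_hash(inverted))
print(near, far)  # a small distance suggests a near-duplicate; a large one does not
```

The cut-off distance below which two files count as duplicates is a tuning decision. A production system would also need an image decoder and a way to compare new uploads against millions of stored fingerprints, neither of which is shown here.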
What Deduplication Actually Costs — and What It Saves
Duplicate-image replacement is not a glamorous fix, but the economics are hard to argue with. Cloud storage for government workloads in India typically runs between ₹3 and ₹6 per gigabyte per month depending on the tier and vendor contract. A department holding 10 terabytes of image data that is 40 percent redundant is paying for roughly 4 terabytes it does not need: a recurring waste of between ₹12,000 and ₹24,000 every single month, rising in step with the archive. Across a dozen major Delhi civic departments, aggregate redundant storage could comfortably exceed 50 terabytes, which puts the monthly dead-weight cost past ₹1.5 lakh even at the ₹3 floor rate.
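The figures above can be reproduced with a few lines of arithmetic. This is a sketch using only the rates and volumes quoted in this article; the function name `monthly_redundant_cost` is hypothetical, and it assumes 1 terabyte is billed as 1,000 gigabytes, which may differ under a given contract.

```python
def monthly_redundant_cost(total_tb: float, redundant_fraction: float,
                           rate_per_gb: float, gb_per_tb: int = 1000) -> float:
    """Rupees per month spent storing the redundant share of an archive."""
    redundant_gb = total_tb * gb_per_tb * redundant_fraction
    return redundant_gb * rate_per_gb


# One department: 10 TB of images, 40 percent redundant, at Rs 3 and Rs 6 per GB.
low = monthly_redundant_cost(10, 0.40, 3)
high = monthly_redundant_cost(10, 0.40, 6)
print(f"Single department: Rs {low:,.0f} to Rs {high:,.0f} per month")

# Aggregate case: 50 TB that is entirely redundant, at the Rs 3 floor rate.
aggregate = monthly_redundant_cost(50, 1.0, 3)
print(f"Aggregate floor: Rs {aggregate:,.0f} per month")
```

At the ₹6 ceiling the same 50 terabytes would cost twice the floor figure, so the ₹1.5 lakh number is the lower bound of the range.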
The NIC review reportedly recommends a three-stage intervention: first, a one-time deduplication pass using open-source tools to clear historical redundancy; second, ingest-level duplicate detection on all new uploads; third, a metadata standardisation protocol so that images are tagged with GPS coordinates, date stamps, and project codes at the moment of capture. That third step is the hardest, because it requires behaviour change from field staff — the engineers photographing a flyover pillar in Dwarka or the heritage surveyors working a haveli lane off Ballimaran.
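The first two stages can be sketched with nothing more than a cryptographic hash. The code below is an assumption-laden illustration, not the tooling the review recommends, which is not named: `find_exact_duplicates` stands in for a one-time pass over an existing archive, and the hypothetical `IngestIndex` class stands in for a check at upload time. Both catch only byte-identical files; recompressed or re-shot near-duplicates need a perceptual comparison instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large images are not loaded whole."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Stage one: group every file under `root` by digest, keep groups of 2+."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


class IngestIndex:
    """Stage two: remember digests of accepted uploads (in memory, for illustration)."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def register(self, data: bytes, name: str) -> Optional[str]:
        """Return the name of an identical earlier upload, or None if new."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is None:
            self._seen[digest] = name
        return existing
```

In use, an upload handler would call `register` on the incoming bytes and, if a name comes back, point the new record at the existing file rather than storing a second copy. A real deployment would keep the index in a database rather than in memory.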
Departments that have already moved are finding tangible results. The process is iterative and unglamorous, but the storage savings materialise within the first billing cycle after a deduplication run. For agencies managing Delhi's rapidly growing infrastructure and heritage documentation simultaneously, that is not a minor footnote — it is a line item that can be redirected toward actual digitisation work rather than paying to store the same photograph twice.