The Daily Delhi

Delhi's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Staggering

From the Delhi Metro Rail Corporation's project documentation to the Municipal Corporation of Delhi's heritage records, redundant image files are costing storage budgets crores and slowing down public information systems.

By Delhi News Desk · Published 5 July 2026, 12:14 am

Photo: Shantum Singh on Pexels

At least 40 percent of image files stored across Delhi's major civic digital repositories are exact or near-exact duplicates, according to an internal audit that the National Informatics Centre's Delhi regional office carried out during a review of government data infrastructure completed in the first quarter of 2026. The finding has forced a reckoning inside several departments that digitised records rapidly during the post-pandemic push but never built deduplication protocols into their workflows.

Why does this matter right now? The Delhi Metro Rail Corporation is midway through its Phase 4 expansion, which runs 65.1 kilometres across corridors including the Janakpuri West–R.K. Ashram Marg stretch and the Aerocity–Tughlakabad line. Every site inspection, structural photograph, and progress report uploaded to the DMRC's project management portal generates image sets. Without automated duplicate detection and replacement, near-identical photographs (the same scene shot from marginally different angles, or the same image compressed at different quality settings) are each catalogued as distinct files. Storage costs compound monthly.

The Scale of the Problem Across Delhi's Data Systems

The Municipal Corporation of Delhi's heritage documentation programme, which has been digitising buildings in Shahjahanabad (the walled city area around Chandni Chowk and Lal Qila) since 2021, is among the worst affected. Field teams using smartphones and DSLRs uploaded images to a centralised server managed through the Delhi Urban Art Commission. Preliminary figures from the NIC review suggest that the Shahjahanabad archive alone holds somewhere between 2.2 lakh and 2.6 lakh image files, with redundant copies (exact or near-exact duplicates of another file in the archive) accounting for a conservatively estimated 90,000 of those entries. Each high-resolution heritage photograph averages roughly 8 megabytes. Do the arithmetic: 90,000 files at 8 megabytes each is close to 720 gigabytes of redundant data sitting on servers that cost taxpayer money to run and back up.
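The arithmetic above can be checked in a few lines. This is only a sketch using the figures quoted in this article; the decimal conversion (1 GB = 1,000 MB) and the function names are assumptions made for illustration.

```python
# Figures quoted in the article. Decimal units (1 GB = 1,000 MB) are an assumption.
REDUNDANT_FILES = 90_000
AVG_FILE_MB = 8
ARCHIVE_LOW, ARCHIVE_HIGH = 220_000, 260_000  # 2.2 lakh to 2.6 lakh files


def redundant_gb(files: int, avg_mb: float) -> float:
    """Storage, in gigabytes, taken up by the surplus files."""
    return files * avg_mb / 1_000


def duplicate_share(files: int, archive_size: int) -> float:
    """Fraction of the archive made up of surplus files."""
    return files / archive_size


if __name__ == "__main__":
    print(f"Redundant storage: {redundant_gb(REDUNDANT_FILES, AVG_FILE_MB):.0f} GB")
    low = duplicate_share(REDUNDANT_FILES, ARCHIVE_HIGH)
    high = duplicate_share(REDUNDANT_FILES, ARCHIVE_LOW)
    print(f"Share of archive: {low:.1%} to {high:.1%}")
```

On these inputs the redundant share of the Shahjahanabad archive works out to roughly 35 to 41 percent, which is in line with the 40 percent headline figure.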

The Delhi government's Rozgar Bazaar portal, the employment exchange platform launched under the AAP administration to connect job-seekers with employers, faces a different but related version of the same problem. Employers and applicants both upload profile and credential photographs. Without hash-based duplicate detection at the point of upload, the same scanned document frequently appears multiple times under different user sessions. The portal's backend team, operating out of offices near ITO on Vikas Marg, has been working with a vendor since March 2026 to implement perceptual hashing, a technique that compares compact image fingerprints rather than pixel-by-pixel data.
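To make the idea of an image fingerprint concrete, here is a minimal sketch of average hashing, one of the simplest perceptual hashes. The article does not say which algorithm the portal's vendor uses, so this is an illustration rather than a description of their system. It assumes the image has already been decoded to a grayscale grid of pixel values whose dimensions are multiples of 8; the function names and the threshold of 5 bits are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downscale to size x size by averaging equal blocks.

    Assumes height and width are both multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [
                pixels[y][x]
                for y in range(r * block_h, (r + 1) * block_h)
                for x in range(c * block_w, (c + 1) * block_w)
            ]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Treat two images as near-duplicates when few fingerprint bits differ."""
    return hamming(average_hash(a), average_hash(b)) <= threshold
```

Because the fingerprint records only whether each region is brighter or darker than the image's own average, a copy that has been uniformly brightened or lightly recompressed tends to produce the same or nearly the same bits, while an unrelated image differs in many positions. A byte-level hash would treat such copies as entirely different files.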

What Deduplication Actually Costs — and What It Saves

Duplicate-image replacement is not a glamorous fix, but the economics are hard to argue with. Cloud storage for government workloads in India typically runs between ₹3 and ₹6 per gigabyte per month depending on the tier and vendor contract. A department holding 10 terabytes of image data that is 40 percent redundant is paying for roughly 4 terabytes it does not need: a recurring waste of between ₹12,000 and ₹24,000 every single month, a figure that rises in step with the archive. Across a dozen major Delhi civic departments, aggregate redundant storage could comfortably exceed 50 terabytes, pushing the monthly dead-weight cost past ₹1.5 lakh even at the lower ₹3 rate.
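The cost figures follow from a simple multiplication, sketched below with the rates and volumes quoted in this article. The decimal conversion (1 TB = 1,000 GB) and the function names are assumptions for illustration; actual contract pricing may be structured differently.

```python
GB_PER_TB = 1_000  # decimal conversion, an assumption


def redundant_gb(total_tb: float, redundant_fraction: float) -> float:
    """Gigabytes of storage occupied by redundant files."""
    return total_tb * GB_PER_TB * redundant_fraction


def monthly_cost_inr(gigabytes: float, rate_per_gb: float) -> float:
    """Monthly spend in rupees for a given volume at a per-GB monthly rate."""
    return gigabytes * rate_per_gb


if __name__ == "__main__":
    wasted = redundant_gb(total_tb=10, redundant_fraction=0.40)
    for rate in (3, 6):  # rupees per GB per month, as quoted in the article
        print(f"10 TB archive at Rs {rate}/GB: Rs {monthly_cost_inr(wasted, rate):,.0f} per month")
    # Aggregate case: 50 TB of redundant data at the lower rate
    print(f"50 TB redundant at Rs 3/GB: Rs {monthly_cost_inr(50 * GB_PER_TB, 3):,.0f} per month")
```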

The NIC review reportedly recommends a three-stage intervention: first, a one-time deduplication pass using open-source tools to clear historical redundancy; second, ingest-level duplicate detection on all new uploads; third, a metadata standardisation protocol so that images are tagged with GPS coordinates, date stamps, and project codes at the moment of capture. That third step is the hardest, because it requires behaviour change from field staff — the engineers photographing a flyover pillar in Dwarka or the heritage surveyors working a haveli lane off Ballimaran.
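The first two stages can be sketched with nothing more than a content hash. The example below is an illustration under stated assumptions, not the NIC's recommended tooling: it uses SHA-256 from Python's standard library, so it catches only byte-identical copies, and the function names are hypothetical. Near-duplicates would need a perceptual fingerprint instead, and the third stage (metadata standardisation) is not covered here.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Set


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: Path) -> Dict[str, List[Path]]:
    """Stage 1: group every file under `root` by its content hash."""
    index: Dict[str, List[Path]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            index.setdefault(sha256_of(path), []).append(path)
    return index


def redundant_copies(index: Dict[str, List[Path]]) -> List[Path]:
    """All but the first path in each group: candidates for replacement."""
    return [path for paths in index.values() for path in paths[1:]]


def is_duplicate_upload(data: bytes, known_hashes: Set[str]) -> bool:
    """Stage 2: flag an upload whose bytes are already known; record new ones."""
    content_hash = hashlib.sha256(data).hexdigest()
    if content_hash in known_hashes:
        return True
    known_hashes.add(content_hash)
    return False
```

In this sketch the one-time pass only lists surplus copies rather than deleting them, which leaves room for a human review before anything is removed from an archive.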

Departments that have already moved are finding tangible results. The process is iterative and unglamorous, but the storage savings materialise within the first billing cycle after a deduplication run. For agencies managing Delhi's rapidly growing infrastructure and heritage documentation simultaneously, that is not a minor footnote — it is a line item that can be redirected toward actual digitisation work rather than paying to store the same photograph twice.

About this article

Published by The Daily Delhi

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
