The Daily Delhi

Delhi's Duplicate Image Problem: The Numbers Behind a Growing Digital Records Crisis

Government databases, civic portals and heritage archives across the capital are drowning in redundant image files — and the cost of cleaning them up runs into crores.

By Delhi News Desk · Published 5 July 2026, 12:46 am

More than 4.7 lakh duplicate image files are sitting inside Delhi government digital systems, clogging storage servers, slowing civic portals and quietly inflating the city's annual IT infrastructure bill. That figure comes from an internal audit of three major municipal databases conducted between January and March 2026, according to documents examined as part of a broader review of Delhi's digital governance infrastructure. The problem is not new, but its scale has only recently become measurable.

The timing matters because Delhi is midway through a digitisation push tied to Phase 4 of the Delhi Metro expansion, which has required the Archaeological Survey of India and the Delhi Urban Shelter Improvement Board to upload tens of thousands of site photographs, land acquisition records and structural assessment images to shared government repositories. When those uploads happen without deduplication protocols, the same image (say, a photograph of a transit corridor near Janakpuri West or a heritage wall in Shahjahanabad) can appear dozens of times across different folders and departments. Each redundant copy eats storage, and every gigabyte of storage costs money.

Where the Bloat Is Worst

The heaviest concentration of duplicate image data sits inside three systems: the Delhi Municipal Corporation's property tax portal, the Yamuna Riverfront Development Authority's photographic monitoring archive, and the Delhi Jal Board's infrastructure inspection database. The DMC portal alone was found to contain over 1.1 lakh images flagged as exact or near-exact duplicates — many of them property photographs submitted by citizens during self-assessment drives held in 2023 and 2024. Citizens uploading from low-bandwidth connections in areas like Trilokpuri and Mustafabad often resubmitted failed uploads multiple times, each attempt creating a separate stored file.

Storage costs are not trivial at this scale. Government cloud hosting rates under the MeitY-empanelled service framework run approximately ₹2.80 per GB per month for standard object storage. Independent IT sector estimates suggest the duplicate image load across Delhi's civic systems occupies somewhere between 18 and 23 terabytes of avoidable storage. At the quoted rate, that is roughly ₹50,000 to ₹65,000 a month in wasted expenditure, before factoring in bandwidth and processing overhead. Across a full financial year, the figure comes to roughly ₹6 lakh to ₹7.8 lakh in direct storage waste alone.
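The arithmetic behind that estimate can be checked with a short sketch. It uses only the rate and volumes quoted above and assumes decimal units (1 TB = 1,000 GB). The function names are illustrative and do not come from any government tool.

```python
# Worked arithmetic for the storage estimate quoted in the article.
# Assumption: 1 TB is treated as 1,000 GB (decimal units).
RATE_INR_PER_GB_MONTH = 2.80  # rate quoted in the article


def monthly_cost_inr(terabytes, rate=RATE_INR_PER_GB_MONTH):
    """Monthly storage cost in rupees for a volume given in terabytes."""
    return terabytes * 1000 * rate


def annual_cost_inr(terabytes, rate=RATE_INR_PER_GB_MONTH):
    """Twelve months of storage at the same monthly rate."""
    return monthly_cost_inr(terabytes, rate) * 12


for tb in (18, 23):
    print(f"{tb} TB: Rs {monthly_cost_inr(tb):,.0f}/month, "
          f"Rs {annual_cost_inr(tb):,.0f}/year")
```

At 18 TB this gives about ₹50,400 a month and ₹6.05 lakh a year. At 23 TB it gives about ₹64,400 a month and ₹7.73 lakh a year. Bandwidth and processing overhead are not included.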

What Deduplication Actually Requires

Automated deduplication is not a particularly exotic technology. Most enterprise content management systems have offered hash-based duplicate detection since the mid-2010s. The National Informatics Centre, which manages backend infrastructure for several Delhi government portals, has published guidance on implementing MD5 and SHA-256 checksums to flag redundant files at the point of upload. The Delhi Secretariat's IT cell piloted a deduplication script on its internal document archive in the second half of 2025, reportedly reducing that archive's image storage footprint by 34 percent in six weeks.
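A checksum check at the point of upload might look roughly like the sketch below. It is not the NIC guidance or the Secretariat's script, neither of which is reproduced here. The `UploadStore` class and the filenames are hypothetical, and the store is an in-memory stand-in for a real database.

```python
import hashlib


class UploadStore:
    """Hypothetical in-memory store that keeps one copy per unique file content."""

    def __init__(self):
        self._by_digest = {}  # SHA-256 hex digest -> file bytes
        self._names = {}      # filename -> digest of its content

    def upload(self, filename, data):
        """Record the upload; return True if new bytes were stored, False if duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self._by_digest
        if is_new:
            self._by_digest[digest] = data
        # The filename is always recorded, so a resubmission still resolves to the file.
        self._names[filename] = digest
        return is_new

    def stored_copies(self):
        """Number of distinct file contents actually held in storage."""
        return len(self._by_digest)


store = UploadStore()
photo = b"\xff\xd8 example image bytes"        # stand-in for a real JPEG payload
print(store.upload("plot_42_try1.jpg", photo))  # True: first copy is stored
print(store.upload("plot_42_try2.jpg", photo))  # False: same bytes, nothing new stored
print(store.stored_copies())                    # 1
```

A check like this only catches byte-identical files. A photograph that has been re-encoded or resized produces a different digest and would be stored again.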

The Indraprastha Institute of Information Technology Delhi has run coursework and research on digital asset management for public sector clients, and faculty there have argued that the problem is less about technology than about procurement — most civic IT contracts in Delhi are written around storage capacity thresholds rather than storage efficiency outcomes. A vendor paid to manage 50 terabytes has no contractual incentive to help the government use 35 terabytes instead.

Heritage institutions face a slightly different version of the same problem. The Aga Khan Trust for Culture, which oversees restoration work in the Nizamuddin Basti area, maintains a photographic archive of roughly 2.8 lakh images documenting site conditions over nearly two decades. Staff there have implemented perceptual hashing tools to distinguish near-duplicate shots taken seconds apart during site surveys — a method that catches visually similar but not byte-identical duplicates that standard checksums miss.
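The article does not say which perceptual hashing tool the Trust uses. As an illustration of the general idea, here is a minimal sketch of a difference hash (dHash), one common variant. It works on a grid of grayscale values so that it needs no imaging library. A real pipeline would decode and resize the photographs first, and the distance threshold of 10 bits is an illustrative assumption.

```python
def shrink(pixels, width, height):
    """Downsample a 2D grayscale grid to width x height by block averaging."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per adjacent pair, set when the left cell is brighter."""
    small = shrink(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance=10):
    """Assumed threshold: treat images within 10 differing bits as near-duplicates."""
    return hamming(dhash(a), dhash(b)) <= max_distance


# Synthetic 72x64 test images: a gradient, a slightly brighter copy, a mirrored copy.
base = [[2 * x + y for x in range(72)] for y in range(64)]
brighter = [[v + 5 for v in row] for row in base]
mirrored = [list(reversed(row)) for row in base]

print(is_near_duplicate(base, brighter))  # True: same structure, different bytes
print(is_near_duplicate(base, mirrored))  # False: the gradient runs the other way
```

Because the hash records relative brightness between neighbouring regions instead of exact bytes, a uniform exposure change leaves it untouched, which is the property that lets it catch shots a checksum would treat as distinct.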

For Delhi's civic technology managers, the immediate practical step is straightforward: mandate deduplication checks at the upload stage for all new image submissions to public-facing portals, beginning with the DMC property tax system and the Delhi Jal Board inspection database. The MeitY framework already permits this. The next budget cycle — Union Budget 2027-28 planning begins in earnest by September — is the realistic window to bake efficiency metrics into new IT procurement contracts, replacing raw storage benchmarks with cost-per-valid-asset measures that actually reward cleaning up the mess.

About this article

Published by The Daily Delhi

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
