The Daily Delhi

Delhi's Digital Archive Crisis: The Hidden Numbers Behind the City's Duplicate Image Problem

Government portals, heritage databases and civic tech projects across Delhi are quietly drowning in redundant image data — and the scale of the problem is only now becoming clear.

By Delhi News Desk · Published 5 July 2026, 12:15 am

Photo: General Kenobi on Pexels

At least 40 percent of images stored across Delhi's major government-linked digital portals are estimated to be duplicates or near-duplicates, according to internal audits reviewed by archivists working on civic digitisation projects in the capital. The figure, cited in assessments shared during a digital infrastructure review earlier this year, points to a sprawling, expensive and largely invisible problem inside the city's online public record systems.

The issue matters now because Delhi is midway through several large-scale digitisation drives. The Delhi Metro Rail Corporation's Phase 4 documentation project, which is cataloguing construction progress along the phase's roughly 65 kilometres of priority corridors, including the line from Janakpuri West to R.K. Ashram Marg, has generated tens of thousands of site photographs since ground was broken on key sections in 2022. Heritage documentation teams working under the Archaeological Survey of India's Delhi circle have simultaneously been photographing Old Delhi structures in Shahjahanabad. Both streams of work are running into the same wall: storage systems that were built without a deduplication protocol.

Storage Bills and Server Strain

The financial cost is not trivial. Cloud storage pricing in India for government-grade infrastructure runs roughly between ₹3 and ₹7 per gigabyte per month depending on the service tier and procurement contract. A single large civic portal holding 500,000 unfiltered images — a conservative estimate for a department that has been digitising for five or more years — can carry 30 to 50 percent redundant data, translating to wasted storage expenditure running into lakhs of rupees annually per department. Across a city running dozens of such portals, the cumulative waste is substantial.
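The arithmetic behind that estimate can be sketched in a few lines. The image count, redundancy range and per-gigabyte prices below come from the figures above; the average file size of 10 MB is an assumption made purely for illustration, since no per-image size is reported, and the function name is hypothetical.

```python
def wasted_storage_cost_per_year(image_count, avg_image_mb,
                                 redundant_fraction, rupees_per_gb_month):
    """Annual spend, in rupees, on storing redundant images."""
    total_gb = image_count * avg_image_mb / 1000  # decimal gigabytes
    redundant_gb = total_gb * redundant_fraction
    return redundant_gb * rupees_per_gb_month * 12


# From the article: 500,000 images, 30-50% redundant, Rs 3-7 per GB per month.
# ASSUMPTION (not from the article): an average of 10 MB per image.
low = wasted_storage_cost_per_year(500_000, 10, 0.30, 3)
high = wasted_storage_cost_per_year(500_000, 10, 0.50, 7)
print(f"Rs {low:,.0f} to Rs {high:,.0f} per year")
```

Under that assumption the waste for one portal works out to roughly ₹54,000 to ₹2.1 lakh a year. The result scales linearly with file size, so archives of high-resolution survey photographs would sit well above that range and archives of small thumbnails well below it.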

The Delhi Urban Shelter Improvement Board, which manages records for resettlement colonies across areas including Rohini and Dwarka, flagged duplicate image accumulation as a strain on its document management system during a 2025 internal review. Staff uploading the same plot survey photographs from different devices, or re-uploading after failed submissions, were identified as the primary source of duplication — a human workflow problem rather than a purely technical one.

The problem is particularly acute in the heritage sector. INTACH's Delhi chapter, which has been cataloguing structures in neighbourhoods like Mehrauli and Nizamuddin since the early 2000s, estimates that its photograph collection has grown to over 200,000 images. No systematic audit has ever been conducted to identify duplicate frames shot during the same survey visit. Disk space is finite; so is the budget to expand it.

What Deduplication Actually Requires

Fixing the problem is not simply a matter of running a deletion script. Perceptual hashing, the technique that flags visually near-identical images even when the files differ in name, resolution or compression, produces matches that need human verification before anything is permanently removed from a public archive. Two photographs of the Jama Masjid's north gate, one taken in 2018 and one in 2023, may look identical to an algorithm but carry different evidentiary value for a conservation researcher tracking structural change over time.
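To illustrate why an algorithm cannot make that call alone, here is a minimal sketch of one simple perceptual hash, the "average hash". It is not the method used by any body named in this story. Real pipelines first shrink and grayscale the actual image file with an imaging library; this sketch assumes that step has already happened and works on small hard-coded grids of brightness values, all of them invented for the example.

```python
def average_hash(pixels):
    """Average hash of a small grayscale grid (rows of 0-255 values).

    Each bit is 1 where the pixel is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Toy 4x4 "thumbnails" (invented values): dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 198, 208],
    [11, 21, 202, 212],
]
# The same scene re-saved slightly brighter.
brighter = [[min(255, v + 5) for v in row] for row in original]
# A different scene: bright on the left, dark on the right.
different = [list(reversed(row)) for row in original]

print(hamming_distance(average_hash(original), average_hash(brighter)))   # 0
print(hamming_distance(average_hash(original), average_hash(different)))  # 16
```

A distance of zero says only that two frames look alike. It carries no information about when each was taken, which is why a near-match between a 2018 and a 2023 survey photograph has to go to a person rather than to a deletion script.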

That distinction is driving a push within bodies like the National Informatics Centre, which supports government digital infrastructure across ministries, toward tiered deduplication: automated flagging of exact-hash matches for deletion, and human review queues for perceptual near-matches. The process is standard practice in media organisations and large commercial archives globally, but its adoption inside Indian government digital workflows has been slow and uneven.
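A rough sketch of the tiered approach follows. It is illustrative only and does not describe any department's actual system: the `triage` function, the file names, the byte strings and the distance threshold of 4 are all assumptions, and each record is assumed to arrive with a perceptual hash already computed and a unique file name.

```python
import hashlib
from collections import defaultdict


def triage(records, near_threshold=4):
    """Split an archive into exact duplicates and near-matches for review.

    records: list of (name, file_bytes, perceptual_hash_int) tuples.
    Returns (exact_duplicates, review_queue).
    """
    # Tier 1: byte-identical files share a SHA-256 digest.
    by_digest = defaultdict(list)
    for name, data, _ in records:
        by_digest[hashlib.sha256(data).hexdigest()].append(name)

    exact_duplicates = []  # every copy after the first in each group
    kept = set()
    for names in by_digest.values():
        kept.add(names[0])
        exact_duplicates.extend(names[1:])

    # Tier 2: among the files kept, queue perceptual near-matches for a human.
    unique = [(name, phash) for name, _, phash in records if name in kept]
    review_queue = []
    for i, (name_a, hash_a) in enumerate(unique):
        for name_b, hash_b in unique[i + 1:]:
            if bin(hash_a ^ hash_b).count("1") <= near_threshold:
                review_queue.append((name_a, name_b))
    return exact_duplicates, review_queue


# Hypothetical records: names, bytes and hashes are invented for illustration.
records = [
    ("gate_2018.jpg", b"raw-bytes-A", 0b1111000011110000),
    ("gate_2018_copy.jpg", b"raw-bytes-A", 0b1111000011110000),
    ("gate_2023.jpg", b"raw-bytes-B", 0b1111000011110001),
    ("minaret.jpg", b"raw-bytes-C", 0b0000111100001111),
]
exact, review = triage(records)
print(exact)   # ['gate_2018_copy.jpg']
print(review)  # [('gate_2018.jpg', 'gate_2023.jpg')]
```

The byte-identical copy is flagged automatically, while the 2018 and 2023 frames are paired for a reviewer instead of being removed. The pairwise comparison here is quadratic in the number of images, which is fine for a sketch but would need an indexed search on an archive of hundreds of thousands of files.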

For Delhi specifically, the practical next step is a citywide audit standard. Civic technology advocates working with the Delhi government's IT department have been pushing for a unified image metadata protocol, one that tags every photograph at the point of upload with location data, date, and the project it belongs to. Without that baseline, even the best deduplication software operates blind. Departments that have not yet embedded such protocols into their upload workflows, and most have not, should treat the coming financial year as the window to act. After that, Phase 4 Metro documentation, Yamuna riverfront redevelopment photography and the next round of heritage surveys will pile further redundant data onto an already strained system.
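What such a protocol might check at the point of upload can be sketched as a small record with validation. No published schema is cited in this story, so the class name, field names and rules below are hypothetical, and the coordinates in the example are illustrative values for a site in Old Delhi.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UploadMetadata:
    """Hypothetical minimum tags for one photograph: project, date, location."""
    project: str
    captured_on: date
    latitude: float
    longitude: float


def validate(meta):
    """Return a list of problems; an empty list means the upload may proceed."""
    problems = []
    if not meta.project.strip():
        problems.append("project is required")
    if meta.captured_on > date.today():
        problems.append("capture date is in the future")
    if not -90 <= meta.latitude <= 90:
        problems.append("latitude out of range")
    if not -180 <= meta.longitude <= 180:
        problems.append("longitude out of range")
    return problems


# Illustrative values only.
good = UploadMetadata("heritage-survey", date(2023, 3, 14), 28.6507, 77.2334)
bad = UploadMetadata("", date(2023, 3, 14), 128.0, 77.2334)

print(validate(good))  # []
print(validate(bad))   # ['project is required', 'latitude out of range']
```

Rejecting untagged uploads at this stage is what would later let reviewers tell a deliberate repeat survey from an accidental re-upload, since two look-alike frames with different capture dates or projects can be kept on purpose.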

About this article

Published by The Daily Delhi

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
