The Daily Delhi

Delhi news, every day

Delhi's Digital Archive Crisis: The Hidden Numbers Behind Thousands of Duplicate Images Choking City Records

Municipal and heritage databases across the capital are drowning in redundant visual data — and the scale of the problem is only now becoming clear.

By Delhi News Desk · Published 4 July 2026, 11:58 pm

Photo: Griffiths, Charles John Yonge, Henry John / Public domain (Wikimedia Commons)

Delhi's government digitisation drive has a numbers problem nobody wants to talk about. Across civic databases maintained by the Municipal Corporation of Delhi and the Delhi Urban Art Commission, duplicate image files now account for an estimated 30 to 40 percent of total stored visual content, according to technical assessments reviewed by The Daily Delhi. That means somewhere between three and four of every ten photographs archived during the capital's push to go paperless are redundant copies that eat storage, slow retrieval systems, and quietly inflate IT infrastructure costs.

The timing matters because 2026 is a benchmark year. The Delhi government's Digital Delhi Mission, launched under a 2022 cabinet resolution, set a target of fully digitising all civic records — including property maps, heritage site photographs, and Yamuna River monitoring imagery — by December 2026. With six months left, administrators are discovering that raw file counts mean very little when a significant portion of the archive is polluted with near-identical images uploaded by different departments with no cross-referencing protocol.

Where the Duplication Is Worst

The problem concentrates in two areas. The first is the documentation work carried out along the Chandni Chowk redevelopment corridor, where multiple agencies (the North Delhi Municipal Corporation zone, the Archaeological Survey of India's Delhi Circle, and the Public Works Department) each photographed the same heritage facades independently between 2023 and 2025. Staff at the ASI's Delhi Circle office on Janpath have flagged that their repository alone holds more than 12,000 image files related to structures in Old Delhi, and internal audits suggest that nearly 4,800 of those, about 40 percent, are duplicates of photographs already filed by other bodies.

Second, the Delhi Metro Rail Corporation's Phase 4 documentation archive — tracking construction progress on the 65-kilometre expansion covering corridors including Janakpuri West to RK Ashram Marg — has accumulated site photographs from at least six separate contractor teams, all uploading to loosely connected servers without deduplication software in place. Storage costs for DMRC's project documentation wing have climbed accordingly, though the corporation has not publicly disclosed a specific figure for the current financial year.

The Yamuna monitoring programme adds another layer. The Delhi Jal Board's sensor and imaging network, which photographs ghats from Wazirabad to Okhla at regular intervals, has been generating roughly 2,000 images per week since January 2025. Officials familiar with the programme say manual tagging errors mean the same flood-event images have been catalogued under multiple date stamps, creating a false impression of comprehensive coverage while actual retrieval for policy analysis remains unreliable.
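The multiple-date-stamp problem described above is the simplest kind of duplication to detect, because the files involved are byte-for-byte identical. As a minimal sketch, and not a description of the Delhi Jal Board's actual systems, the Python below groups catalogue entries by a SHA-256 digest of their contents and reports any image filed under more than one date. The record layout, the function name, and the sample data are all hypothetical.

```python
import hashlib
from collections import defaultdict


def find_multi_dated_images(records):
    """Group catalogue entries by content digest.

    `records` is an iterable of (date_stamp, image_bytes) pairs. This record
    layout is a hypothetical stand-in for a real catalogue export.
    Returns {digest: sorted date stamps} for every image whose identical
    bytes appear under more than one date stamp.
    """
    dates_by_digest = defaultdict(set)
    for date_stamp, image_bytes in records:
        digest = hashlib.sha256(image_bytes).hexdigest()
        dates_by_digest[digest].add(date_stamp)
    return {
        digest: sorted(dates)
        for digest, dates in dates_by_digest.items()
        if len(dates) > 1
    }


# Illustrative data only: two entries share the same bytes under different dates.
catalogue = [
    ("2025-07-14", b"flood-frame-A"),
    ("2025-07-21", b"flood-frame-A"),
    ("2025-07-21", b"flood-frame-B"),
]
conflicts = find_multi_dated_images(catalogue)
for digest, dates in conflicts.items():
    print(digest[:12], dates)
```

A check of this kind only catches exact copies. Images that have been resized, recompressed, or re-exported produce different digests and need a similarity-based method instead.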

The Cost of Doing Nothing

Cloud storage is not free, and Delhi's civic bodies are not running small operations. The MCD alone manages over 1.4 million digitised documents across its centralised repository at the Dr. S.P.M. Civic Centre on JLN Marg. Industry benchmarks suggest that deduplication tools — software that automatically identifies and flags redundant files — can reduce active storage requirements by 25 to 60 percent in large institutional archives. At current NIC (National Informatics Centre) server rental rates applicable to Delhi government bodies, even a 25 percent reduction in storage load across the MCD and Delhi Jal Board systems would free up capacity worth several crore rupees annually.

The practical fix is available and not particularly exotic. Image deduplication using perceptual hashing — a technique that identifies visually identical or near-identical photographs regardless of file name or upload date — is already deployed by institutions including the Indira Gandhi National Centre for the Arts on Janpath, which completed its own archive rationalisation in early 2026. The IGNCA's experience showed that running deduplication across a 200,000-image archive took approximately three weeks using open-source tools, with no data loss.
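To illustrate how perceptual hashing works in principle, here is a minimal Python sketch of one common variant, the difference hash (dHash). It is not the IGNCA's tooling, which the available information does not name. The sketch assumes each image has already been decoded into a grid of grayscale values from 0 to 255; a real pipeline would use an image library for decoding and resizing. The function names and the synthetic test images are illustrative only.

```python
def shrink(pixels, out_w, out_h):
    """Downscale a grayscale grid (list of rows of 0-255 ints) by block averaging."""
    in_h, in_w = len(pixels), len(pixels[0])
    small = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max((oy + 1) * in_h // out_h, y0 + 1)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max((ox + 1) * in_w // out_w, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def dhash(pixels, size=8):
    """Difference hash: one bit per left/right neighbour comparison
    on a (size + 1) x size thumbnail, giving size * size bits."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic 32 x 32 test images (illustrative, not real archive data).
original = [[7 * x + y for x in range(32)] for y in range(32)]
brighter = [[p + 5 for p in row] for row in original]   # same scene, slightly brighter
mirrored = [list(reversed(row)) for row in original]    # visually different image

near_duplicate_distance = hamming(dhash(original), dhash(brighter))
different_distance = hamming(dhash(original), dhash(mirrored))
print(near_duplicate_distance, different_distance)
```

Two files whose hashes differ in only a few bits are candidates for being the same photograph, whatever their file names or upload dates. The cut-off distance is a tuning choice that would need validating against a sample of the archive before any files are flagged or removed.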

For civic administrators, the path forward starts with a mandatory cross-departmental audit before the December 2026 Digital Delhi Mission deadline. Without one, the capital risks hitting its digitisation target on paper while building an archive that is structurally compromised — expensive to store, difficult to search, and unreliable as an evidence base for everything from heritage preservation decisions in Shahjahanabad to flood response planning along the Yamuna floodplain.


About this article

Published by The Daily Delhi

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
