The Daily Delhi


Delhi's Digital Archives Are Drowning in Duplicate Images — And This Week, Officials Finally Moved to Fix It

A long-running data quality crisis inside Delhi's municipal and heritage digitisation projects came to a head this week, as agencies began a systematic purge of redundant image files clogging government servers.

By Delhi News Desk · Published 5 July 2026, 12:30 am

Photo: Shantum Singh on Pexels

Delhi's sprawling network of civic digitisation drives — from the Municipal Corporation of Delhi's property mapping database to the Delhi Urban Heritage Foundation's archive of Old Delhi monuments — has been quietly accumulating a serious problem for years. This week, officials across at least three government departments initiated coordinated deduplication exercises to identify and remove hundreds of thousands of duplicate image files from public-record systems, a push that administrators say has become urgent as storage costs and data retrieval failures mounted through the first half of 2026.

The trigger was a technical audit completed in late June by the Delhi State Data Centre, the facility housed at Pushp Vihar in South Delhi that underpins a large share of the capital's digital government infrastructure. The audit found that duplicate image records were inflating database sizes by a significant margin and slowing query response times on systems used for citizen services, including property verification and heritage documentation requests filed under the Right to Information Act.

Where the Problem Hit Hardest

The duplication crisis is most visible in two specific initiatives. The first is the Yamuna Corridor Documentation Project, a joint effort between the Delhi Development Authority and the Delhi Jal Board to create a photographic inventory of the riverfront between the Wazirabad Barrage and the Okhla Bird Sanctuary. Fieldwork teams using multiple devices submitted images through separate upload portals over 18 months, resulting in what the audit described as layered redundancy — the same stretch of riverbank photographed on the same day appearing under different file identifiers, sometimes dozens of times.

The second pressure point is the Old Delhi Digital Heritage Archive, administered through the Shahjahanabad Redevelopment Corporation's office near Lal Qila. That project, which was set up to catalogue structures in Chandni Chowk, Ballimaran, and the lanes around Jama Masjid, ingested images from freelance photographers, student volunteers from the School of Planning and Architecture on Indraprastha Estate, and government surveyors — all without a unified deduplication protocol at point of entry. Program coordinators have not publicly disclosed the scale of the redundancy in those records, but the State Data Centre audit flagged the archive as among the top five storage consumers on the Pushp Vihar infrastructure.

What the Cleanup Involves — and What It Costs

Deduplication at this scale is not a one-click fix. Technicians are running perceptual hashing algorithms, software that compares images by a fingerprint of their visual content rather than by file name or exact file bytes, to identify near-identical records and flag them for human review. The process, which began formally on July 1, is expected to run through the end of August. The Delhi State Data Centre has allocated a portion of its fiscal year 2026-27 IT maintenance budget to the effort, though the specific figure has not been released in any public document reviewed for this article.
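For readers curious about how a visual fingerprint works, the sketch below shows one simple form of perceptual hashing, often called an average hash. It is an illustration only: the audit does not say which algorithm the technicians are using, and the function names, the 8 x 8 grid, and the threshold of 5 differing bits are all assumptions made for this example. Images are represented as plain grids of grayscale values so the code runs without any imaging library.

```python
from typing import List

# A grayscale image as rows of pixel values (0-255). Hypothetical stand-in
# for a decoded photo; a real pipeline would load pixels with an image library.
Image = List[List[int]]


def average_hash(pixels: Image, size: int = 8) -> int:
    """Return a size*size-bit fingerprint of the image.

    The image is shrunk to a size x size grid by averaging blocks of pixels,
    then each cell contributes one bit: 1 if brighter than the grid mean.
    Assumes the image is at least size x size pixels.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Image, b: Image, threshold: int = 5) -> bool:
    """Flag two images for human review if their fingerprints nearly match.

    The threshold is an illustrative assumption, not a recommended setting.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= threshold
```

Because the fingerprint depends on relative brightness rather than on file bytes, a slightly brighter or recompressed copy of the same photograph tends to produce the same or a very similar hash, while a genuinely different scene does not. That is why flagged pairs still go to a human reviewer: the comparison is approximate by design.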

Storage on government cloud infrastructure is not cheap. Commercial cloud providers operating in the Indian market have been quoting rates for archive-tier storage in the range of ₹1.5 to ₹2.8 per GB per month for enterprise clients, which gives some sense of why a ballooned archive represents a recurring line-item problem, not just a technical inconvenience. For systems where the image count runs into the millions, even a 30 percent reduction in stored data translates to material annual savings.
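The arithmetic behind that claim can be sketched in a few lines. The rates and the 30 percent reduction come from the figures above; the archive size of 50,000 GB (about 50 TB) is a purely hypothetical assumption chosen for illustration, since no actual archive sizes have been published.

```python
from decimal import Decimal


def annual_savings(archive_gb: Decimal, reduction: Decimal,
                   rate_per_gb_month: Decimal) -> Decimal:
    """Yearly storage cost avoided by removing a share of the archive."""
    removed_gb = archive_gb * reduction
    return removed_gb * rate_per_gb_month * 12


# Hypothetical archive size, an assumption for illustration only.
ARCHIVE_GB = Decimal("50000")
# Reduction and rates as quoted in the article.
REDUCTION = Decimal("0.30")
LOW_RATE = Decimal("1.5")   # rupees per GB per month
HIGH_RATE = Decimal("2.8")  # rupees per GB per month

low = annual_savings(ARCHIVE_GB, REDUCTION, LOW_RATE)
high = annual_savings(ARCHIVE_GB, REDUCTION, HIGH_RATE)
print(f"Estimated annual savings: INR {low:,.0f} to INR {high:,.0f}")
```

Under those assumptions, removing 15,000 GB would save between ₹270,000 and ₹504,000 a year on a single archive. The real figure depends on actual archive sizes and contracted rates, neither of which is public.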

The broader relevance extends beyond IT budgets. Delhi Metro Rail Corporation's Phase 4 corridor documentation — which is photographing construction progress at sites including Janakpuri West and Krishna Park Extension — uses some of the same state infrastructure. Administrators want deduplication protocols embedded in upload workflows before Phase 4 image volumes peak later this year.
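What a check embedded in an upload workflow might look like can be sketched as follows. This is a minimal illustration, not a description of any system the agencies have announced: the `UploadRegistry` class, its `register` method, and the file identifiers are hypothetical names invented for the example. It uses a SHA-256 digest of the file contents, so it only catches byte-for-byte copies.

```python
import hashlib
from typing import Dict, Optional


class UploadRegistry:
    """Hypothetical in-memory index of uploaded files, keyed by content digest."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def register(self, file_id: str, content: bytes) -> Optional[str]:
        """Record an upload unless identical content is already stored.

        Returns the identifier of the existing copy if the bytes match an
        earlier upload, or None if the file is new and has been recorded.
        """
        digest = hashlib.sha256(content).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing
        self._by_digest[digest] = file_id
        return None


# Two portals submit the same bytes under different identifiers.
registry = UploadRegistry()
first = registry.register("portal-a/0001", b"example image bytes")
second = registry.register("portal-b/0977", b"example image bytes")
```

Here `first` is `None` because the file is new, while `second` returns `"portal-a/0001"`, telling the second portal that the image already exists. A resized or recompressed copy would have different bytes and slip past this check, which is why near-identical images still need the perceptual comparison the technicians are running.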

For residents and researchers who rely on public heritage archives or property records, the practical advice is straightforward: RTI requests for photographic documentation filed before late August may see slower-than-usual turnaround while deduplication is active. Requests filed after September 1 should encounter a leaner, faster system, assuming the cleanup proceeds on schedule. Anyone with a pending heritage documentation request involving the Shahjahanabad Redevelopment Corporation is advised to contact the office on Netaji Subhash Marg directly to confirm whether their case files sit in an affected database partition.


About this article

Published by The Daily Delhi

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
