The Daily Delhi


Delhi's Digital Archives Are Drowning in Duplicate Images — And the Numbers Tell a Damaging Story

Government portals, civic bodies and heritage databases across the capital are grappling with a silent storage crisis driven by unchecked image duplication, and the data reveals just how deep the problem runs.

By Delhi News Desk · Published 5 July 2026, 12:15 am


Photo: Farman Ansari on Pexels

Delhi's public digital infrastructure is carrying millions of redundant image files — exact or near-identical duplicates that bloat storage systems, slow public-facing portals, and drain taxpayer-funded server budgets. An audit trail running through procurement documents filed with the Delhi government's Information Technology department between January 2024 and March 2026 points to a pattern: agencies are paying repeatedly to store the same data.

The issue matters now because three major digitisation drives are running simultaneously across the capital. The Delhi Metro Rail Corporation is uploading Phase 4 construction records to its project management portal. The Delhi Pollution Control Committee is archiving daily air-quality sensor images and drone-survey footage from 13 monitoring stations. And the Archaeological Survey of India's Delhi circle is midway through photographing more than 170 protected monuments, including sites in Mehrauli and Nizamuddin. All three efforts are generating image libraries at scale, and none has a published deduplication policy as of this writing.

The Scale of the Problem in Numbers

Storage is not cheap at government rates. Cloud hosting contracts issued by the Delhi State Industrial and Infrastructure Development Corporation — the agency that manages much of the capital's shared IT infrastructure — have run at roughly ₹4.2 lakh per terabyte annually under recent tender cycles, according to procurement notices published on the Delhi government's e-tender portal. When image duplication rates run as high as 30 to 40 percent — a range documented in international public-sector digitisation studies — the cost of inaction compounds fast across even a modestly sized archive.

The Municipal Corporation of Delhi's property-mapping project, launched out of its Civic Centre headquarters on Minto Road in 2023, had ingested more than 2.8 million georeferenced property images by the end of financial year 2025-26, according to figures the corporation published in its annual report. If even a fifth of those images are duplicates, a conservative figure next to the 30 to 40 percent range cited above, the MCD is storing roughly 560,000 files it does not need, on servers it is paying to maintain.
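To put a rupee figure on that estimate, the arithmetic can be sketched in a few lines of Python. The image count, the one-fifth duplicate share and the ₹4.2 lakh per terabyte rate come from the figures above; the average file size of 4 MB is purely an assumption for illustration, since no published figure for it is cited here, and the function names are hypothetical.

```python
# Rough estimate of what duplicate images could cost to store each year.
IMAGES_INGESTED = 2_800_000   # property images, per the annual report figure
DUPLICATE_SHARE = 0.20        # the "a fifth" scenario
COST_PER_TB_INR = 420_000     # Rs 4.2 lakh per terabyte per year
AVG_IMAGE_MB = 4.0            # ASSUMPTION: illustrative only, not a reported figure


def duplicate_count(total_images: int, share: float) -> int:
    """Number of redundant files for a given duplicate share."""
    return round(total_images * share)


def annual_waste_inr(total_images: int, share: float,
                     avg_mb: float, cost_per_tb: float) -> float:
    """Yearly cost of storing the redundant files, in rupees."""
    wasted_mb = duplicate_count(total_images, share) * avg_mb
    wasted_tb = wasted_mb / 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return wasted_tb * cost_per_tb


if __name__ == "__main__":
    print(duplicate_count(IMAGES_INGESTED, DUPLICATE_SHARE))
    print(annual_waste_inr(IMAGES_INGESTED, DUPLICATE_SHARE,
                           AVG_IMAGE_MB, COST_PER_TB_INR))
```

With those assumed inputs the sketch works out to about 2.24 TB of redundant storage, or roughly ₹9.4 lakh a year. The result scales in direct proportion to the average file size, so a real audit would need to measure that figure rather than guess it.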

The National Informatics Centre, which hosts and supports many of Delhi's citizen-facing portals from its Lodhi Road campus, has published guidance on image compression standards but has not released any public-facing data on deduplication audits conducted across state-level portals it manages. Requests for comment sent to the NIC's Delhi unit had not been answered by the time this article was filed.

Why Deduplication Is Harder Than It Sounds

The technical challenge is real. Duplicate images are not always pixel-perfect copies. A photograph of a Yamuna floodplain monitoring site taken on two consecutive mornings, or two drone passes over the Wazirabad barrage shot three minutes apart, may be functionally identical for archival purposes but register as different files because their bytes differ slightly: embedded metadata such as timestamps and camera serial numbers, small shifts in the pixel data, and compression artifacts. Standard hash-based deduplication tools, which treat two files as duplicates only when their contents match byte for byte, miss these near-duplicates entirely.
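A small Python sketch shows why exact hashing falls short. It uses SHA-256 from the standard library as a stand-in for whatever digest a given tool relies on, and the three "files" are hypothetical byte strings rather than real images: two share the same pixel payload but carry different embedded timestamps, and the third is a true copy.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Hex digest of the file contents; any changed byte changes the digest."""
    return hashlib.sha256(data).hexdigest()


def is_exact_duplicate(a: bytes, b: bytes) -> bool:
    """True only when the two byte strings are identical."""
    return sha256_digest(a) == sha256_digest(b)


# Hypothetical stand-ins for image files: a metadata header plus pixel data.
pixels = bytes(range(256)) * 4
file_a = b"TS=2026-07-04T06:00;" + pixels
file_b = b"TS=2026-07-05T06:00;" + pixels  # same pixels, next morning's timestamp
file_c = bytes(file_a)                     # a true byte-for-byte copy

if __name__ == "__main__":
    print(is_exact_duplicate(file_a, file_c))  # exact copy is caught
    print(is_exact_duplicate(file_a, file_b))  # near-duplicate is missed
```

The exact copy is flagged, while the file that differs by a single timestamp character is treated as entirely new, even though the picture it carries is the same.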

Perceptual hashing algorithms — software that compares the visual content of images rather than their raw data — can catch these cases, but their adoption inside Delhi's civic technology stack is limited. The Delhi e-Governance Society, which operates out of the Delhi Secretariat in Civil Lines, listed perceptual deduplication as a future-phase objective in a 2024 roadmap document, without attaching a deadline or a budget line.
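For readers who want to see the idea, here is a minimal sketch of one simple perceptual technique, often called average hashing. It is an illustration under stated assumptions, not the method any Delhi agency or named tool is known to use: each image is assumed to be already decoded and shrunk to an 8x8 grid of grayscale values, the threshold of 5 differing bits is arbitrary, and all function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0..255


def average_hash(grid: Grid) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the grid's mean."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where the two hashes disagree."""
    return bin(a ^ b).count("1")


def looks_like_duplicate(g1: Grid, g2: Grid, max_distance: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes are close."""
    return hamming_distance(average_hash(g1), average_hash(g2)) <= max_distance


# Hypothetical 8x8 test images.
scene = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]  # a gradient
re_shot = [[p + 3 for p in row] for row in scene]                # slightly brighter
different = [[255 - p for p in row] for row in scene]            # inverted scene

if __name__ == "__main__":
    print(looks_like_duplicate(scene, re_shot))
    print(looks_like_duplicate(scene, different))
```

Because the hash depends on which pixels are brighter than average rather than on exact byte values, a uniformly brighter re-shoot produces the same fingerprint while a genuinely different picture does not. Production tools typically add resizing, colour handling and tuned thresholds on top of this basic idea.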

The Aam Aadmi Party government's Smart Cities commitments, made when Delhi was designated a pilot city under the central government's Smart Cities Mission, included digital asset management as a listed deliverable. That mission's funding cycle closed in March 2025.

For agencies still building out their image archives, the practical path forward involves three steps: commissioning a baseline audit using open-source tools such as dupeGuru or rmlint before any new storage contracts are signed; writing deduplication clauses directly into vendor service agreements; and assigning a named data steward within each department to own the process. Without that, the numbers will keep growing — and so will the bills.
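As a rough picture of what the first of those steps involves, the sketch below walks a folder, groups files with identical contents and totals the space that could be reclaimed by keeping one copy of each. It is a simplified stand-in written with Python's standard library, not a description of how dupeGuru or rmlint work internally, and it only finds exact copies; the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List


def find_exact_duplicates(root: str) -> Dict[str, List[str]]:
    """Group files under root by SHA-256 digest; keep groups of two or more."""
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large images do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}


def reclaimable_bytes(groups: Dict[str, List[str]]) -> int:
    """Bytes freed if exactly one copy from each duplicate group is kept."""
    return sum(os.path.getsize(paths[0]) * (len(paths) - 1)
               for paths in groups.values())


if __name__ == "__main__":
    groups = find_exact_duplicates(".")
    print(len(groups), "duplicate groups,", reclaimable_bytes(groups), "bytes reclaimable")
```

A report of this kind gives a department a baseline figure to hold against a storage quote before a contract is signed. Near-duplicates would need a perceptual comparison on top.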


About this article

Published by The Daily Delhi

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
