The Daily Delhi

Delhi's Digital Archives Accumulate Millions in Duplicate Files, Wasting Taxpayer Funds

Municipal servers, heritage digitisation drives and metro documentation projects are drowning in duplicate image files — and the cost in storage, time and taxpayer money is becoming impossible to ignore.

By Delhi News Desk · Published 5 July 2026, 12:46 am

Photo: Pidgeon, Daniel / Public domain (Wikimedia Commons)

Delhi's government-linked digital repositories together hold, by estimate, several million image files, and a growing share of them are exact or near-exact copies of the same photographs. The problem is not abstract. Storage costs real money, retrieval takes real time, and in a city running parallel digitisation campaigns across heritage conservation, urban planning and public health documentation, the duplication is compounding quickly.

The timing matters because three major archival pushes are running at the same time. The Archaeological Survey of India's Delhi Circle has been digitising monuments across the Mehrauli Archaeological Park. The Delhi Metro Rail Corporation is documenting Phase 4 construction progress, including the corridor from Janakpuri West to R.K. Ashram Marg, and generating thousands of site photographs weekly. And the Delhi Urban Heritage Foundation has been cataloguing structures across Shahjahanabad, the walled city area of Old Delhi, as part of a mapping initiative tied to the Municipal Corporation of Delhi. All three generate images. All three, according to public procurement documents reviewed as part of budget discussions for the 2025-26 civic IT cycle, have identified redundant file accumulation as a known operational problem.

The Storage Arithmetic Nobody Wants to Do

Neither cloud nor on-premises storage is free. Government-grade storage procurement in India, typically sourced through National Informatics Centre empanelled vendors, runs roughly between ₹3 and ₹8 per gigabyte per month depending on redundancy tier and contract volume, based on published NIC rate cards. A single high-resolution site photograph from a DSLR camera used by heritage or infrastructure teams commonly runs between 20 and 40 megabytes. If even 30 percent of a 500,000-image archive is duplicated, that is 150,000 files which, at an average of 30 megabytes each, consume about 4.5 terabytes of avoidable storage. At a mid-range rate of ₹5.50 per gigabyte per month, that excess costs roughly ₹24,750 per month, or close to ₹3 lakh a year, on a single medium-sized archive alone. Scale that across the dozen or more departments maintaining photo records in the National Capital Territory, and the figure climbs steeply.
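The arithmetic can be reproduced in a few lines. The sketch below is illustrative only: the function name is hypothetical, the 30 MB average is an assumption taken as the midpoint of the 20 to 40 MB range quoted above, and it uses decimal units (1 GB = 1,000 MB).

```python
def duplicate_storage_cost(total_images, duplicate_percent,
                           avg_mb_per_image, rupees_per_gb_month):
    """Estimate the storage wasted on duplicate images and what it costs."""
    # Integer arithmetic keeps the file count exact.
    duplicate_files = total_images * duplicate_percent // 100
    # Assumption: decimal units, 1 GB = 1000 MB.
    wasted_gb = duplicate_files * avg_mb_per_image / 1000
    monthly_rupees = wasted_gb * rupees_per_gb_month
    return {
        "duplicate_files": duplicate_files,
        "wasted_gb": wasted_gb,
        "monthly_rupees": monthly_rupees,
        "yearly_rupees": monthly_rupees * 12,
    }


# Low, middle and high ends of the quoted rate range, in rupees per GB per month.
for rate in (3.0, 5.5, 8.0):
    cost = duplicate_storage_cost(500_000, 30, 30, rate)
    print(
        f"Rs {rate}/GB/month: {cost['wasted_gb']:,.0f} GB wasted, "
        f"Rs {cost['monthly_rupees']:,.0f} per month, "
        f"Rs {cost['yearly_rupees']:,.0f} per year"
    )
```

Because the cost is linear in every input, a department can substitute its own archive size, duplicate share and contracted rate to get a first-order estimate.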

The duplication problem is not unique to photography. The Delhi Secretariat's document management systems, which handle records for departments housed in the I.P. Estate complex near ITO, have faced similar redundancy issues with scanned PDFs. But image files are particularly costly because their file sizes are large and because automated deduplication tools — which use perceptual hashing algorithms to identify visually identical or near-identical photographs — are still not standard-issue in most state government IT stacks in India.

What Deduplication Actually Involves — and What Delhi Is Starting to Do

Perceptual hash-based deduplication works differently from simple checksum matching. A checksum flags only files that are byte-for-byte identical. A perceptual hash — tools like pHash or dHash are widely used open-source options — compares visual content, catching the same image saved at different resolutions, with different filenames, or with minor colour corrections applied. For archival work in heritage documentation, where the same photograph of a Chandni Chowk haveli might be uploaded by three different field surveyors from the same afternoon visit, this distinction matters enormously.
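The difference between the two approaches can be shown on a toy example. The sketch below is a simplified, pure-Python version of the difference-hash idea, not the implementation used by any particular tool: images are stand-in grids of grayscale values, the downsampling is crude nearest-neighbour sampling, and all function and variable names are hypothetical.

```python
import hashlib


def checksum(pixels):
    """SHA-256 over the raw pixel bytes: changes if any single byte changes."""
    raw = bytes(value for row in pixels for value in row)
    return hashlib.sha256(raw).hexdigest()


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per 'left pixel brighter than right neighbour'."""
    src_h, src_w = len(pixels), len(pixels[0])
    rows, cols = hash_size, hash_size + 1
    # Crude nearest-neighbour downsample to a (hash_size + 1) x hash_size grid.
    small = [
        [pixels[r * src_h // rows][c * src_w // cols] for c in range(cols)]
        for r in range(rows)
    ]
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# A synthetic 36 x 32 grayscale "photograph" (values 0..199).
original = [[(x * 5 + y * 3) % 200 for x in range(36)] for y in range(32)]
# The same picture with a uniform brightness correction applied.
brightened = [[value + 20 for value in row] for row in original]
# The same picture saved at double the resolution.
upscaled = [[original[y // 2][x // 2] for x in range(72)] for y in range(64)]
# A visually different picture (inverted tones).
different = [[199 - value for value in row] for row in original]

print("checksums match:", checksum(original) == checksum(brightened))
print("dhash, brightened:", hamming(dhash(original), dhash(brightened)), "bits apart")
print("dhash, upscaled:", hamming(dhash(original), dhash(upscaled)), "bits apart")
print("dhash, different:", hamming(dhash(original), dhash(different)), "bits apart")
```

Production tools typically resize with an imaging library rather than sampling pixels directly, and treat two images as likely duplicates when their hashes differ by only a few bits.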

The Delhi Metro Rail Corporation's IT wing put out a request for a records management system upgrade in late 2025, with deduplication listed among the functional requirements in procurement notices published on the Central Public Procurement Portal. The Municipal Corporation of Delhi's digitisation cell, operating out of offices near Dr. S.P. Mukherjee Civic Centre on Minto Road, has reportedly been piloting a storage audit tool, though no formal public outcome report has been released yet.

For departments still waiting on centralised solutions, the practical path forward involves three steps: run a perceptual hash scan on existing archives before migrating to any new system, establish a single-upload protocol so field teams submit images to one repository rather than emailing batches to multiple supervisors, and set quarterly deduplication checks as a procurement condition in any new vendor contract. The mathematics are not complicated. The will to act on them has simply, so far, been slow to follow.
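The first of those steps, a scan of an existing archive, might look roughly like the sketch below. It assumes each file already has a 64-bit perceptual hash computed by some hashing tool; the filenames, the hash values and the 5-bit threshold are all invented for illustration, and the greedy grouping is one simple strategy among several.

```python
def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def group_near_duplicates(hashes, max_distance=5):
    """Greedy grouping: each file joins the first group whose representative
    hash is within max_distance bits; otherwise it starts a new group."""
    groups = []  # list of (representative_hash, [filenames])
    for name, file_hash in sorted(hashes.items()):
        for representative, members in groups:
            if hamming(representative, file_hash) <= max_distance:
                members.append(name)
                break
        else:
            groups.append((file_hash, [name]))
    # Only groups with more than one file are duplicate candidates.
    return [members for _, members in groups if len(members) > 1]


# Hypothetical hashes for four uploads; three show the same haveli.
sample_hashes = {
    "haveli_surveyor_a.jpg": 0xF0F0F0F0F0F0F0F0,
    "haveli_surveyor_b.jpg": 0xF0F0F0F0F0F0F0F1,  # 1 bit from surveyor A's copy
    "haveli_surveyor_c.jpg": 0xF0F0F0F0F0F0F0F3,  # 2 bits from surveyor A's copy
    "metro_site_0412.jpg": 0x0F0F0F0F0F0F0F0F,    # unrelated image
}

duplicate_groups = group_near_duplicates(sample_hashes)
for group in duplicate_groups:
    print("Likely duplicates:", ", ".join(group))
```

A scan like this only flags candidates. For archival material, a human reviewer would still decide which copy to keep before anything is deleted.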

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
