Delhi Government Archives Lose Thousands of Images to Duplicates, Data Decay

Government departments, cultural institutions and news organisations across the capital are sitting on bloated image databases riddled with duplicates — and the cost of doing nothing is mounting fast.

By Delhi News Desk · Published 5 July 2026, 12:06 am

Photo: Pramod Tiwari / Pexels

Across Delhi's sprawling network of government servers, municipal archives and media organisations, a quiet data emergency is deepening. Duplicate image files — identical or near-identical photographs stored multiple times across different drives and cloud folders — are consuming terabytes of storage that agencies are paying for without realising the scale of the problem. Estimates from digital asset management professionals working with public-sector clients in the capital suggest that between 30 and 45 percent of stored image files in large institutional databases are redundant copies, though the figure varies sharply by organisation and no single authoritative audit has been published.

Why does this matter right now? Two forces are converging. Delhi Metro Rail Corporation is documenting its Phase 4 construction, generating tens of thousands of site photographs every month on corridors including the Janakpuri West to RK Ashram Marg line. Simultaneously, the Delhi government's own digitisation push, part of the broader e-District Delhi portal programme, is pulling decades of physical records into digital form. Both processes are creating enormous image libraries with minimal deduplication protocols in place. Storage costs, cloud licensing fees and the staff hours required to locate the correct version of a file are all climbing.

What the Numbers Actually Look Like

The economics are stark. A single terabyte of enterprise cloud storage on commonly used platforms costs Indian public-sector organisations roughly ₹4,000 to ₹7,000 per month depending on the contract tier and vendor. A department running a 50-terabyte image archive (not unusual for a body like the Delhi Development Authority, which documents construction across thousands of hectares) could theoretically eliminate 15 to 20 terabytes of duplicate content through a structured deduplication exercise. That would trim monthly costs by roughly ₹60,000 at the low end (15 terabytes at ₹4,000) and by as much as ₹1.4 lakh at the high end (20 terabytes at ₹7,000). Multiply the upper figure across a dozen major Delhi government departments over twelve months and the annual savings potential runs into crores of rupees.
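The arithmetic behind those figures can be checked in a few lines. This is a rough sketch using only the numbers quoted above. The function name is illustrative, and the scaling to twelve departments assumes each one resembles the 50-terabyte example, which is a simplification rather than a reported fact.

```python
# Figures are the ones quoted in the article; names are illustrative.
LAKH = 100_000
CRORE = 10_000_000


def monthly_saving_inr(duplicate_tb: float, rate_per_tb_inr: float) -> float:
    """Monthly storage cost avoided by deleting `duplicate_tb` terabytes."""
    return duplicate_tb * rate_per_tb_inr


low = monthly_saving_inr(15, 4_000)   # fewest duplicates, cheapest tier
high = monthly_saving_inr(20, 7_000)  # most duplicates, costliest tier

# Upper bound over 12 months and 12 departments (assumed to be similar in size).
annual_high_twelve_depts = high * 12 * 12

print(f"Monthly saving: Rs {low / LAKH:.1f} lakh to Rs {high / LAKH:.1f} lakh")
print(f"Annual upper bound, 12 departments: Rs {annual_high_twelve_depts / CRORE:.2f} crore")
```

On these assumptions the upper bound comes to a little over ₹2 crore a year, and the lower bound would be well under half of that.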

The National Archives of India, headquartered near Janpath in central Delhi, has been digitising records since the early 2000s. Its photographic holdings run into millions of items. Archivists working in this space — without speaking on the record — point to a structural problem: images are typically scanned and uploaded by different teams at different times, with no cross-referencing tool flagging when the same photograph has already been processed. The result is warehouse-scale duplication baked into the foundation of the archive itself.

At the Indira Gandhi National Centre for the Arts on Janpath, similar pressures apply to its digital heritage collections. Cultural institutions in Delhi generally lack dedicated digital asset management software with hash-based duplicate detection, which computes a fingerprint from each file's contents and flags files whose fingerprints match, regardless of filename or folder location. Off-the-shelf tools capable of scanning a 10-terabyte library and flagging duplicates within hours are available for annual licence fees starting around ₹80,000, a fraction of the storage costs being wasted.
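For readers curious what hash-based detection involves, the following is a minimal sketch, not a description of any tool named in this story. It uses SHA-256 from Python's standard library as the fingerprint. The list of file extensions and the function names are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image file types; a real archive may hold others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group image files under `root` whose contents are byte-for-byte identical."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[file_fingerprint(path)].append(path)
    # Only fingerprints shared by two or more files indicate duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("/some/archive"))` would return one list per set of identical files. Because the fingerprint depends only on content, a copy renamed or moved to another folder still lands in the same group.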

Practical Steps and What Comes Next

The deduplication problem is not simply about tidying up hard drives. In a city where Right to Information requests regularly ask government bodies to produce photographic evidence of completed works — road repairs in Shahdara, flood mitigation along the Yamuna floodplain, construction progress on the Barapullah elevated corridor — having multiple conflicting versions of the same image in circulation creates legal and administrative headaches. Departments have produced RTI responses citing photographs that turn out to be duplicates of older, pre-repair images, prompting fresh complaints and re-filing.

Several solutions are available and in use elsewhere. Perceptual hashing algorithms can catch near-duplicate images, such as the same photograph saved at different resolutions or with minor cropping differences. Byte-level comparison misses these entirely, because re-saving or resizing a picture changes the file's bytes and therefore its fingerprint. Delhi Police's crime scene documentation unit and the Delhi Fire Service both maintain large photographic databases where this distinction between exact and near-duplicate matters operationally.
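To illustrate the idea, here is a sketch of average hashing, one of the simplest perceptual hashes. It is offered as an example of the general technique, not as the method any agency or vendor uses. It works on grayscale pixel values that have already been decoded; reading real JPEG or TIFF files would need an imaging library and is left out. The 5-bit threshold is an assumed value chosen for the example.

```python
NEAR_DUPLICATE_MAX_BITS = 5  # assumed threshold out of 64 bits


def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Average hash of a grayscale image given as rows of 0-255 values.

    The image is reduced to a size x size grid by averaging blocks of pixels.
    Each grid cell then contributes one bit: 1 if brighter than the grid mean.
    """
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic 16x16 gradient, the same picture at twice the resolution,
# and a brightness-inverted picture that should not match.
original = [[x * 8 + y * 8 for x in range(16)] for y in range(16)]
upscaled = [[original[y // 2][x // 2] for x in range(32)] for y in range(32)]
inverted = [[255 - value for value in row] for row in original]

same = hamming_distance(average_hash(original), average_hash(upscaled))
different = hamming_distance(average_hash(original), average_hash(inverted))
print(same <= NEAR_DUPLICATE_MAX_BITS, different <= NEAR_DUPLICATE_MAX_BITS)
```

The resized copy has entirely different bytes from the original, so a file fingerprint would treat it as a new image, while the perceptual hashes of the two are within the threshold.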

The practical path forward for any Delhi institution serious about this issue starts with a baseline audit: how many image files exist, what percentage are duplicates, and what is the current monthly storage bill. That audit, conducted by a specialist vendor or an in-house team with the right tools, typically takes two to four weeks for a mid-sized archive. From there, automated deduplication workflows can be built into upload pipelines, preventing the problem from rebuilding itself. The data problem is solvable. The harder question is whether the administrative will exists to prioritise it before the next budget cycle makes the waste impossible to ignore.
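As a rough illustration of what a deduplication check inside an upload pipeline might look like, the sketch below keeps an index of fingerprints and reports when incoming bytes match something already stored. The class, its method names and the filenames are hypothetical, and a production system would persist the index in a database instead of memory. The same counters give the duplicate percentage a baseline audit would ask for.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UploadIndex:
    """In-memory index of content fingerprints seen so far (hypothetical helper)."""

    seen: dict = field(default_factory=dict)  # digest -> first filename stored
    total: int = 0
    duplicates: int = 0

    def submit(self, filename: str, data: bytes) -> Optional[str]:
        """Record an upload; return the earlier filename if these bytes are already stored."""
        self.total += 1
        digest = hashlib.sha256(data).hexdigest()
        earlier = self.seen.get(digest)
        if earlier is not None:
            self.duplicates += 1
            return earlier
        self.seen[digest] = filename
        return None

    def duplicate_share(self) -> float:
        """Fraction of submitted files that duplicated an earlier one."""
        return self.duplicates / self.total if self.total else 0.0


# Hypothetical uploads: the second file repeats the first under a new name.
index = UploadIndex()
results = [
    index.submit("site_0412.jpg", b"frame A"),
    index.submit("site_0412_copy.jpg", b"frame A"),
    index.submit("site_0413.jpg", b"frame B"),
]
print(results)
print(f"Duplicate share: {index.duplicate_share():.0%}")
```

A pipeline built this way could reject the repeat upload or link it to the existing record, which is how the problem is kept from rebuilding itself after a one-off clean-up.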


About this article

Published by The Daily Delhi

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
