Delhi's public digital repositories are carrying tens of thousands of redundant image files — identical or near-identical scans stored multiple times across separate servers — and no single city agency has yet deployed a systematic deduplication programme to clear them. The problem spans at least three major institutional archives, including the Delhi State Archives on Shyamji Das Marg in Civil Lines and the digital holdings managed under the Archaeological Survey of India's Delhi Circle office near India Gate.
The issue is not cosmetic. Duplicate images inflate storage costs and slow retrieval for researchers and journalists. In at least one documented case, involving the Yamuna Ghat photographic collection, they have produced conflicting version histories that make it impossible to establish which scan is authoritative. With the Delhi Metro Rail Corporation also digitising engineering drawings as part of its Phase 4 expansion documentation, the volume of institutional image data in the capital is growing faster than the infrastructure managing it.
What Cities Like London and Seoul Did Differently
London's Victoria and Albert Museum completed a rolling deduplication audit of its digital image library in 2023, using perceptual hashing — a technique that identifies visually similar images even when file names or metadata differ — to cut redundant files by roughly 30 percent across its public-facing collections. Seoul's National Museum of Korea introduced automated duplicate-flagging into its ingest pipeline as early as 2021, meaning duplicates are caught before they enter the permanent record rather than after.
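For readers unfamiliar with the technique, the sketch below shows one simple form of perceptual hashing, an "average hash". It is an illustration only, not the method used by any institution named in this article. It assumes an image has already been decoded into a grid of grayscale values; the function names and the sample grids are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # grayscale pixel values, one inner list per row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample a grayscale grid to size x size by averaging blocks.

    Assumes the grid is at least size x size.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per cell, set if the cell is
    brighter than the mean of the shrunken image."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical sample data: a scan, a uniformly brighter rescan of the same
# page, and an unrelated image.
scan = [[x * 16 for x in range(16)] for _ in range(16)]
rescan = [[v + 10 for v in row] for row in scan]
unrelated = [[y * 16 for _ in range(16)] for y in range(16)]

same_page_distance = hamming(average_hash(scan), average_hash(rescan))
different_distance = hamming(average_hash(scan), average_hash(unrelated))
```

A small distance between two hashes suggests the same underlying image even when the files differ byte for byte; a large distance suggests different images. The cut-off that counts as "small" is a policy choice that an archive would need to tune against its own collections.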
Delhi has no equivalent checkpoint. The National Informatics Centre, which manages digital infrastructure for multiple Union and state government bodies including Delhi government portals, has published guidelines on data governance but has not, as of this writing, rolled out a city-level image deduplication standard applicable to heritage and civic archives. The Smart Cities Mission, under which Delhi received central funding allocations in previous budget cycles, earmarked resources for digitisation, but the programme documents reviewed by this reporter do not specify deduplication as a deliverable.
Mumbai's Brihanmumbai Municipal Corporation began integrating hash-based duplicate detection into its property records image system in late 2024, a project worth approximately ₹4.2 crore according to a tender notice published on the BMC procurement portal. Kolkata's state library network adopted open-source deduplication tools for its photographic negatives project in 2022. Delhi, by contrast, still relies largely on manual checks carried out by archival staff at individual institutions.
The Local Stakes in Chandni Chowk and Beyond
The practical consequences are visible in places where heritage documentation matters most. The Shahjahanabad Redevelopment Corporation, responsible for Old Delhi including the Chandni Chowk corridor and the areas around Jama Masjid, has been building a photographic record of pre-renovation streetscapes since the Chandni Chowk redevelopment project concluded in 2021. Researchers and urban planners who have accessed that archive report that multiple scans of the same elevation drawing appear under different file identifiers, creating version confusion that delays approvals.
The Delhi Heritage Conservation Committee, which advises the Delhi government on protected structures, faces a related problem when pulling imagery from the ASI database to cross-reference with its own records. Staff have to reconcile files manually, a task that in resource-constrained government offices is not always completed before decisions are made.
Globally, the trend since around 2020 has been toward embedding deduplication at the point of ingestion — before files touch a permanent server — rather than running retrospective clean-up campaigns that require shutting down access. Singapore's National Archives completed such a transition in 2022. Delhi's institutions are still, for the most part, in the retrospective phase, if they have begun at all.
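What a check at the point of ingestion might look like can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not a description of any archive's actual system: the class name, method name, and record identifiers are hypothetical, and it catches only byte-identical files, using the SHA-256 digest from Python's standard library. Near-identical scans would need a perceptual hash alongside it.

```python
import hashlib
from typing import Dict, Optional


class IngestGate:
    """Hypothetical upload checkpoint that rejects byte-identical files."""

    def __init__(self) -> None:
        # Maps a file's SHA-256 digest to the record ID that first used it.
        self._seen: Dict[str, str] = {}

    def admit(self, record_id: str, data: bytes) -> Optional[str]:
        """Register a new file.

        Returns None if the file is new and has been admitted, or the
        record ID of the earlier copy if the same bytes were seen before.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return existing  # duplicate: caller can flag it instead of storing
        self._seen[digest] = record_id
        return None


# Hypothetical usage: the second upload carries the same bytes under a
# different identifier, so the gate points back to the first record.
gate = IngestGate()
first = gate.admit("ARCHIVE-A/0001", b"example scan bytes")
second = gate.admit("ARCHIVE-B/0042", b"example scan bytes")
```

Here `first` is `None` because the file is new, while `second` returns the identifier of the earlier record, which gives staff a single authoritative copy to link to before the duplicate reaches permanent storage.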
The practical path forward is relatively well-mapped. Institutions need to agree on a shared metadata standard, adopt perceptual hashing at the upload stage, and assign a nodal agency — the Delhi State Archives is the logical candidate — to coordinate across departments. None of that requires large capital expenditure. What it requires is a decision. Until that decision is made, every new digitisation drive the city launches, however well-funded, is compounding the same underlying problem.