The Delhi Urban Heritage Foundation confirmed this week that its digital repository, which houses over 1.4 lakh scanned images of Old Delhi structures, had flagged more than 12,000 duplicate entries — some photographs appearing as many as seven times under different catalogue numbers. The revelation, surfacing just days before a July 4 internal audit deadline, has forced a temporary freeze on new uploads across at least three partner institutions.
The timing is pointed. Delhi is midway through a broader push to digitise civic and heritage records, a process that accelerated after the Archaeological Survey of India and the Delhi government's Department of Art, Literature and Heritage jointly committed in early 2025 to a fully searchable public archive by the end of 2026. Duplicate image replacement (identifying redundant files, verifying originals, and substituting clean master copies) sits at the heart of that project, and the backlog now threatens the entire timeline.
Where the Problem Is Sharpest
Ground zero is the Shahjahanabad documentation project, a digitisation drive covering the walled city from Chandni Chowk to the Turkman Gate neighbourhood. Volunteers and contracted photographers have been working since March 2025, producing a high volume of overlapping material. The Aga Khan Trust for Culture, which has collaborated with the Delhi government on conservation work in the walled city for years, is understood to be among the institutions whose contributed image batches require deduplication before they can be merged into the central repository.
The Indira Gandhi National Centre for the Arts on Janpath, which maintains its own vast image database of Indian cultural sites, faced a parallel problem in May, when a bulk import from a partner university introduced roughly 3,400 redundant files into its South Asian Visual Archive. Staff there have been running automated hash-matching software to catch exact copies, but near-duplicates (photographs of the same subject taken from marginally different angles) require manual review, a slower and costlier process.
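Exact-copy detection of this kind typically works by hashing each file's bytes and grouping files whose digests match. The Python sketch below assumes the images sit in a local folder as `.jpg` files and uses SHA-256; the function names, the file pattern and the catalogue-style filenames are hypothetical, and this is not the software IGNCA actually runs. Because it compares raw bytes, it will not flag a copy that was resized, re-compressed or re-saved with different metadata.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks (1 MiB by default)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(folder: Path, pattern: str = "*.jpg") -> Dict[str, List[Path]]:
    """Group files under `folder` by content digest, keeping only groups of two or more."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(folder.rglob(pattern)):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical demo: two catalogue entries that hold the same bytes.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "CAT-0001.jpg").write_bytes(b"same scan")
        (root / "CAT-0457.jpg").write_bytes(b"same scan")
        (root / "CAT-0900.jpg").write_bytes(b"different scan")
        for digest, paths in find_exact_duplicates(root).items():
            print(digest[:12], [p.name for p in paths])
```

Reading in chunks keeps memory use flat even for large TIFF masters; the demo prints one group containing `CAT-0001.jpg` and `CAT-0457.jpg`.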
Across both institutions, the practical consequence is the same: researchers trying to access records of specific structures in Nizamuddin, Mehrauli, or along the Yamuna riverfront are encountering broken links, mismatched metadata, and in some cases images captioned with the wrong building name entirely.
What the Data Shows — and What Comes Next
The problem is not unique to Delhi. The International Federation of Library Associations and Institutions (IFLA) noted in its 2024 digitisation report that urban heritage archives in rapidly expanding cities frequently see duplicate rates of between 8 and 15 percent during bulk-upload phases. The Shahjahanabad project's reported rate of roughly 8.5 percent sits at the low end of that range, but total holdings are expected to cross 2 lakh images before the year is out. If that rate held across the whole collection, 2 lakh images would contain roughly 17,000 duplicates, so even a contained duplication rate translates into a significant manual workload.
The Delhi Metro Rail Corporation's own asset-documentation archive — separate from heritage cataloguing but subject to similar pressures as Phase 4 expansion stations are photographed and logged — reportedly adopted an automated deduplication protocol in late 2024 that reduced redundant file storage by roughly 30 percent within six months. Heritage digitisation teams are now looking at adapting similar open-source toolsets, including scripts built around perceptual hashing, which can catch near-identical images that byte-level matching misses.
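Perceptual hashing works differently from byte-level matching: it shrinks an image to a tiny grayscale grid and encodes its coarse brightness pattern as a short bit string, so visually similar images get hashes that differ in only a few bits. The sketch below uses one common variant, the difference hash (dHash), on a grayscale image supplied as a list of pixel rows. Decoding real image files is left out, and the 10-bit threshold and all names are assumptions for illustration, not a description of the toolset DMRC or the heritage teams actually use.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def downscale(gray: Grid, width: int, height: int) -> Grid:
    """Shrink a grayscale grid to width x height by averaging rectangular blocks."""
    src_h, src_w = len(gray), len(gray[0])
    out: Grid = []
    for r in range(height):
        y0 = r * src_h // height
        y1 = max((r + 1) * src_h // height, y0 + 1)
        row = []
        for c in range(width):
            x0 = c * src_w // width
            x1 = max((c + 1) * src_w // width, x0 + 1)
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(gray: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair, set when left > right."""
    small = downscale(gray, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(hashes: Dict[str, int], max_distance: int = 10) -> List[Tuple[str, str, int]]:
    """Return every pair of names whose hashes differ by at most max_distance bits."""
    names = sorted(hashes)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = hamming(hashes[a], hashes[b])
            if d <= max_distance:
                pairs.append((a, b, d))
    return pairs


if __name__ == "__main__":
    gradient = [[x * 3 for x in range(64)] for _ in range(64)]  # stand-in for a decoded photo
    brighter = [[v + 20 for v in row] for row in gradient]      # same scene, exposure raised
    mirrored = [row[::-1] for row in gradient]                 # left-right reversed pattern
    hashes = {"a.jpg": dhash(gradient), "a_bright.jpg": dhash(brighter), "a_flip.jpg": dhash(mirrored)}
    print(near_duplicates(hashes))  # [('a.jpg', 'a_bright.jpg', 0)]
```

A uniformly brightened copy hashes identically, while the mirrored version lands 64 bits away, which is why a small Hamming-distance threshold rather than strict equality is the usual test. Shots taken from noticeably different angles can still exceed any sensible threshold, so perceptual hashing narrows the manual queue rather than eliminating it.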
With the July 4 audit deadline now past, the immediate priority is clearing the Shahjahanabad backlog before fieldwork resumes after the monsoon. Surveyors are typically grounded during heavy rain in July and August, which gives archivists a narrow window to process existing material without new batches arriving. Institutions are being advised to enforce a strict one-upload-per-session rule and to run automated checks before any batch is transferred to the central repository. If the deduplication work can be completed by early September, the 2026 public-access target may still be achievable, though officials have offered no formal guarantee.
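A pre-transfer check of the sort institutions are being advised to run could, in its simplest form, compare each file in an outgoing batch against the digests already held centrally and against the rest of the batch. The sketch below assumes batches arrive as `(catalogue_id, file_bytes)` pairs and that the central repository can supply a set of SHA-256 digests; both interfaces and the `SJB-` identifiers are hypothetical. It only catches exact copies, so near-duplicates would still need a perceptual check or a human eye.

```python
import hashlib
from typing import Dict, Iterable, List, Set, Tuple


def screen_batch(batch: Iterable[Tuple[str, bytes]], central_digests: Set[str]) -> Dict[str, List[str]]:
    """Sort (catalogue_id, file_bytes) pairs into accepted and rejected lists by SHA-256 digest."""
    seen: Set[str] = set()
    report: Dict[str, List[str]] = {"accept": [], "already_central": [], "dup_in_batch": []}
    for catalogue_id, data in batch:
        digest = hashlib.sha256(data).hexdigest()
        if digest in central_digests:
            report["already_central"].append(catalogue_id)  # exact copy already held centrally
        elif digest in seen:
            report["dup_in_batch"].append(catalogue_id)  # repeats an earlier file in this batch
        else:
            seen.add(digest)
            report["accept"].append(catalogue_id)  # new content, safe to transfer
    return report


if __name__ == "__main__":
    central = {hashlib.sha256(b"haveli facade").hexdigest()}
    batch = [("SJB-101", b"haveli facade"), ("SJB-102", b"gate arch"), ("SJB-103", b"gate arch")]
    print(screen_batch(batch, central))
    # {'accept': ['SJB-102'], 'already_central': ['SJB-101'], 'dup_in_batch': ['SJB-103']}
```

Keeping the first copy within a batch and rejecting later ones mirrors the idea of retaining one master per image; a production gate would likely hash files from disk rather than hold whole batches in memory.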