Thousands of official photographs, scanned heritage documents and civic identity records stored by Delhi government agencies contain exact or near-exact duplicate image files, a problem that has quietly compounded across multiple digitisation drives over the past decade and is now forcing an emergency review of storage infrastructure across at least three major departments.
The issue matters now because Phase 4 of the Delhi Metro expansion — which requires continuous, verified photographic documentation of land acquisition, structural surveys and heritage-impact assessments — has exposed how badly the underlying image management systems are broken. Project files submitted to the Delhi Metro Rail Corporation's documentation wing have repeatedly contained redundant scans, pushing individual record packages to sizes that make them functionally unworkable for auditors and legal review teams.
A Decade of Shortcuts Coming Due
The roots of the problem stretch back to at least 2015, when the Delhi government launched its first large-scale push to digitise land and municipal records under the Digital India programme. Field offices in areas including Chandni Chowk, Karol Bagh and the Old Delhi revenue circle were issued scanning equipment and instructed to upload documents to centralised servers, but without any deduplication protocol in place. Workers scanning multi-page files often re-scanned individual pages as standalone images rather than correcting errors in the original scan, so a single four-page property document could generate upward of a dozen stored image files, many of them identical.
The Delhi Archives, based near the Civil Lines neighbourhood in north Delhi, flagged the problem internally as early as 2019, when its own digitisation team found that roughly a third of its scanned photographic holdings, spanning colonial-era maps, municipal photographs and Partition-period records, contained redundant files. The Archives' storage burden grew substantially as a result, but no funding for a systematic cleanup was allocated at the time.
The situation worsened after the COVID-19 pandemic, when rushed digitisation of health records, vaccination certificates and urban ward documents added new layers of duplicated content to government servers. Multiple entry points — district offices in Dwarka, Rohini and Lajpat Nagar, as well as temporary scanning centres set up under the Aam Aadmi Party government's Mohalla Clinic network — uploaded files independently, often without cross-referencing what already existed in the central repository.
What the Numbers Actually Show
Preliminary audits of two sample databases, covering municipal property records and public works documentation, found that duplicate or near-duplicate images accounted for a substantial share of total stored files, increasing overall storage costs and slowing retrieval across the board. The audits are part of a review initiated by the Delhi government's Department of Information Technology in early 2026. The IT Department is expected to present full findings to the Delhi Cabinet by September 2026, though the review timeline has already slipped once from an original June deadline.
The DMRC, for its part, is understood to have contracted a third-party firm to build a deduplication layer into its documentation pipeline specifically for Phase 4 corridors, including the planned Janakpuri West to Krishna Park Extension section. That contract, awarded in the first quarter of 2026, represents a recognition that the problem cannot be solved through manual review alone.
The Yamuna Rejuvenation project's documentation archive — maintained jointly by the Delhi Jal Board and the National Mission for Clean Ganga — has also been cited in internal communications as a system where photographic monitoring records of riverbank sites between Wazirabad and Okhla have been duplicated across multiple upload sessions, making it harder to establish accurate timelines of construction and remediation work.
Officials managing digital records in Delhi's heritage zones, particularly around the Shahjahanabad precinct and the Archaeological Survey of India-managed buffer areas in Mehrauli, are now being instructed to run validation checks before any new uploads, a requirement that has been in standing orders since at least 2021 but was rarely enforced. The practical question going forward is whether the September review produces a funded, mandatory deduplication standard, or another report that sits on a shelf while the servers keep filling up.