Delhi's government digital archives contain hundreds of thousands of duplicate image files, a problem that has quietly compounded since at least 2018 and now threatens the integrity of public records held by departments ranging from the Delhi Development Authority to the Municipal Corporation of Delhi. Officials across multiple civic bodies acknowledged the problem in internal reviews circulated earlier this year, though no single agency has yet claimed ownership of the cleanup.
The issue matters now because Phase 4 of the Delhi Metro—with stations planned through Janakpuri West, Tughlakabad, and Inderlok—has forced a parallel surge in land-record digitisation. Every parcel of affected property along the 65.1-kilometre expansion corridor requires verified photographic documentation. When duplicate images populate the same database, surveyors cannot reliably confirm which photograph is current, which is superseded, and which was uploaded in error.
How Duplication Took Root
The origins trace back to 2008, when the Delhi government launched its first major push to digitise property and heritage records under the National e-Governance Plan. Departments uploaded scanned photographs independently, without a shared metadata standard or a centralised deduplication protocol. The Archaeological Survey of India's Delhi Circle, responsible for 174 protected monuments in the city, ran a parallel digitisation drive. The DDA maintained its own servers. The MCD—split into three corporations until its reunification in May 2022—had been operating three separate content management systems simultaneously, each capable of storing the same image under a different file name.
The reunified MCD inherited all three databases on a single platform but did not immediately reconcile them. According to a January 2025 report tabled before the Delhi Urban Shelter Improvement Board, roughly 34 percent of image assets migrated from the former North, South, and East Delhi Municipal Corporation servers were flagged as probable duplicates by automated hash-checking software. That 34 percent figure has become the most-cited number in inter-departmental correspondence on the subject, though the DUSIB review noted that the audit covered only JPEG and PNG formats and excluded legacy TIFF files entirely.
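For readers unfamiliar with hash-checking, the idea can be sketched in a few lines of Python. This is an illustration only, not the software used in the audit: the function name, the in-memory sample data, and the choice of SHA-256 are all assumptions. It shows why a file stored under three different names can still be recognised as one image, and why a re-encoded copy slips through.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names by the SHA-256 digest of their bytes.

    `files` is a hypothetical mapping of file name -> raw bytes.
    Only groups holding more than one name are returned.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Made-up sample: two byte-identical files under different names,
# plus one copy whose bytes differ (as a re-encoded image's would).
sample = {
    "north_mcd/IMG_0041.jpg": b"\xff\xd8 same pixels",
    "south_mcd/plot_17.jpg": b"\xff\xd8 same pixels",
    "east_mcd/plot_17_v2.jpg": b"\xff\xd8 re-encoded pixels",
}
duplicates = group_exact_duplicates(sample)
```

Here `duplicates` holds a single group containing the first two names. The third file is not flagged, because a byte-level hash changes whenever an image is resaved or converted, which suggests that a count based on exact hashing alone may understate the true overlap.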
The Yamuna cleanup drive added another layer. Between 2020 and 2024, teams operating under the Delhi Jal Board photographed encroachments, ghats, and drain outfalls as documentary evidence for the National Green Tribunal. Many of those photographs were uploaded to both the DJB's internal portal and to a shared NGT compliance dashboard, generating a second class of duplicates that now sit across two jurisdictions with different retention rules.
Old Delhi's Heritage Layer Complicates the Fix
Nowhere is the problem more acute than in the documentation of Old Delhi's built fabric. The Shahjahanabad Redevelopment Corporation has been compiling photographic surveys of havelis, mosques, and stepwells in Chandni Chowk and Ballimaran since 2016. Surveyors working under successive project cycles often re-photographed the same structures without cross-checking earlier uploads. A 2024 internal audit by the corporation found that certain buildings in the Lal Kuan area had been photographed and filed under as many as seven distinct record entries, each assigned to a different survey cycle.
The practical consequence is not merely bureaucratic untidiness. When a structure's condition is disputed, for example in litigation before the Delhi High Court or in compensation hearings at the land acquisition collector's office in Tis Hazari, a multiplicity of dated images with conflicting metadata becomes an evidentiary problem that slows proceedings and increases costs for residents.
Several civic technology organisations working in the capital, including those partnered with the Delhi government's e-District portal at Vikas Bhavan, have been piloting AI-assisted deduplication tools since late 2024. The approach involves perceptual hashing combined with geolocation tagging to cluster images of the same physical location regardless of file name or upload date. Early pilots in two east Delhi revenue circles produced results within six weeks, though scaling the method across all 272 revenue circles in the National Capital Territory will require both budget allocation and inter-departmental data-sharing agreements that do not yet exist.
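A rough sketch of how perceptual hashing and geolocation can be combined is given below. It is not the pilots' actual tooling. The simple "average hash", the record fields (`name`, `hash`, `lat`, `lon`), the thresholds, and the coordinates are all assumptions chosen for illustration, and tiny 4x4 brightness grids stand in for real photographs.

```python
import math


def average_hash(pixels):
    """Bit string with 1 wherever a pixel is brighter than the grid mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_m(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def cluster(records, max_bits=2, max_metres=25.0):
    """Greedy grouping: a record joins the first cluster whose first member
    is close in both hash distance and physical distance."""
    clusters = []
    for rec in records:
        for group in clusters:
            head = group[0]
            if (hamming(rec["hash"], head["hash"]) <= max_bits
                    and distance_m(rec["lat"], rec["lon"],
                                   head["lat"], head["lon"]) <= max_metres):
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


# Made-up data: `facade` and a slightly brighter copy of it, plus a
# different scene. Coordinates are illustrative only.
facade = [[10, 10, 200, 200]] * 4
brighter = [[15, 15, 205, 205]] * 4
other = [[200, 200, 10, 10]] * 4

records = [
    {"name": "a.jpg", "hash": average_hash(facade), "lat": 28.6506, "lon": 77.2303},
    {"name": "b.jpg", "hash": average_hash(brighter), "lat": 28.6507, "lon": 77.2303},
    {"name": "c.jpg", "hash": average_hash(other), "lat": 28.6506, "lon": 77.2303},
    {"name": "d.jpg", "hash": average_hash(facade), "lat": 28.5000, "lon": 77.3000},
]
clusters = cluster(records)
cluster_names = [[r["name"] for r in group] for group in clusters]
```

In this toy run `a.jpg` and `b.jpg` fall into one cluster despite differing pixel values, while `c.jpg` (same spot, different scene) and `d.jpg` (same scene, distant location) stay separate. Requiring both signals to agree is the design point: either one alone would merge records that a surveyor would want kept apart.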
The Delhi government's IT department is expected to present a unified digital asset management framework to the cabinet before the end of the current financial year—March 2027 at the latest. Until then, surveyors, lawyers, and heritage conservationists working with official image records should cross-reference upload dates, GPS metadata, and originating department codes before treating any single file as authoritative. The archive, as it stands, reflects not one city but several overlapping versions of it.