Delhi's public documentation infrastructure has a storage problem that has been quietly compounding since at least 2014. Across multiple municipal and state-level systems, from the Delhi Development Authority's land records portal to the digitised files held by the South Delhi Municipal Corporation, tens of thousands of scanned images were uploaded, catalogued, and then uploaded again. The result is a sprawling archive riddled with duplicates that inflate storage costs, slow down retrieval, and, in several documented cases, have contributed to conflicting records being served to citizens filing property or identity claims.
The issue matters right now because the Kejriwal government, through the Delhi government's IT department, is midway through a Phase 2 digitisation push tied to the broader Delhi e-Governance Society framework, a programme that has been operational since 2019. Pouring new records into a system that has never been systematically deduplicated risks locking in the problem at greater scale. The Union government's Ministry of Electronics and Information Technology, which provides technical standards for state e-governance projects under the Digital India umbrella, has flagged image deduplication as a compliance requirement since 2021, but enforcement at the state level has been inconsistent.
How the Backlog Built Up
The roots of the problem go back to two separate digitisation waves. The first ran roughly between 2012 and 2016, when physical records held at offices in Kashmere Gate, the Civic Centre on Minto Road, and the DDA's Vikas Sadan complex at INA were scanned in bulk by contracted agencies. Standards varied by contract. Some vendors scanned double-sided pages as single images; others split them. When files were migrated to newer servers between 2017 and 2019, many documents were simply re-ingested without any deduplication check, because the contracts for migration work did not explicitly require one.
The second wave hit when the COVID-19 lockdowns of 2020 forced a rapid shift to remote processing. Staff working from home submitted scanned documents through multiple channels — email, the Delhi government's e-District portal, and in some cases WhatsApp forwards that were later manually uploaded. The e-District portal, which handles citizen service requests from Aadhaar linkage to caste certificates, processed over 1.2 crore applications between April 2020 and March 2021 alone, according to figures published by the Delhi government's Department of Information Technology. A significant proportion of those applications included image attachments uploaded more than once by applicants who were uncertain whether the first submission had registered.
Within the archival systems maintained by the Delhi State Archives on Rani Jhansi Road, the problem is older and in some ways more serious. Photographic records of Old Delhi neighbourhoods — Shahjahanabad, Ballimaran, Matia Mahal — that were digitised under a heritage documentation grant in 2013 were later partially rescanned under a separate INTACH-supported project in 2018. By the time anyone compared the two datasets, administrators found substantial overlap, with no automated system in place to flag which version was canonical.
What a Fix Actually Looks Like
Deduplication at this scale is not simply a matter of deleting obvious copies. Scanned government documents are particularly difficult to process algorithmically because image quality varies by scanner, lighting, and paper condition. A certificate scanned in 2013 at 200 DPI and again in 2019 at 300 DPI will not match on a byte-level hash comparison (an MD5 or SHA-256 checksum, for example), even if the underlying document is identical, because the two files differ in pixel dimensions and pixel values. The standard approach, perceptual hashing combined with metadata cross-referencing, requires both a retroactive audit and a reformed intake process going forward.
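To make the distinction concrete, the following is a minimal sketch in Python of one common perceptual hash, the difference hash (dHash), set against a SHA-256 checksum. It is an illustration only, not the method used by any Delhi agency. The function names, the 8x8 hash size, and the synthetic striped "scans" are all assumptions made for the example; the stripes are deliberately aligned to the hash grid so the outcome can be checked by hand. A production pipeline would decode real image files with an imaging library and tune the match threshold on real data.

```python
import hashlib
import random


def resize_mean(pixels, out_w, out_h):
    """Downscale a grayscale image (list of rows) by averaging pixel blocks."""
    in_h, in_w = len(pixels), len(pixels[0])
    out = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max((oy + 1) * in_h // out_h, y0 + 1)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max((ox + 1) * in_w // out_w, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels, size=8):
    """Difference hash: one bit per comparison of horizontally adjacent cells."""
    small = resize_mean(pixels, size + 1, size)
    bits = 0
    for row in small:
        for x in range(size):
            bits = (bits << 1) | int(row[x] > row[x + 1])
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def sha256_of(pixels):
    """Byte-level checksum of the raw pixel values (each must be 0..255)."""
    raw = bytes(v for row in pixels for v in row)
    return hashlib.sha256(raw).hexdigest()


def synthetic_scan(width, height, brightness=0, seed=0, inverted=False):
    """Hypothetical stand-in for a scanned page: nine vertical light/dark bands
    with a brightness offset and small per-pixel noise to mimic scanner variation."""
    rng = random.Random(seed)
    rows = []
    for _ in range(height):
        row = []
        for x in range(width):
            light = (x * 9 // width) % 2 == 0
            if inverted:
                light = not light
            base = 200 if light else 40
            row.append(base + brightness + rng.randint(-5, 5))
        rows.append(row)
    return rows


# Same "document" at two resolutions (ratio 1.5, like 200 vs 300 DPI),
# plus a visibly different document for contrast.
scan_2013 = synthetic_scan(180, 240, brightness=0, seed=1)
scan_2019 = synthetic_scan(270, 360, brightness=15, seed=2)
other_doc = synthetic_scan(180, 240, seed=3, inverted=True)

print("SHA-256 equal:", sha256_of(scan_2013) == sha256_of(scan_2019))
print("dHash distance, same document:", hamming(dhash(scan_2013), dhash(scan_2019)))
print("dHash distance, other document:", hamming(dhash(scan_2013), dhash(other_doc)))
```

In this contrived case the two scans of the same page produce different checksums but identical perceptual hashes, while the unrelated page differs in every bit. Real scans are messier, so candidate pairs are usually accepted below some Hamming-distance threshold and then confirmed against metadata such as document reference numbers before either copy is marked canonical.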
The Delhi e-Governance Society has circulated a technical note internally proposing a phased deduplication exercise beginning in the third quarter of 2026, though no public announcement of a budget allocation or timeline has been made. Citizen services that run through the e-District portal, accessible from Common Service Centres spread across all eleven Delhi districts, are unlikely to be disrupted during any cleanup effort. For Delhiites who have pending applications tied to the affected records — particularly those involving property documents registered before 2020 at Sub-Registrar offices in areas like Dwarka or Rohini — the practical advice is straightforward: keep physical originals accessible and request a document reference number at every stage of any e-governance interaction, so records can be traced even if a digital duplicate creates a conflict downstream.