Delhi's municipal and state government servers are carrying dead weight: duplicate scanned images estimated to run into hundreds of thousands of files. These redundant records eat up storage, slow database queries and, in at least one documented case last year, caused a land title dispute in Rohini to drag on for an extra four months while clerks sorted through mismatched document versions. That single case is a small symptom of a much larger administrative failure.
The problem did not emerge overnight. It is the direct product of how the capital chose to digitise its paper-based governance over roughly a decade — in fits and starts, with overlapping mandates and no single authority holding accountability for data quality. With the Aam Aadmi Party government now under pressure to demonstrate delivery ahead of the next electoral cycle, and the BJP-led central government scrutinising state digital infrastructure spending, the question of who cleans up the archive mess has become politically loaded.
The Digitisation Rush That Left Behind a Mess
The push began in earnest around 2013-2015, when the Delhi government — then under Congress, before AAP's 2015 sweep — started scanning land records held at the offices of the Sub-Registrar in districts across the city, from Karkardooma in the east to Janakpuri in the west. The work was contracted out in tranches to different vendors, each using different scanning software, different naming conventions and different metadata standards. Nobody standardised the handover protocol.
When AAP came to power in February 2015, it inherited this patchwork system. New digitisation drives were launched under the Delhi e-District project, administered through the Revenue Department's offices on I.P. Estate near ITO. But rather than auditing and integrating the existing scanned corpus, new batches were simply added. The National Informatics Centre, which maintains back-end infrastructure for Delhi government portals, flagged storage redundancy concerns as early as 2018, according to budget documents reviewed by this newspaper.
By 2021, the problem had compounded further. The Delhi Metro Rail Corporation's Phase 4 corridor planning — which requires land acquisition records, environmental clearance documents and heritage impact assessments — pulled in documents from at least three separate digitised repositories. Project engineers working on the Janakpuri West to RK Ashram Marg corridor reported that the same survey map appeared in two different departmental databases under different file IDs, creating verification delays during the crucial 2021-22 approval window.
What a Duplicate Image Actually Costs the System
Storage is the obvious cost. The Delhi government's State Data Centre, located in Dwarka Sector 10, has been running capacity expansion projects since 2020. Each terabyte of archive-grade storage carries an operational cost, and duplicate image files — which typically run between 2 MB and 8 MB each for a scanned A4 document — compound that cost at scale. A conservative internal estimate cited in a 2023 Delhi Directorate of Information Technology procurement note put avoidable redundant storage in government document repositories at roughly 18 percent of total archive volume at the time.
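To give a sense of scale, the per-file sizes above can be multiplied out. The sketch below is purely illustrative: the count of 300,000 duplicates is a hypothetical figure chosen to stand in for "hundreds of thousands", not a number from any government document, and the function name is invented for this example.

```python
def duplicate_storage_gb(n_files: int, min_mb: float = 2.0, max_mb: float = 8.0) -> tuple[float, float]:
    """Return the (low, high) storage, in decimal gigabytes, taken up by n_files duplicates.

    The 2 MB and 8 MB defaults are the per-file range quoted for a scanned A4 page.
    """
    return n_files * min_mb / 1000, n_files * max_mb / 1000


# Hypothetical count, for illustration only.
low, high = duplicate_storage_gb(300_000)
print(f"{low:.0f} GB to {high:.0f} GB")  # prints "600 GB to 2400 GB"
```

On those assumptions, duplicates alone would occupy somewhere between 0.6 and 2.4 terabytes before backups and replication are counted.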
The Yamuna riverbank land records, which have been the subject of sustained political dispute between the AAP government and the Delhi Development Authority — a body that answers to the central government — are among the most duplicated sets. Survey documents for the Yamuna floodplain have been scanned separately by the Revenue Department, the DDA and the Delhi Jal Board, with no reconciliation exercise ever formally completed.
The Municipal Corporation of Delhi, formed in May 2022 by merging three separate civic bodies, is now sitting on three distinct legacy digital archives with significant overlap, particularly for South and East Delhi property tax records dating from 2010 to 2019.
The practical path forward involves what archivists and IT procurement officers call a deduplication audit — a systematic process of running hash-matching algorithms across file repositories to identify identical or near-identical images, then establishing a master record and retiring the duplicates. The Delhi government issued a tender notice for a Records Management and Digitisation Audit in late 2024. Progress on that contract, and whether it has moved beyond the evaluation stage, will determine whether the capital's public record infrastructure is in workable shape before the next wave of Phase 4 Metro land acquisition proceedings begins in earnest in late 2026.
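What such an audit does at its simplest can be sketched in a few lines of code. The example below is a minimal illustration, not the method specified in the tender: it assumes the scans sit in ordinary folders on disk, it catches only byte-for-byte identical files using SHA-256 hashes, and it picks the master record by sorted path order, which is an arbitrary rule chosen for the sketch. The function names are invented for this example. Near-identical images, such as the same map scanned twice at different resolutions, would need perceptual hashing, which is not shown here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash, keeping only groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def plan_retirement(duplicates: dict[str, list[Path]]) -> dict[Path, list[Path]]:
    """Map each master record (first path in sorted order) to the copies to retire."""
    return {paths[0]: paths[1:] for paths in duplicates.values()}
```

Running `plan_retirement(find_duplicates(Path("archive")))` on a hypothetical `archive` folder would return, for each set of identical files, one path to keep and a list of paths flagged for retirement. Nothing is deleted by the sketch itself; in a real audit the flagged list would go to records staff for review before any file is removed.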