The Daily Delhi

Delhi news, every day

How Delhi's Government Archives Ended Up Flooded With Duplicate Images — And What It Will Take to Fix It

Years of siloed digitisation drives, multiple overlapping agencies, and no shared file standard left the capital's public record systems bloated, redundant, and increasingly unusable.

By Delhi News Desk · Published 5 July 2026, 12:14 am

Photo: Shobhit Bajpai on Pexels

Delhi's public records have a clutter problem. Across at least four separate digitisation initiatives run between 2018 and 2024, government agencies uploaded the same scanned documents, property maps, and civic photographs multiple times — filling servers with redundant files that now account for a significant share of total storage load across the Delhi e-District portal and the Municipal Corporation of Delhi's internal document management system.

The scale of the duplication matters now because Phase 4 of the Delhi Metro expansion is generating thousands of new engineering drawings, land acquisition records, and environmental clearance photographs that need to go somewhere. Administrators at the Delhi Metro Rail Corporation's Janakpuri West depot have flagged that the existing repository infrastructure is not clean enough to reliably host the incoming documentation without risking further duplication. Before new records can be properly filed, the old mess has to be addressed.

How the Backlog Built Up

The roots of the problem stretch back to 2018, when the Delhi government launched its first major push to digitise property records held at the Revenue Department's offices in Civil Lines. That drive ran in parallel with a separate Municipal Corporation of Delhi initiative centred on the Civic Centre building on Minto Road. Neither project used a common file-naming convention or a shared metadata standard. The same cadastral map of Mehrauli, for instance, could exist simultaneously in the Revenue Department's system under one alphanumeric tag and in the MCD system under a completely different one, with no automated flag to catch the overlap.
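The missing "automated flag" can be illustrated with a content hash: two files with different names but identical bytes produce the same digest. What follows is a minimal sketch under stated assumptions, not a description of any system Delhi's agencies run. The system names, the record tags, and the function names are hypothetical, and the check only catches byte-for-byte copies, not separate scans of the same paper original.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_cross_system_duplicates(systems):
    """Find files held by more than one system, whatever they are named.

    `systems` maps a system name to a {tag: file bytes} dictionary.
    Returns {digest: [(system, tag), ...]} for content seen in 2+ systems.
    """
    by_digest = defaultdict(list)
    for system_name, records in systems.items():
        for tag, data in records.items():
            by_digest[content_digest(data)].append((system_name, tag))
    return {
        digest: holders
        for digest, holders in by_digest.items()
        if len({system for system, _ in holders}) > 1
    }


# Hypothetical data: the same scan filed under two unrelated tags.
mehrauli_scan = b"bytes of one scanned cadastral map"
systems = {
    "revenue": {"REV-0001": mehrauli_scan, "REV-0002": b"a different document"},
    "mcd": {"MCD-A17": mehrauli_scan},
}

duplicates = find_cross_system_duplicates(systems)
for holders in duplicates.values():
    print(holders)
```

Because the comparison uses file contents rather than names, it works even when agencies share no naming convention. It does require that someone can read both repositories, which is an institutional question rather than a technical one.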

By 2021, the Aam Aadmi Party government's Delhi Dialogue and Development Commission had commissioned a review of the e-District portal's data quality. The review — whose findings were discussed in budget sessions that year but never published in full — identified duplicate image files as a category-level problem, not a one-off error. A follow-on drive to scan voter ID supporting documents at offices in Dwarka Sector 10 and the Shahdara district collectorate added further layers without resolving the underlying architecture conflict.

Part of the problem is institutional. The Delhi government's Information Technology department, the MCD, the Delhi Development Authority, and DMRC each run their own procurement cycles and vendor contracts. When a digitisation vendor is engaged by, say, the DDA for land-use surveys in Rohini, there is no automatic requirement to cross-check whether the Revenue Department already holds the same imagery. Vendors are paid per image processed, which historically created little incentive to flag existing copies.

The Cost of Inaction

Storage is not free. Government cloud contracts in India, typically routed through the National Informatics Centre's MeghRaj infrastructure, are billed on a capacity basis. Industry benchmarks for NIC-hosted storage in 2025 put costs at roughly Rs 2,800 per gigabyte per year for archival-tier access, and preliminary internal estimates cited in a Planning Department working paper from March 2026 suggest the duplicate image burden across Delhi's civic systems may run into several hundred gigabytes of redundant data. At the benchmark rate, every 100 gigabytes of duplicates costs about Rs 2.8 lakh a year, a recurring expense with no public benefit.
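To put rough numbers on that, the sketch below multiplies the quoted benchmark rate by a few storage volumes. The rate is the figure cited above. The volumes of 200, 500, and 800 gigabytes are illustrative assumptions chosen to span "several hundred", not figures from the working paper.

```python
# Benchmark rate quoted in the article: rupees per gigabyte per year.
RATE_RS_PER_GB_YEAR = 2_800


def annual_cost_rs(redundant_gb, rate=RATE_RS_PER_GB_YEAR):
    """Yearly cost in rupees of storing `redundant_gb` of duplicate data."""
    return redundant_gb * rate


# Assumed volumes for illustration only; the real total is not public.
for gb in (200, 500, 800):
    print(f"{gb} GB of duplicates: Rs {annual_cost_rs(gb):,} per year")
```

The cost recurs every year the duplicates stay in place, so the cumulative figure grows with each budget cycle in which the clean-up is deferred.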

Globally, cities that have tackled equivalent problems — London's Land Registry ran a deduplication exercise between 2019 and 2022, and Nairobi's City County digitisation project flagged similar issues in 2023 — have found that automated hash-matching tools can identify near-identical files with high accuracy, but that human review is still required for scanned documents where image quality varies between uploads. Delhi's situation is complicated further by the fact that many of the duplicates involve pre-2000 hand-drawn maps where two scans of the same original are never pixel-identical.
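Why two scans of one map defeat exact matching, and how a tolerance-based comparison gets around it, can be sketched with an "average hash". This is a simple, well-known perceptual hashing technique swapped in for illustration; the tools used in London or Nairobi are not described in the reporting. Images are represented here as plain grids of greyscale values whose sides are multiples of eight, and the review threshold of 5 bits is an assumption that would need tuning on real scans.

```python
def average_hash(pixels, size=8):
    """Reduce a greyscale image (2D list of ints) to a size*size-bit fingerprint.

    The image is averaged down to a size x size grid; each bit records
    whether a grid cell is brighter than the grid's mean.
    Assumes image height and width are multiples of `size`.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // size, width // size
    cells = []
    for row in range(size):
        for col in range(size):
            block = [
                pixels[y][x]
                for y in range(row * block_h, (row + 1) * block_h)
                for x in range(col * block_w, (col + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


REVIEW_THRESHOLD = 5  # assumed cut-off; pairs at or below it go to a human


def needs_review(image_a, image_b):
    """Flag a pair as a probable duplicate for human checking."""
    distance = hamming_distance(average_hash(image_a), average_hash(image_b))
    return distance <= REVIEW_THRESHOLD


# Synthetic stand-ins for scans: a gradient, a slightly brighter re-scan
# with one speck of dirt, and an unrelated (inverted) image.
scan_a = [[8 * (x + y) for x in range(16)] for y in range(16)]
scan_b = [[value + 3 for value in row] for row in scan_a]
scan_b[0][0] += 40
other = [[255 - value for value in row] for row in scan_a]

print(needs_review(scan_a, scan_b), needs_review(scan_a, other))
```

The function flags candidates rather than deleting anything, which matches the pattern reported from other cities: software narrows the field, and people make the final call on scans of varying quality.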

The DMRC's infrastructure deadline is the clearest near-term pressure point. The corporation needs clean, searchable repositories operational before land acquisition documentation for the Janakpuri West to Krishna Park Extension corridor is formally lodged, a process that must begin before the end of 2026 under the project timeline approved by the Union Ministry of Housing and Urban Affairs. That gives administrators less than six months. The IT Department has reportedly begun vendor outreach for a deduplication audit, though no contract has been publicly announced. Without a shared metadata standard adopted simultaneously by MCD, DDA, and the Revenue Department, even a successful audit risks being undone by the next digitisation drive.

About this article

Published by The Daily Delhi

This article was produced by The Daily Delhi editorial desk and covers news in Delhi. See our editorial standards for how we use AI.
