Delhi's public digital infrastructure is carrying a hidden weight. Across municipal servers, heritage documentation projects and civic portals, duplicate image files now account for a measurable — and expensive — portion of stored data, according to audits reviewed by officials at multiple city agencies this year. The problem is not new, but a convergence of digitisation drives and budget scrutiny in the first half of 2026 has forced administrators to finally confront it.
The timing matters because Delhi is mid-stream in several large-scale digitisation efforts. The Delhi Urban Heritage Foundation, which operates out of its office near Kasturba Gandhi Marg, has been scanning archival photographs of Old Delhi's havelis and bazaars since 2023. Simultaneously, the Delhi Development Authority launched a document portal in late 2024 to make land records and project maps publicly accessible. Both initiatives have ingested tens of thousands of image files monthly — and neither started with a mandatory deduplication protocol in place.
What the Numbers Actually Show
Storage audits at mid-sized government offices in India typically find that between 20 and 35 percent of image files are exact or near-exact duplicates, a figure cited in a 2023 National Informatics Centre advisory on data hygiene for state portals. Apply that range to Delhi's situation and the problem becomes concrete: if a single civic portal is managing 500,000 image files, somewhere between 100,000 and 175,000 of those files may be redundant. At current commercial cloud storage rates in India, roughly Rs 2 to Rs 4 per gigabyte per month for government-tier contracts, even modest deduplication exercises on large image libraries can free up budgets running into lakhs of rupees annually.
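The arithmetic behind that estimate can be sketched in a few lines. The file count, duplicate share and storage rates come from the figures above; the average file size of 25 MB is purely an illustrative assumption, since no average size is given here, and the result scales directly with it.

```python
def annual_saving_rupees(total_files, duplicate_percent, avg_file_mb, rate_per_gb_month):
    """Estimate yearly storage cost attributable to duplicate files.

    avg_file_mb is an assumed figure, not one taken from any audit.
    """
    duplicate_files = total_files * duplicate_percent / 100
    duplicate_gb = duplicate_files * avg_file_mb / 1000  # decimal gigabytes
    return duplicate_gb * rate_per_gb_month * 12


# Lower bound: 20 percent duplicates at Rs 2 per GB per month.
low = annual_saving_rupees(500_000, 20, avg_file_mb=25, rate_per_gb_month=2)
# Upper bound: 35 percent duplicates at Rs 4 per GB per month.
high = annual_saving_rupees(500_000, 35, avg_file_mb=25, rate_per_gb_month=4)

print(low)   # 60000.0
print(high)  # 210000.0
```

Under these assumptions a single 500,000-file portal would recover between Rs 60,000 and Rs 2.1 lakh a year. Archival-quality scans are often far larger than 25 MB, and the figure rises in proportion.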
The Delhi Metro Rail Corporation's infrastructure documentation unit on Barakhamba Road has faced a version of this problem specific to engineering records. Phase 4 expansion work on corridors including the Janakpuri West to RK Ashram Marg line has generated thousands of site photographs since construction accelerated in 2024. Without automated duplicate detection, inspection teams filing reports from different stations sometimes upload the same reference image multiple times under different file names, fragmenting version control and complicating audit trails when contractors dispute site conditions.
Heritage digitisation carries a different but related risk. When scanned images of the same Chandni Chowk street photograph exist in three slightly different resolutions across two servers, archivists waste time resolving which version is authoritative. The Delhi Archives, housed in the Old Secretariat complex near the Civil Lines area, manages hundreds of thousands of physical documents that are now being converted to digital formats. Staff members there have noted internally that file naming conventions vary between scanning batches, which is one of the primary reasons duplicates proliferate undetected in the first place.
The Fix Is Largely Technical — but Politics Complicate It
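Deduplication software exists and is not expensive. Hash-based tools that compare file fingerprints can process a library of one million images in hours on standard government hardware. Such tools catch byte-identical copies, including the same image uploaded under different file names; near-duplicates, such as the same scan saved at different resolutions, require image-similarity matching on top. Several state governments, including Telangana, began mandating deduplication checks as part of their data governance frameworks between 2022 and 2024. Delhi has no equivalent published mandate yet.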
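For readers who want to see what a hash-based check involves, here is a minimal sketch using only the Python standard library. It is not the method used by any agency named here; the function names and the choice of SHA-256 are assumptions made for illustration. It groups files whose contents are byte-for-byte identical, regardless of file name, and will not flag resized or re-encoded versions of the same picture.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 fingerprint of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under root by content hash; return groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_sha256(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    # "scans" is a hypothetical directory name.
    for group in find_exact_duplicates("scans"):
        print("Identical content:", ", ".join(str(p) for p in group))
```

Reading in chunks keeps memory use flat even for very large scans, which matters more than raw speed on modest hardware.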
The complication in Delhi is partly institutional. Digitisation projects are spread across agencies that report to different political principals: the elected city government controls some departments, while others fall under central government oversight through the Lieutenant Governor's office. Co-ordinating a city-wide data hygiene standard requires sign-off that crosses those jurisdictional lines, something that has stalled other inter-agency IT proposals in the past.
For individual departments that want to act independently, the path forward is straightforward. Agencies can run open-source deduplication tools against existing repositories, establish a mandatory hash-check at the point of file upload, and assign a single data custodian per project to arbitrate version conflicts. The Delhi State Data Centre on IP Estate Road already has the technical capacity to host a shared deduplication service — the question is whether the administrative will exists to mandate its use before the next wave of digitisation projects begins uploading more of the same.
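A hash-check at the point of upload could look something like the sketch below. The class name, its in-memory dictionary and the sample file names are all hypothetical; a real portal would keep the fingerprints in a shared database so that every upload point sees the same record.

```python
import hashlib


class UploadRegistry:
    """Hypothetical in-memory record of content fingerprints already stored."""

    def __init__(self):
        self._seen = {}  # SHA-256 digest -> name of the first file stored

    def check(self, name, data):
        """Return (accepted, reference_name) for an incoming upload.

        A byte-identical file is rejected, and the name of the copy
        already on record is returned so the uploader can link to it.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return False, existing
        self._seen[digest] = name
        return True, name


registry = UploadRegistry()
print(registry.check("site_photo_001.jpg", b"example image bytes"))
# (True, 'site_photo_001.jpg')
print(registry.check("IMG_4471.jpg", b"example image bytes"))
# (False, 'site_photo_001.jpg')
```

Returning the name of the existing copy, rather than simply refusing the upload, gives the data custodian a single authoritative file to point to when versions conflict.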