At least 40 percent of images stored across Delhi's major government-run digital repositories are estimated to be duplicates or near-identical copies, a quiet crisis of data hygiene that is costing municipal and state agencies significant sums in avoidable cloud storage expenditure, according to a review of procurement records and technical audits circulating among IT planners at the Delhi Secretariat at I.P. Estate.
The problem has sharpened because of a collision of timelines. The Delhi Metro Rail Corporation is pushing ahead with Phase 4 construction documentation, generating thousands of site photographs weekly, while the Delhi Urban Heritage Foundation, based near Kashmere Gate, has been racing since 2024 to digitise its archive of Old Delhi's built environment before demolition and redevelopment erase structures that could never be photographed again. Both processes are flooding shared government cloud buckets with image files that nobody is systematically deduplicating.
What the Numbers Actually Show
Cloud storage costs for government bodies in India typically run between ₹2 and ₹6 per gigabyte per month on domestic providers such as NIC Cloud, which is operated by the National Informatics Centre under the Ministry of Electronics and Information Technology. A single uncompressed site-survey photograph from a Delhi Metro Phase 4 station (say, the Janakpuri West interchange or the proposed Tughlakabad depot) can exceed 8 megabytes. Multiply that by tens of thousands of shots across 65 kilometres of planned new corridor, add the duplication rate, and the redundant storage burden runs into hundreds of gigabytes monthly on that project alone.
The Municipal Corporation of Delhi, which merged its three predecessor bodies in May 2022, inherited three separate image databases, one each from the former North, South and East corporations, with no reconciliation layer sitting between them. Internal estimates reviewed by technical staff place the overlap in property survey imagery at roughly 35 percent across the combined archive, which as of late 2025 held more than 1.2 million georeferenced photographs of properties across 12 zones. The cost of storing the redundant fraction is not trivial: within NIC Cloud's pricing band, and depending on average file size, that excess data could represent expenditure of up to several lakh rupees annually for storage that delivers zero informational value.
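The arithmetic behind such an estimate can be sketched as follows. The image count, overlap rate and price band come from the figures above; the average file size of 8 megabytes is an assumption borrowed from the Metro site-survey example, since no file size is stated for the MCD archive, and the function name is purely illustrative.

```python
def redundant_storage_cost(image_count, duplicate_share, avg_image_mb, rupees_per_gb_month):
    """Return (redundant_gb, annual_cost_rupees) for the duplicated share of an archive."""
    redundant_images = image_count * duplicate_share
    # ASSUMPTION: decimal gigabytes (1 GB = 1000 MB); a provider may bill differently.
    redundant_gb = redundant_images * avg_image_mb / 1000
    annual_cost = redundant_gb * rupees_per_gb_month * 12
    return redundant_gb, annual_cost


# 1.2 million photographs, roughly 35 percent overlap, Rs 2 to Rs 6 per GB per month.
# ASSUMPTION: 8 MB per image, taken from the Metro example rather than MCD data.
for price in (2, 6):
    gb, cost = redundant_storage_cost(1_200_000, 0.35, 8, price)
    print(f"Rs {price}/GB-month: {gb:,.0f} GB redundant, about Rs {cost:,.0f} per year")
```

Under those assumptions the redundant fraction comes to roughly 3.4 terabytes, costing somewhere between about ₹0.8 lakh and ₹2.4 lakh a year. The result scales linearly with file size, so larger survey files would push the figure higher.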
Duplicate image accumulation is not an accidental bureaucratic side-effect. It is structurally baked in. Field teams from agencies like the Delhi Jal Board, conducting Yamuna riverbank surveys between Wazirabad Barrage and Okhla Barrage, submit photographs through multiple reporting channels (WhatsApp forwards, official portals and email attachments), meaning the same frame can enter the archive three times under different file names. Perceptual hashing tools, which generate a short digital fingerprint from an image's visual content rather than its file name or metadata, can catch these cases even when file names differ or when a copy has been resized or recompressed, as typically happens to images forwarded over WhatsApp. Several central government ministries began mandating such tools in their document management systems after a 2023 directive from the National Informatics Centre, but adoption among Delhi's state-level civic bodies has been uneven.
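One of the simplest perceptual hashes, the average hash, gives a sense of how such a fingerprint works. The sketch below is illustrative only and is not drawn from any agency's tooling: it assumes the image has already been decoded into a grid of grayscale values at least 8 pixels on each side, and the function names, the synthetic test images and the matching threshold of 5 bits are all assumptions.

```python
def average_hash(pixels, hash_size=8):
    """Compute an average hash from a 2D grid of grayscale values (0-255).

    The grid is shrunk to hash_size x hash_size by averaging blocks; each
    cell then contributes a 1 bit if it is brighter than the mean of all cells.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        top, bottom = row * height // hash_size, (row + 1) * height // hash_size
        for col in range(hash_size):
            left, right = col * width // hash_size, (col + 1) * width // hash_size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two fingerprints."""
    return bin(hash_a ^ hash_b).count("1")


# Synthetic stand-ins for real photographs (assumptions for illustration).
original = [[(x * 255) // 31 for x in range(32)] for y in range(32)]
# Same picture with slight pixel noise, loosely imitating recompression.
noisy = [[min(255, original[y][x] + (x + y) % 3) for x in range(32)] for y in range(32)]
# Same picture at twice the resolution.
upscaled = [[original[y // 2][x // 2] for x in range(64)] for y in range(64)]
# A visually different picture.
different = [[(y * 255) // 31 for x in range(32)] for y in range(32)]

THRESHOLD = 5  # ASSUMPTION: maximum differing bits to count as a near-duplicate
for name, image in [("noisy", noisy), ("upscaled", upscaled), ("different", different)]:
    distance = hamming_distance(average_hash(original), average_hash(image))
    print(name, distance, "near-duplicate" if distance <= THRESHOLD else "distinct")
```

Because the fingerprint depends only on coarse brightness patterns, the noisy and upscaled copies land within the threshold of the original while the unrelated image does not. A real audit would more likely rely on an established open-source library and a threshold tuned against sample data.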
What Comes Next for Civic IT Teams
The pressure to act is building from two directions at once. First, the Bureau of Indian Standards published updated guidelines for e-governance data management in early 2025, which explicitly flag redundant media storage as an audit risk. Second, the Delhi government's stated push to expand its Delhi Data Portal, launched in 2020 as a public-access transparency initiative, means that image datasets from agencies like the Delhi Tourism and Transportation Development Corporation will increasingly be shared with external developers, making duplicate-laden archives both an embarrassment and a practical obstacle to usable APIs.
Civic IT departments that have not already done so should run a baseline audit using open-source perceptual hashing libraries before the end of the current financial year, which closes on 31 March 2027. Agencies managing heritage or survey imagery, particularly those feeding into the Master Plan for Delhi 2041 documentation process, ought to consider adopting a single ingestion gateway rather than parallel submission channels. The technology is neither new nor expensive. The bottleneck is institutional will, procurement cycles and the unglamorous reality that cleaning up data is harder to announce than collecting it in the first place.
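What a single ingestion gateway might look like in miniature is sketched below. The class, its fields and the sample file names are hypothetical. For simplicity it fingerprints files with a SHA-256 digest of their bytes, which catches only byte-identical copies; catching recompressed copies would require a perceptual fingerprint in place of the digest.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class IngestionGateway:
    """Hypothetical single entry point that every submission channel feeds into."""
    stored: dict = field(default_factory=dict)    # content digest -> first accepted filename
    rejected: list = field(default_factory=list)  # (filename, channel, duplicate_of)

    def submit(self, filename: str, data: bytes, channel: str) -> bool:
        """Store the file if its bytes are new; return False if it is a duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.stored:
            self.rejected.append((filename, channel, self.stored[digest]))
            return False
        self.stored[digest] = filename
        return True


gateway = IngestionGateway()
frame = b"\xff\xd8 example image bytes"  # placeholder bytes, not a real JPEG
results = [
    gateway.submit("IMG_0412.jpg", frame, channel="portal"),
    gateway.submit("riverbank_survey.jpg", frame, channel="email"),
    gateway.submit("different.jpg", b"another frame", channel="portal"),
]
print(results)           # [True, False, True]
print(gateway.rejected)  # [('riverbank_survey.jpg', 'email', 'IMG_0412.jpg')]
```

The point of the design is that the duplicate check happens once, at the door, regardless of which channel a photograph arrives through, and that rejected submissions are logged rather than silently dropped.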