Delhi's government-linked digital repositories are estimated to hold several million image files between them, and a growing share of those files are exact or near-exact copies of the same photographs. The problem is not abstract. Storage costs real money, retrieval takes real time, and in a city running parallel digitisation campaigns across heritage conservation, urban planning and public health documentation, the duplication is compounding quickly.
The timing matters because three major archival pushes are running at the same time. The Archaeological Survey of India's Delhi Circle has been digitising monuments across the Mehrauli Archaeological Park. The Delhi Metro Rail Corporation is documenting Phase 4 construction progress, including the Janakpuri West to R.K. Ashram Marg corridor, and generating thousands of site photographs weekly. And the Delhi Urban Heritage Foundation has been cataloguing structures across Shahjahanabad, the walled city area of Old Delhi, as part of a mapping initiative tied to the Municipal Corporation of Delhi. All three generate images. All three, according to public procurement documents reviewed as part of budget discussions for the 2025-26 civic IT cycle, have identified redundant file accumulation as a known operational problem.
The Storage Arithmetic Nobody Wants to Do
Neither cloud nor on-premises storage is free. Government-grade storage procurement in India, typically sourced through National Informatics Centre empanelled vendors, runs roughly between ₹3 and ₹8 per gigabyte per month depending on redundancy tier and contract volume, based on published NIC rate cards. A single high-resolution site photograph from a DSLR camera used by heritage or infrastructure teams commonly runs between 20 and 40 megabytes. If even 30 percent of a 500,000-image archive is duplicated, that is 150,000 files which, at an average of 30 megabytes each, consume about 4.5 terabytes of avoidable storage. At the ₹5.50 midpoint of that rate range, the excess costs roughly ₹24,750 per month, or close to ₹3 lakh a year, on a single medium-sized archive alone; at the top of the range it reaches ₹36,000 per month. Scale that across the dozen or more departments maintaining photo records in the National Capital Territory, and the figure climbs steeply.
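The arithmetic above can be rechecked, or rerun with a department's own numbers, using a short script. This is a minimal sketch: the 30 MB average file size is an assumed midpoint of the 20 to 40 MB range, decimal units are assumed (1 GB = 1,000 MB), and the function and constant names are illustrative rather than drawn from any procurement tool.

```python
# Illustrative inputs; the averages below are assumptions, not measured values.
ARCHIVE_SIZE = 500_000          # images in the archive
DUPLICATE_SHARE = 0.30          # fraction assumed to be duplicates
AVG_FILE_MB = 30                # assumed midpoint of the 20-40 MB range
RATE_LOW, RATE_HIGH = 3.0, 8.0  # rupees per GB per month

def excess_storage_cost(rate_per_gb):
    """Return (duplicate_files, excess_gb, monthly_cost, yearly_cost)."""
    duplicate_files = round(ARCHIVE_SIZE * DUPLICATE_SHARE)
    excess_gb = duplicate_files * AVG_FILE_MB / 1000  # decimal: 1 GB = 1000 MB
    monthly = excess_gb * rate_per_gb
    return duplicate_files, excess_gb, monthly, monthly * 12

# Print the low, midpoint and high ends of the quoted rate range.
for rate in (RATE_LOW, (RATE_LOW + RATE_HIGH) / 2, RATE_HIGH):
    files, gb, monthly, yearly = excess_storage_cost(rate)
    print(f"Rs {rate:.2f}/GB: {files:,} files, {gb / 1000:.1f} TB, "
          f"Rs {monthly:,.0f}/month, Rs {yearly:,.0f}/year")
```

Changing the duplicate share or the average file size shows how sensitive the yearly figure is to those two estimates, which are usually the least certain inputs.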
The duplication problem is not unique to photography. The Delhi Secretariat's document management systems, which handle records for departments housed in the I.P. Estate complex near ITO, have faced similar redundancy issues with scanned PDFs. But image files are particularly costly because their file sizes are large and because automated deduplication tools — which use perceptual hashing algorithms to identify visually identical or near-identical photographs — are still not standard-issue in most state government IT stacks in India.
What Deduplication Actually Involves — and What Delhi Is Starting to Do
Perceptual hash-based deduplication works differently from simple checksum matching. A checksum flags only files that are byte-for-byte identical. A perceptual hash (pHash and dHash are widely used open-source options) is a compact fingerprint computed from an image's visual content, and two images are treated as matches when their fingerprints differ in only a few bits. That catches the same image saved at different resolutions, with different filenames, or with minor colour corrections applied. For archival work in heritage documentation, where the same photograph of a Chandni Chowk haveli might be uploaded by three different field surveyors after the same afternoon visit, this distinction matters enormously.
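The difference between a checksum and a perceptual hash can be seen in a small, self-contained sketch. It implements a simplified difference hash (dHash) over a grayscale image represented as a plain 2D list, with nearest-neighbour downsampling; the synthetic image and the helper names are assumptions made for illustration. A real pipeline would decode and resize actual image files with an imaging library instead.

```python
import hashlib

def dhash(pixels, hash_size=8):
    """Difference hash of a grayscale image given as a 2D list of ints.

    The image is shrunk to hash_size rows by hash_size + 1 columns using
    nearest-neighbour sampling; each bit then records whether a pixel is
    brighter than its right-hand neighbour.
    """
    rows, cols = len(pixels), len(pixels[0])
    small = [
        [pixels[r * rows // hash_size][c * cols // (hash_size + 1)]
         for c in range(hash_size + 1)]
        for r in range(hash_size)
    ]
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")

def checksum(pixels):
    """SHA-256 over the raw pixel bytes, standing in for a file checksum."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

# Synthetic 64 x 72 grayscale "photograph" (values 0-199), purely illustrative.
original = [[(3 * r + 5 * c) % 200 for c in range(72)] for r in range(64)]
half_size = [row[::2] for row in original[::2]]         # saved at lower resolution
brighter = [[v + 10 for v in row] for row in original]  # minor tonal correction

# Columns: label, truncated checksum, dHash, Hamming distance to the original.
for label, image in [("original", original), ("half_size", half_size),
                     ("brighter", brighter)]:
    print(label, checksum(image)[:12], f"{dhash(image):016x}",
          hamming(dhash(original), dhash(image)))
```

In this constructed case the three checksums all differ, while the difference hashes stay within zero bits of one another. Real photographs will rarely match that cleanly, which is why matching is normally done with a small Hamming-distance threshold rather than strict equality.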
The Delhi Metro Rail Corporation's IT wing issued a request for a records management system upgrade in late 2025, with deduplication listed among the functional requirements in procurement notices published on the Central Public Procurement Portal. The Municipal Corporation of Delhi's digitisation cell, operating out of offices near Dr. S.P. Mukherjee Civic Centre on Minto Road, has reportedly been piloting a storage audit tool, though no formal public outcome report has been released yet.
For departments still waiting on centralised solutions, the practical path forward involves three steps: run a perceptual hash scan on existing archives before migrating to any new system, establish a single-upload protocol so field teams submit images to one repository rather than emailing batches to multiple supervisors, and set quarterly deduplication checks as a procurement condition in any new vendor contract. The mathematics is not complicated. The will to act on it has, so far, been slow to follow.
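The scan described in the first of those steps amounts to grouping files whose perceptual hashes are close. The sketch below assumes the hashes have already been computed as 64-bit integers; the file names, the hash values and the 4-bit threshold are all hypothetical, and the greedy grouping shown is one simple approach rather than a recommended production design.

```python
def group_near_duplicates(hashes, max_distance=4):
    """Greedily cluster files whose hashes differ in at most max_distance bits.

    hashes maps a file name to its integer perceptual hash. The first file
    seen in each cluster (in sorted name order) acts as its representative.
    Only clusters with more than one file are returned.
    """
    groups = []  # list of (representative_hash, [file names])
    for name, value in sorted(hashes.items()):
        for rep, members in groups:
            if bin(rep ^ value).count("1") <= max_distance:
                members.append(name)
                break
        else:
            groups.append((value, [name]))
    return [members for _, members in groups if len(members) > 1]

# Hypothetical archive listing: names and hash values are made up.
archive = {
    "haveli_surveyor_a.jpg": 0xF0F0F0F0F0F0F0F0,
    "haveli_surveyor_b.jpg": 0xF0F0F0F0F0F0F0F1,  # one bit away from surveyor_a
    "haveli_resized.jpg": 0xF0F0F0F0F0F0F0F0,     # same hash as surveyor_a
    "metro_pier_12.jpg": 0x0123456789ABCDEF,      # unrelated image
}

for group in group_near_duplicates(archive):
    print("possible duplicates:", ", ".join(group))
```

The output is a list of candidates for human review, not an automatic delete list. The same function could be rerun for a quarterly check, and the threshold would need tuning against a sample of each archive before anyone relies on it.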