Delhi's public digital infrastructure holds hundreds of thousands of duplicate image files across at least a dozen government-managed portals, a problem that archivists, civic technologists and urban documentation specialists say has grown severe enough to distort the very databases meant to serve the city. The Delhi State Archives on Shyam Nath Marg, one of the oldest and most densely catalogued repositories in the country, is among the institutions where the issue has surfaced most visibly.
The timing is pointed. Delhi Metro Phase 4 construction is generating fresh documentation requirements along the Janakpuri West–R.K. Ashram corridor, and the municipal corporation is still attempting to digitise property records across all 272 wards. As a result, the volume of image data generated and ingested into civic systems has surged since 2024. Storage inefficiencies that might once have been tolerable now carry measurable financial and operational costs.
What the Numbers Actually Show
Digital storage audits conducted on comparable state-level government repositories in India have found duplicate image rates ranging from 18 percent to as high as 34 percent of total stored files, according to a 2024 assessment framework published by the National Informatics Centre. Apply even the lower end of that range to Delhi's estimated civic image repository, which the Delhi e-Governance Society has described in planning documents as exceeding 40 terabytes across its active portals, and the figures become uncomfortable. At 18 percent redundancy, that is more than 7 terabytes of storage consumed by files that add no informational value. At current government cloud procurement rates, that is a cost the city pays again every year the duplicates stay in place.
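The arithmetic is easy to check. The sketch below applies the 18 and 34 percent bounds to the 40-terabyte figure. The function and variable names are illustrative only, and because the planning documents describe the repository as exceeding 40 terabytes, the results should be read as a floor rather than an estimate.

```python
def redundant_terabytes(total_tb: int, duplicate_percent: int) -> float:
    """Storage occupied by duplicates, given a whole-number percentage."""
    return total_tb * duplicate_percent / 100


REPOSITORY_TB = 40  # lower bound cited in the planning documents

low = redundant_terabytes(REPOSITORY_TB, 18)   # lower end of the audited range
high = redundant_terabytes(REPOSITORY_TB, 34)  # upper end of the audited range

print(f"{low:.1f} TB to {high:.1f} TB of duplicate images")
```

No rupee figure is attached here because the article does not cite a per-terabyte procurement rate.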
The heritage digitisation drive centred on Old Delhi, covering Chandni Chowk, the lanes around Jama Masjid, and the Shahjahanabad precinct more broadly, has been particularly prone to the problem. Photographers, surveyors and civic contractors working on overlapping mandates from the Archaeological Survey of India, the Delhi Urban Heritage Foundation and the Municipal Corporation of Delhi have uploaded versions of the same structures, facades and streetscapes under different file names and metadata tags. Without a centralised deduplication protocol, identical or near-identical images of, say, the Fatehpuri Mosque entrance or the iron pillars along Nai Sarak accumulate across siloed databases.
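Identical files uploaded under different names are the easiest case to detect, because a hash of the file contents ignores the name and the metadata tags. What follows is a minimal sketch of that idea, not a description of any system the agencies run. The file names and byte strings are invented for illustration.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in files.items():
        # The digest depends only on the bytes, not on the file name.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical uploads from three agencies (names and bytes are made up).
uploads = {
    "asi/fatehpuri_entrance_01.jpg": b"\xff\xd8 same image bytes",
    "mcd/ward_survey_IMG_4412.jpg": b"\xff\xd8 same image bytes",
    "duhf/nai_sarak_pillar.jpg": b"\xff\xd8 a different image",
}

duplicate_groups = group_exact_duplicates(uploads)
print(duplicate_groups)
```

A content hash of this kind catches only exact copies. A photograph that has been resized or re-compressed produces a different digest and would be missed.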
The Policy Gap Driving the Waste
India does not yet have a binding national standard for image deduplication in government digital archives, though the Ministry of Electronics and Information Technology's draft Data Governance Framework, circulated in 2025, flagged redundant file storage as a priority concern for state-level portals. Delhi has its own Digital Delhi initiative, overseen by the Department of Information Technology at the Delhi Secretariat on I.P. Estate, but people familiar with the programme say deduplication tooling was not built into the platform's original procurement specifications.
The practical consequences extend beyond storage bills. When civic planners pull image records to assess, for example, the structural documentation of a heritage lane near Kinari Bazaar before approving redevelopment clearances, duplicate entries with inconsistent metadata can cause the most recently uploaded — and not necessarily the most accurate — version of a photograph to surface first. That sequencing error has real implications for decisions that carry legal and financial weight.
Deduplication software capable of handling large-scale image libraries using perceptual hash matching — a technique that identifies near-identical images even when file names or compression differ — is commercially available. Licensing costs for enterprise-grade tools run roughly between ₹8 lakh and ₹25 lakh annually for repositories of Delhi's scale, according to market pricing from vendors including those registered on the Government e-Marketplace. That is a fraction of the storage and administrative costs the redundancy generates.
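For readers unfamiliar with perceptual hashing, the sketch below shows one of its simplest forms, the average hash. It is an illustration under stated assumptions and not the method any particular vendor uses; commercial tools generally rely on more robust fingerprints. The images are assumed to have already been reduced to small grayscale grids, and the pixel values are invented.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Return a bit fingerprint: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 grayscale thumbnails, values 0 to 255.
original = [
    [200, 210, 40, 30],
    [205, 215, 35, 25],
    [50, 45, 220, 230],
    [55, 40, 225, 235],
]
# The same picture after re-compression: every value shifts slightly.
recompressed = [[min(255, v + 3) for v in row] for row in original]
# A different picture with the bright and dark regions swapped.
other = [
    [30, 40, 200, 210],
    [25, 35, 205, 215],
    [220, 230, 50, 45],
    [225, 235, 55, 40],
]

near = hamming_distance(average_hash(original), average_hash(recompressed))
far = hamming_distance(average_hash(original), average_hash(other))
print(near, far)
```

A small distance between fingerprints suggests the same picture and a large one suggests different pictures. Where to set the cut-off is a tuning decision that depends on the collection.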
The immediate practical step available to the Delhi e-Governance Society and the Municipal Corporation of Delhi is to commission a baseline audit — ideally before the Phase 4 Metro documentation pipeline adds further volume to already bloated repositories. Without a count, there is no argument for budget allocation, and without budget allocation, the duplicates keep stacking up, one misnamed JPEG at a time.