Delhi's government agencies collectively hold tens of thousands of digital photographs — infrastructure projects, public health drives, Metro expansion sites — and a significant share of them are duplicates, near-duplicates, or mislabelled images that make the archive functionally useless. The push to clean up and rationalise those records has reached a decision point, with multiple bodies now weighing whether to commission automated deduplication, contract it out, or simply start again.
Deadlines have made the issue more urgent. Delhi Metro Rail Corporation's Phase 4 expansion, which covers corridors including the Janakpuri West–RK Ashram Marg stretch and the Aerocity–Tughlakabad line, is generating fresh documentation daily. Engineers, contractors and communications teams are all filing images into overlapping repositories. Without a clear protocol for identifying and retiring duplicate files, the Phase 4 visual record risks the same bloat that plagued earlier phases.
Why the Bureaucratic Backlog Matters Now
The backlog is not purely an administrative nuisance. Duplicated or mislabelled images have in past cycles surfaced in the photo evidence that the Delhi Pollution Control Committee publishes with its monitoring reports (smog readings from Anand Vihar, dust suppression checks near construction sites along the Dwarka Expressway), undermining confidence in the data. The Public Works Department faces the same problem with its road-repair documentation, particularly along corridors such as the Outer Ring Road and stretches through Rohini and Dwarka.
The Aam Aadmi Party government and the BJP-led central government both have a stake in how this plays out. The Delhi government's e-governance push, centred on its Delhi Integrated Multi-Modal Transit System and various citizen-service portals under the Delhi e-District platform, depends on clean data pipelines. Central ministries operating in the capital — including the Ministry of Housing and Urban Affairs, which oversees several Old Delhi heritage preservation schemes in areas like Shahjahanabad — have their own documentation systems that do not always talk to state-level archives.
The Decisions Ahead
Three choices will define the outcome. The first is technical: whether to deploy perceptual hashing tools, which automatically identify visually identical or near-identical images, or to rely on manual review. Perceptual hashing is standard practice among major photo agencies and can process large volumes quickly, but it involves upfront licensing costs and requires trained operators. Delhi's civic IT departments have budgeted for neither in the current financial year, which runs to March 2027.
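For a sense of what such a tool does under the hood, the following is a minimal sketch of one common perceptual-hashing variant, a difference hash, written in plain Python. It is illustrative only: the function names, the 8-by-8 hash size and the 5-bit threshold are assumptions made for this example, not any agency's or vendor's implementation, and the images are represented as grids of grayscale values rather than decoded from real files.

```python
from typing import List, Sequence

# A grayscale image as rows of 0-255 values (an assumption for this sketch;
# a real pipeline would decode JPEG/PNG files with an imaging library).
Grid = Sequence[Sequence[int]]


def shrink(pixels: Grid, width: int, height: int) -> List[List[float]]:
    """Downsample to width x height by averaging each block of source pixels."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max(y0 + 1, (y + 1) * src_h // height)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max(x0 + 1, (x + 1) * src_w // width)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Grid, size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair of thumbnail cells."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag two images as near-duplicates if their hashes differ in few bits.

    The threshold is a hypothetical starting point and would need tuning
    against a sample of the actual archive.
    """
    return hamming(dhash(a), dhash(b)) <= threshold
```

Because the hash records only whether each thumbnail cell is brighter than its neighbour, a copy that has been resized or uniformly brightened tends to produce the same or a very similar hash, while a genuinely different photograph usually does not. Any flagged pair would still need human review before a file is retired.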
The second is jurisdictional. No single agency owns the problem. The Delhi government's Department of Information Technology sits alongside the National Informatics Centre, which provides the central government's digital backbone, and the two have overlapping mandates for data governance in the capital. Until one body takes formal ownership of a consolidated image policy, the duplicate problem will continue to compound. The National Informatics Centre's Delhi unit, based at CGO Complex on Lodhi Road, is the most plausible lead agency, but that designation has not been formalised.
The third decision is about retention policy. Even once duplicates are identified, agencies must decide what to delete, what to archive offline, and what to keep in live systems. The Delhi State Archives, housed in a facility near IP Estate, maintains physical and digital records under the Delhi Public Records Act, but its digital intake protocols were last formally updated several years ago and may not map cleanly onto the volume of imagery now being generated by infrastructure projects alone.
The practical path forward involves setting a deadline — ideally before the end of the monsoon season in September 2026, when Phase 4 construction activity typically pauses — for a cross-agency working group to produce a unified image governance protocol. That protocol needs to name a lead agency, specify file-naming standards, set retention periods, and mandate at least annual deduplication audits. Without those specifics locked down before the next round of Phase 4 milestone documentation begins, the archive will simply grow larger, messier, and harder to fix.