Instead of mining tuples, represent every embedding as a decaying, electrostatic-style potential field and superpose them, giving a compositional model that is more robust to large intra-class variation and label noise.
Deep metric learning (DML) involves training a network to learn a semantically meaningful representation space. Many current approaches mine n-tuples of examples and model interactions within each tuple. We present Potential Field based Metric Learning (PFML), a novel compositional DML model inspired by electrostatic fields in physics. Instead of modeling interactions within tuples, PFML represents the influence of each example (embedding) by a continuous potential field, and superposes the fields to obtain their combined global potential field. We use attractive/repulsive potential fields to represent interactions among embeddings from images of the same/different classes. Contrary to typical learning methods, where the mutual influence of samples is proportional to their distance, we enforce a reduction in such influence with distance, leading to a decaying field. We show that such decay helps improve performance on real-world datasets with large intra-class variations and label noise. Like other proxy-based methods, we also use proxies to succinctly represent sub-populations of examples. We evaluate our method on three standard DML benchmarks (Cars-196, CUB-200-2011, and SOP), where it outperforms state-of-the-art baselines.
Use a continuous potential field to represent interactions between a set of example embeddings, instead of using subsets of examples (triplets/tuplets) or proxies.
With our potential field representation, embeddings need to be driven towards other nearby embeddings belonging to the same class, while also being driven away from embeddings of other classes. This is reminiscent of the behavior of an isolated system of electric charges, where dissimilar charges are drawn together while similar ones are repelled.
No complex mining needed. A continuous field models all interactions directly, avoiding the high-complexity O(N³) tuple mining that pair-based losses rely on.
Better features. All sample interactions are modeled at once, not just a small subset, improving the quality of the learned representation.
Label-noise resilience. Interaction strength decays with distance, so distant mislabeled samples are de-emphasized and intra-class features are preserved.
Better use of proxies. The decaying interaction keeps learned proxies closer to the data distribution they represent, as measured by a smaller Wasserstein-2 distance (W₂).
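The effect of decay on distant (possibly mislabeled) samples can be illustrated with a small sketch. The functional forms here are assumptions made for illustration, not the paper's definitions: `decaying_influence` uses a hypothetical inverse-power falloff with exponent `alpha` and a cap inside radius `delta`, and `linear_influence` stands in for methods whose influence grows in proportion to distance.

```python
def decaying_influence(r, alpha=1.0, delta=0.1):
    """Hypothetical decaying influence: falls off as 1 / r**alpha.

    Distances below delta are clamped so the value stays finite near r = 0.
    """
    return 1.0 / max(r, delta) ** alpha


def linear_influence(r):
    """Influence that grows in proportion to distance (the contrasting case)."""
    return r


# Influence exerted on an embedding by samples at increasing distances.
distances = [0.5, 1.0, 2.0, 4.0]
decaying = [decaying_influence(r) for r in distances]
linear = [linear_influence(r) for r in distances]

for r, d, l in zip(distances, decaying, linear):
    print(f"r={r:.1f}  decaying={d:.2f}  linear={l:.2f}")
```

Under these assumptions, a sample at distance 4.0 contributes an eighth of what a sample at distance 0.5 does in the decaying case, whereas in the linear case it contributes eight times as much. That asymmetry is why a far-away mislabeled sample has little pull under a decaying field.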
For each class, PFML defines a class potential field Ψ that affects only the embeddings of that class. This class potential field pulls embeddings of the class together while pushing them away from embeddings of other classes. It is formed by superposing the potentials of individual embeddings from all classes. The potential exerted by each individual embedding is designed using both principles from electrostatics and observations from the DML literature. More details and exact definitions of the potential field can be found in Section 3 of our paper.
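A minimal sketch of the superposition idea follows. It is not the definition from Section 3 of the paper: the inverse-power form in `pair_potential`, the parameters `alpha` and `delta`, the sign convention (lower potential is preferred), and all function names are assumptions chosen for illustration.

```python
import math


def pair_potential(r, alpha=1.0, delta=0.1):
    """Hypothetical decaying potential magnitude of one embedding at distance r."""
    return 1.0 / max(r, delta) ** alpha


def class_potential(z, embeddings, labels, target_class, alpha=1.0, delta=0.1):
    """Potential felt at point z by an embedding of target_class.

    Superposes the fields of all source embeddings: same-class sources add
    a negative (attractive) term, other-class sources a positive (repulsive) one.
    """
    total = 0.0
    for e, y in zip(embeddings, labels):
        r = math.dist(z, e)
        sign = -1.0 if y == target_class else 1.0
        total += sign * pair_potential(r, alpha, delta)
    return total


# Toy 2-D embeddings: two samples of class 0 and one sample of class 1.
embeddings = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
labels = [0, 0, 1]

# Evaluate the class-0 field at a point inside the class-0 cluster
# and at a point next to the class-1 sample.
near_own = class_potential((0.5, 0.0), embeddings, labels, target_class=0)
near_other = class_potential((4.5, 0.0), embeddings, labels, target_class=0)
print(near_own, near_other)
```

With this sign convention, the class-0 potential is negative near the class-0 samples and positive near the class-1 sample, so minimizing it would move a class-0 embedding toward its own cluster. In actual training the gradient of such a field with respect to the embedding would be taken by an autodiff framework; that part is omitted here.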
We evaluate our method on zero-shot image retrieval over 3 standard benchmarks (Cars-196, CUB-200-2011 and SOP), training 4 different backbones (ResNet50, BN-Inception, ViT and DINO) for a fair comparison with prior work.
Real-world labels are noisy. Under 20% random label corruption, PFML degrades the least, beating the next-best method (Proxy Anchor) by +6.0 and +7.6 Recall@1 on CUB-200-2011 and Cars-196, respectively.
| Method | CUB-200-2011 R@1 | CUB-200-2011 R@2 | Cars-196 R@1 | Cars-196 R@2 |
|---|---|---|---|---|
| Triplet | 55.1 | 68.7 | 67.5 | 77.9 |
| Multi-Similarity | 58.9 | 71.8 | 70.4 | 79.8 |
| Proxy NCA | 60.1 | 74.7 | 74.3 | 82.4 |
| Proxy Anchor (2nd best) | 60.7 | 75.1 | 76.9 | 83.1 |
| HIST | 59.7 | 74.6 | 72.9 | 81.8 |
| Potential Field (Ours) | 66.7 | 76.9 | 84.5 | 88.6 |
Recall@1 and Recall@2 (%) under 20% random label noise, ResNet-50 backbone (512-dim), averaged over 5 runs. PFML is the least affected, beating the next-best method (Proxy Anchor) by +6.0 and +7.6 R@1 on CUB-200-2011 and Cars-196.
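For readers unfamiliar with the metric, Recall@K can be sketched as follows: each embedding is used as a query, and it counts as a hit if at least one of its K nearest neighbors (excluding itself) shares its label. This is a brute-force illustration with Euclidean distance on toy data; `recall_at_k` is a hypothetical helper, not the evaluation code used for the table.

```python
import math


def recall_at_k(embeddings, labels, k=1):
    """Percentage of queries with a same-label item among their k nearest neighbors."""
    hits = 0
    for i, (query, label) in enumerate(zip(embeddings, labels)):
        # Distances from the query to every other item (the query itself is excluded).
        neighbors = [
            (math.dist(query, other), labels[j])
            for j, other in enumerate(embeddings)
            if j != i
        ]
        neighbors.sort(key=lambda pair: pair[0])
        if any(lbl == label for _, lbl in neighbors[:k]):
            hits += 1
    return 100.0 * hits / len(embeddings)


# Toy data: the last point carries label 1 but sits inside the class-0 cluster.
embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.2, 0.1)]
labels = [0, 0, 1, 1, 1]

print(recall_at_k(embeddings, labels, k=1))
print(recall_at_k(embeddings, labels, k=3))
```

In this toy set, the point placed inside the wrong cluster misses at K=1 and K=2 and is only retrieved correctly at K=3, which is the kind of failure that label noise tends to induce.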
If you find our work useful, please consider citing:
@InProceedings{Bhatnagar_2025_CVPR,
author = {Bhatnagar, Shubhang and Ahuja, Narendra},
title = {Potential Field Based Deep Metric Learning},
booktitle = {Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025}
}