Piecewise-Linear Manifolds for Deep Metric Learning Conference on Parsimony and Learning (CPAL) 2024 , Oral
Abstract
Unsupervised deep metric learning (UDML) focuses on learning a semantic representation space using only unlabeled data. This challenging problem requires accurately estimating the similarity between data points, which is used to supervise a deep network. For this purpose, we propose to model the high-dimensional data manifold using a piecewise-linear approximation, with each low-dimensional linear piece approximating the data manifold in a small neighborhood of a point. These neighborhoods are used to estimate similarity between data points. We empirically show that this similarity estimate correlates better with the ground truth than the similarity estimates of current state-of-the-art techniques. We also show that proxies, commonly used in supervised metric learning, can be used to model the piecewise-linear manifold in an unsupervised setting, helping improve performance. Our method outperforms existing unsupervised metric learning approaches on standard zeroshot image retrieval benchmarks.
Intuition
- Identify and approximate the submanifold at each point (eg. Ellipses A-E) by a linear model over a neighborhood small enough to (i) not contain multiple submanifolds and (ii) the contained single submanifold is adequately linear. Such low-dimensional (1D) subspaces are assumed to contain semantically similar points.
- Estimate similarity for each pair of points (x1, x2), in terms of the lengths of projections of vector x1 − x2 on the linear neighborhood D, and on the normal to D. Similarity decays faster orthogonal to a neighborhood than along it.
- Train network embedding to bring similar points closer together and push dissimilar points apart. This brings closer together points within the same low-dimensional neighborhoods or those in different neighborhoods but the same low-dimensional (1D) space (eg. A, B, C) closer together. Points not lying in the same low dimensional space (eg. B, D, E) are pushed away from each other.
The figure provides an overview of our method. Points are selected using the neighborhood sampling and their embeddings are calculated using a momentum encoder. Embeddings generated by the momentum encoder are used to construct a piecewise linear approximation of the data manifold.
These embeddings are used to calculate point-point and proxy-point similarities which are in turn used to modulate the distance between them. Locations and neighborhoods of proxies are also updated through backpropagation. Losses colored yellow/ green are calculated only using quantities with the same color.
Example Embedding
Results
Citation
The website template was borrowed from Michaël Gharbi and Ref-NeRF.