
Potential Field Based Deep Metric Learning
Shubhang Bhatnagar, Narendra Ahuja
abstract /
project page /
arxiv preprint
Deep metric learning (DML) involves training a network to learn a semantically meaningful representation space.
Many current approaches mine n-tuples of examples and model interactions within each tuple. We present a novel, compositional DML model,
inspired by electrostatic fields in physics, that, instead of working within tuples, represents the influence of each example (embedding) by a continuous
potential field, and superposes the fields to obtain their combined global potential field. We use attractive/repulsive potential fields to represent interactions
among embeddings from images of the same/different classes. Contrary to typical learning methods, where mutual influence of samples is proportional to their distance,
we enforce reduction in such influence with distance, leading to a decaying field. We show that such decay helps improve performance on real-world datasets with large
intra-class variations and label noise. Like other proxy-based methods, we also use proxies to succinctly represent sub-populations of examples. We evaluate our method
on three standard DML benchmarks (Cars-196, CUB-200-2011, and SOP), where it outperforms state-of-the-art baselines.
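
The superposition idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the `1 / (dist**alpha + delta)` form of the field, the parameters `alpha` and `delta`, and the function names are all assumptions chosen only so that an example's influence decays with distance.

```python
import numpy as np

def decaying_potential(dist, alpha=1.0, delta=0.1):
    """Magnitude of one example's field at distance `dist`; shrinks as dist grows.

    The functional form and both parameters are illustrative assumptions.
    """
    return 1.0 / (dist ** alpha + delta)

def total_potential(embeddings, labels, alpha=1.0, delta=0.1):
    """Superpose the fields of all other examples at each embedding.

    Same-class examples contribute an attractive (negative) term and
    different-class examples a repulsive (positive) term, so lower is better.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)          # pairwise Euclidean distances
    field = decaying_potential(dist, alpha, delta)
    same = labels[:, None] == labels[None, :]
    sign = np.where(same, -1.0, 1.0)
    np.fill_diagonal(sign, 0.0)                   # an example exerts no field on itself
    return (sign * field).sum(axis=1)
```

In a training loop, a quantity like the sum of `total_potential` over a batch could serve as a loss to minimize; how the paper actually combines the fields with proxies is not specified in the abstract.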


Improving Multi-label Recognition using Class Co-Occurrence Probabilities
Shubhang Bhatnagar*, Samyak Rawlekar*, Vishnuvardhan Pogunulu Srinivasulu, Narendra Ahuja
CVPRW 2024
abstract /
project page /
extended arxiv
Multi-label Recognition (MLR) involves the identification of multiple objects within an image.
To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-image datasets for
the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the
training data as conditional probabilities between a pair of classes. We propose a framework that extends the independent classifiers by incorporating
co-occurrence information for object pairs to improve their performance. We use a Graph Convolutional Network (GCN) to enforce the
conditional probabilities between classes, by refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR
datasets, where our approach outperforms all state-of-the-art methods.
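
As a rough illustration of how co-occurrences can be captured as conditional probabilities, the sketch below estimates P(class j present | class i present) from a binary label matrix. The function name and the handling of classes that never occur are assumptions; the GCN refinement step described in the abstract is not shown.

```python
import numpy as np

def cooccurrence_conditionals(label_matrix):
    """Estimate P(class j present | class i present) from binary labels.

    label_matrix has shape (num_images, num_classes) with 0/1 entries.
    Entry [i, j] of the result is count(i and j) / count(i); rows for
    classes that never occur are left at zero (an assumed convention).
    """
    y = np.asarray(label_matrix, dtype=float)
    joint = y.T @ y                 # joint[i, j] = images containing both i and j
    counts = np.diag(joint)         # counts[i]  = images containing class i
    cond = np.zeros_like(joint)
    np.divide(joint, counts[:, None], out=cond, where=counts[:, None] > 0)
    return cond
```

Note that the resulting matrix is generally asymmetric: P(j | i) and P(i | j) differ whenever the two classes have different frequencies.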


Piecewise-Linear Manifolds for Deep Metric Learning
Shubhang Bhatnagar, Narendra Ahuja
Conference on Parsimony and Learning (CPAL), 2024 (Oral)
abstract /
project page /
paper
Unsupervised deep metric learning (UDML) focuses on learning a semantic representation space
using only unlabeled data. This challenging problem requires accurately estimating the similarity between data points, which is used
to supervise a deep network. For this purpose, we propose to model the high-dimensional data manifold using a piecewise-linear
approximation, with each low-dimensional linear piece approximating the data manifold in a small neighborhood of a point.
These neighborhoods are used to estimate similarity between data points. We empirically show that this similarity estimate correlates
better with the ground truth than the similarity estimates of current state-of-the-art techniques. We also show that proxies, commonly
used in supervised metric learning, can be used to model the piecewise-linear manifold in an unsupervised setting, helping improve
performance. Our method outperforms existing unsupervised metric learning approaches on standard zero-shot image retrieval benchmarks.
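
One way to picture a single linear piece is a local PCA fit around a point. The sketch below is an assumption-laden illustration, not the paper's procedure: it fits a low-dimensional affine subspace to a point's nearest neighbours and uses the orthogonal distance to that subspace as a dissimilarity. The function names, the choice of `k`, and the use of SVD are all hypothetical.

```python
import numpy as np

def local_linear_piece(points, anchor_idx, k=5, dim=1):
    """Fit a dim-dimensional affine piece to the k nearest neighbours of one point."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts - pts[anchor_idx], axis=1)
    nbrs = pts[np.argsort(d)[:k]]           # includes the anchor itself (distance 0)
    center = nbrs.mean(axis=0)
    _, _, vt = np.linalg.svd(nbrs - center, full_matrices=False)
    return center, vt[:dim]                 # top principal directions span the piece

def distance_to_piece(x, center, basis):
    """Orthogonal distance from x to the affine piece; small means 'similar'."""
    r = np.asarray(x, dtype=float) - center
    return float(np.linalg.norm(r - basis.T @ (basis @ r)))
```

A point lying on the fitted piece gets distance close to zero even if it is far from the anchor in Euclidean terms, which is the intuition behind using the manifold rather than raw distance to judge similarity.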


Long-Distance Gesture Recognition using Dynamic Neural Networks
Shubhang Bhatnagar, Sharath Gopal, Narendra Ahuja, Liu Ren
International Conference on Intelligent Robots and Systems (IROS), 2023
abstract /
project page /
paper /
arxiv
Gestures form an important medium of communication between humans and machines. An overwhelming
majority of existing gesture recognition methods are tailored to
a scenario where humans and machines are located very close
to each other. This short-distance assumption does not hold
true for several types of interactions, for example gesture-based
interactions with a floor cleaning robot or with a drone. Methods
made for short-distance recognition are unable to perform
well on long-distance recognition due to gestures occupying
only a small portion of the input data. Their performance is
especially poor in resource-constrained settings where they
are not able to effectively focus their limited compute on the
gesturing subject. We propose a novel, accurate and efficient
method for the recognition of gestures from longer distances. It
uses a dynamic neural network to select features from gesture-containing
spatial regions of the input sensor data for further
processing. This helps the network focus on features important
for gesture recognition while discarding background features
early on, thus making it more compute efficient compared
to other techniques. We demonstrate the performance of our
method on the LD-ConGR long-distance dataset where it
outperforms previous state-of-the-art methods on recognition
accuracy and compute efficiency.


PAL: Pretext based Active Learning
Shubhang Bhatnagar, Sachin Goyal, Darshan Tank, Amit Sethi
British Machine Vision Conference (BMVC), 2021
abstract /
project page /
paper /
code
The goal of pool-based active learning is to judiciously select a fixed-sized subset of
unlabeled samples from a pool to query an oracle for their labels, in order to maximize
the accuracy of a supervised learner. However, the unsaid requirement that the oracle
should always assign correct labels is unreasonable for most situations. We propose an
active learning technique for deep neural networks that is more robust to mislabeling than
the previously proposed techniques. Previous techniques rely on the task network itself
to estimate the novelty of the unlabeled samples, but learning the task (generalization)
and selecting samples (out-of-distribution detection) can be conflicting goals. We use
a separate network to score the unlabeled samples for selection. The scoring network
relies on self-supervision for modeling the distribution of the labeled samples to reduce
the dependency on potentially noisy labels. To counter the paucity of data, we also deploy
another head on the scoring network for regularization via multi-task learning and use an
unusual self-balancing hybrid scoring function. Furthermore, we divide each query into
sub-queries before labeling to ensure that the query has diverse samples. In addition to
having a higher tolerance to mislabeling of samples by the oracle, the resultant technique
also produces competitive accuracy in the absence of label noise. The technique also
handles the introduction of new classes on-the-fly well by temporarily increasing the
sampling rate of these classes. We make our code publicly available at
https://github.com/shubhangb97/PAL_pretext_based_active_learning


Analyzing Cross Validation in Compressed Sensing with Mixed Gaussian and Impulse Measurement Noise with L1 Errors
Shubhang Bhatnagar*, Chinmay Gurjarpadhye*, Ajit Rajwade
European Signal Processing Conference (EUSIPCO),
abstract /
paper /
extended arxiv
Compressed sensing (CS) involves sampling
signals at rates less than their Nyquist rates and attempting to reconstruct them after sample acquisition.
Most such algorithms have parameters, for example the regularization parameter in LASSO, which need to be chosen carefully for optimal
performance. These parameters can be chosen based on assumptions on the noise level or signal sparsity, but this knowledge may often be
unavailable. In such cases, cross validation (CV) can be used to choose these parameters in a purely data-driven fashion. Previous work
analyzing the use of CV in CS has been based on the ℓ2 cross-validation error with Gaussian measurement noise. But it is well known that the ℓ2
error is not robust to impulse noise and provides a poor estimate of the recovery error, failing to choose the best parameter. Here we propose
using the ℓ1-CV error which provides substantial performance benefits given impulse measurement noise. Most importantly, we provide a detailed
theoretical analysis and error bounds for the use of ℓ1-CV error in CS reconstruction. We show that with high probability, choosing the parameter
that yields the minimum ℓ1-CV error is equivalent to choosing the minimum recovery error (which is not observable in practice). To our best
knowledge, this is the first paper which theoretically analyzes ℓ1-based CV in CS.
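
The parameter-selection procedure described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's code: it uses plain ISTA as the LASSO solver, holds out the last `n_cv` measurements for validation, and scores each candidate regularization parameter by the ℓ1 norm of the held-out residual. The function names and the hold-out scheme are hypothetical.

```python
import numpy as np

def ista_lasso(A, y, lam, n_iter=500):
    """Solve min_x 0.5*||A x - y||_2^2 + lam*||x||_1 with plain ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x

def select_lambda_l1_cv(A, y, lambdas, n_cv, n_iter=500):
    """Pick the lambda whose reconstruction has the smallest l1 error
    on held-out measurements (assumed scheme: hold out the last n_cv rows)."""
    A_rec, y_rec = A[:-n_cv], y[:-n_cv]   # measurements used for reconstruction
    A_cv, y_cv = A[-n_cv:], y[-n_cv:]     # measurements held out for validation
    errors = []
    for lam in lambdas:
        x_hat = ista_lasso(A_rec, y_rec, lam, n_iter)
        errors.append(np.abs(y_cv - A_cv @ x_hat).sum())   # l1 CV error
    best = int(np.argmin(errors))
    return lambdas[best], errors
```

Replacing the sum of absolute residuals with a sum of squares would give the ℓ2 variant that the abstract describes as less robust to impulse noise.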


Insights on coding gain and its properties for principal component filter banks
Prasad Chaphekar, Aniket Bhatia, Shubhang Bhatnagar, Abhiraj Kanse, Ashish V Vanmali, Vikram M Gadre
Sādhanā, Journal of the Indian Academy of Sciences, 2023
abstract /
paper
The Principal Component Filter Bank (PCFB) is considered optimal in terms of coding gain for specific conditions.
P P Vaidyanathan stated that coding gain does not necessarily always increase with an increase in the number of bands. However, very few attempts have been made
in the literature to go beyond the confines of the work done by P P Vaidyanathan. We present analytic proofs for the monotonicity of coding gain for specific shapes of PSDs.
This paper also derives properties of the coding gain of PCFBs, which bring new insights on the coding gain of Principal Component Filter Banks.


QR Code Denoising using Parallel Hopfield Networks
Shubhang Bhatnagar*, Ishan Bhatnagar*
arXiv, 2018
abstract /
arxiv
We propose a novel algorithm for using Hopfield
networks to denoise QR codes. Hopfield networks have mostly been
used as a noise tolerant memory or to solve difficult combinatorial
problems. One of the major drawbacks in their use in noise tolerant
associative memory is their low capacity of storage, scaling only
linearly with the number of nodes in the network. A larger capacity
therefore requires a larger number of nodes, thereby reducing the
speed of convergence of the network in addition to increasing
hardware costs for acquiring more precise data to be fed to a larger
number of nodes. Our paper proposes a new algorithm to allow the
use of several Hopfield networks in parallel thereby increasing the
cumulative storage capacity of the system many times as compared
to a single Hopfield network. Our algorithm would also be much
faster than a larger single Hopfield network with the same total
capacity. This enables their use in applications like denoising QR
codes, which we have demonstrated in our paper. We then test our
network on a large set of QR code images with different types of
noise and demonstrate that such a system of Hopfield networks can
be used to denoise and recognize QR codes in real time.
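
To make the idea of running several Hopfield networks in parallel concrete, here is a minimal sketch. It is not the algorithm from the paper: the Hebbian training rule, the synchronous update, and in particular the rule for combining the networks (keep the output closest in Hamming distance to the noisy input) are assumptions made for illustration, and all function names are hypothetical.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for a set of +/-1 patterns (rows); zero self-connections."""
    p = np.asarray(patterns, dtype=float)
    w = p.T @ p / p.shape[1]
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, state, n_sweeps=10):
    """Synchronous updates until the state stops changing or n_sweeps is reached."""
    s = np.asarray(state, dtype=float).copy()
    for _ in range(n_sweeps):
        new = np.where(w @ s >= 0, 1.0, -1.0)
        if np.array_equal(new, s):
            break
        s = new
    return s

def parallel_recall(weight_list, noisy):
    """Run every network on the same input and keep the output closest to it.

    The selection rule is an assumption, not taken from the paper.
    """
    noisy = np.asarray(noisy, dtype=float)
    outputs = [recall(w, noisy) for w in weight_list]
    dists = [int(np.sum(out != noisy)) for out in outputs]
    return outputs[int(np.argmin(dists))]
```

Each network stores only a subset of the patterns, so the total number of stored patterns grows with the number of networks while each individual network stays small.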


Memory Efficient Adaptive Attention For Multiple Domain Learning
Himanshu Pradeep Aswani, Abhiraj Sunil Kanse, Shubhang Bhatnagar, Amit Sethi
arXiv, 2021
abstract /
arxiv

