Amartya Banerjee

Email: amartya1@cs.unc.edu

I am currently a PhD student in the Department of Computer Science at the University of North Carolina at Chapel Hill. I am extremely fortunate to be advised by Prof. Harlin Lee and Prof. Caroline Moosmueller. I am also a member of Geometric Data Analysis @ UNC. Broadly, my research interests revolve around problems in Optimal Transport and Machine Learning.

Before starting my PhD, I graduated with a double major in Mathematics and Computer Science from the University of Maryland, College Park, followed by a Master's in Computer Science, also from UMD.

Fun Fact: I have an Erdős number of 3 (via a collaboration with Sebastian Cioabă).

Google Scholar | GitHub | LinkedIn

Activities
  • [Nov 2023]: Poster Presentation at TriCAMS, Duke University
  • [Nov 2023]: Poster Presentation at Data Science Week, Purdue University
  • [Sept 2023]: Poster Presentation at Data Science Day (AI and Health), UNC Chapel Hill
Research

2024

Surprisal Driven k-NN for Robust and Interpretable Nonparametric Learning
Amartya Banerjee, Christopher J. Hazard, Jacob Beel, Cade Mack, Jack Xia, Michael Resnick, Will Goddin.
arXiv, 2024
pdf | abstract | bibtex | Code

Nonparametric learning is a fundamental concept in machine learning that aims to capture complex patterns and relationships in data without making strong assumptions about the underlying data distribution. Owing to simplicity and familiarity, one of the most well-known algorithms under this paradigm is the k-nearest neighbors (k-NN) algorithm. Driven by the usage of machine learning in safety-critical applications, in this work, we shed new light on the traditional nearest neighbors algorithm from the perspective of information theory and propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model. We can determine data point weights as well as feature contributions by calculating the conditional entropy for adding a feature without the need for explicit model training. This allows us to compute feature contributions by providing detailed data point influence weights with perfect attribution and can be used to query counterfactuals. Instead of using a traditional distance measure which needs to be scaled and contextualized, we use a novel formulation of "surprisal" (the amount of information required to explain the difference between the observed and expected result). Finally, our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection, while also attaining competitive results for regression across a statistically significant number of datasets.
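
To make the surprisal idea concrete, here is a minimal sketch of a surprisal-weighted k-NN classifier. It is an illustration only, not the paper's implementation: the exponential distance model, the scale estimate, and the influence-weighting scheme are all assumptions made for this example.

    # Minimal sketch (assumed model, not the paper's method): distances are
    # interpreted under an exponential model, and relative surprisal replaces
    # the raw distance as the notion of closeness.
    import numpy as np

    def surprisal_knn_predict(X_train, y_train, x_query, k=5):
        # Distance from the query to every training point.
        dists = np.linalg.norm(X_train - x_query, axis=1)
        # Assumed exponential distance model with scale estimated from the
        # data, so no manual scaling/contextualizing of distances is needed.
        scale = dists.mean() + 1e-12
        # Relative surprisal: the nats of information needed to explain the
        # gap between this distance and the smallest (expected) one.
        surprisal = (dists - dists.min()) / scale
        idx = np.argsort(surprisal)[:k]          # k least-surprising neighbors
        weights = 1.0 / (1.0 + surprisal[idx])   # data point influence weights
        classes = np.unique(y_train)
        votes = {c: weights[y_train[idx] == c].sum() for c in classes}
        return max(votes, key=votes.get)         # surprisal-weighted vote

Because surprisal is a monotone function of distance in this toy model, the selected neighbors match ordinary k-NN; what changes is that each neighbor's influence is expressed in units of information rather than raw distance.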


    @misc{banerjee2024surprisal,
          title={Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning}, 
          author={Amartya Banerjee and Christopher J. Hazard and Jacob Beel and Cade Mack and Jack Xia and Michael Resnick and Will Goddin},
          year={2024},
          eprint={2311.10246},
          archivePrefix={arXiv},
          primaryClass={cs.LG}
    }
    
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion
Siyuan Shan, Yang Li, Amartya Banerjee, Junier B. Oliva.
AAAI, 2024
pdf | abstract | bibtex | Code

Voice conversion (VC) aims at altering a person's voice to make it sound similar to the voice of another person while preserving linguistic content. Existing methods suffer from a dilemma between content intelligibility and speaker similarity; i.e., methods with higher intelligibility usually have a lower speaker similarity, while methods with higher speaker similarity usually require plenty of target speaker voice data to achieve high intelligibility. In this work, we propose a novel method, "Phoneme Hallucinator", that achieves the best of both worlds. Phoneme Hallucinator is a one-shot VC model; it adopts a novel model to hallucinate diversified and high-fidelity target speaker phonemes based just on a short target speaker voice (e.g. 3 seconds). The hallucinated phonemes are then exploited to perform neighbor-based voice conversion. Our model is a text-free, any-to-any VC model that requires no text annotations and supports conversion to any unseen speaker. Quantitative and qualitative evaluations show that Phoneme Hallucinator outperforms existing VC methods for both intelligibility and speaker similarity.
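
The neighbor-based conversion step mentioned above can be sketched in a few lines. This is a generic illustration of k-NN regression over speech features, not the paper's model: the set-expansion network that hallucinates target phonemes is the actual contribution and is not reproduced here, and a frame-level feature extractor (e.g. a self-supervised speech encoder) is assumed.

    # Hypothetical sketch of neighbor-based conversion: replace each source
    # frame with the mean of its k most similar target-speaker frames
    # (real frames plus hallucinated ones), then vocode the result.
    import numpy as np

    def knn_convert(src_feats, tgt_feats, k=4):
        """src_feats: (T_src, D) source frames; tgt_feats: (T_tgt, D)
        target-speaker frames, including hallucinated ones."""
        src = src_feats / (np.linalg.norm(src_feats, axis=1, keepdims=True) + 1e-12)
        tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=1, keepdims=True) + 1e-12)
        sims = src @ tgt.T                       # cosine similarity matrix
        idx = np.argsort(-sims, axis=1)[:, :k]   # k nearest target frames
        return tgt_feats[idx].mean(axis=1)       # (T_src, D) converted frames

Intuitively, expanding tgt_feats with hallucinated frames is what lets a short (e.g. 3-second) reference supply suitable neighbors for every source phoneme.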


        @article{DBLP:journals/corr/abs-2308-06382,
          author       = {Siyuan Shan and
                          Yang Li and
                          Amartya Banerjee and
                          Junier B. Oliva},
          title        = {Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion},
          journal      = {CoRR},
          volume       = {abs/2308.06382},
          year         = {2023},
          url          = {https://doi.org/10.48550/arXiv.2308.06382},
          doi          = {10.48550/ARXIV.2308.06382},
          eprinttype    = {arXiv},
          eprint       = {2308.06382},
          timestamp    = {Wed, 23 Aug 2023 14:43:32 +0200},
          biburl       = {https://dblp.org/rec/journals/corr/abs-2308-06382.bib},
          bibsource    = {dblp computer science bibliography, https://dblp.org},
          keywords = {Voice Conversion, Set Expansion},
        }
    

2021

The Toughness of Kneser Graphs
Davin Park, Anthony Ostuni, Nathan Hayes, Amartya Banerjee, Tanay Wakhare, Wiseley Wong and Sebastian Cioabă.
Discrete Mathematics, 2021
pdf | abstract | bibtex | Code

The toughness t(G) of a graph G is a measure of its connectivity that is closely related to Hamiltonicity. Xiaofeng Gu, confirming a longstanding conjecture of Brouwer, recently proved the lower bound t(G)≥ℓ/λ−1 on the toughness of any connected ℓ-regular graph, where λ is the largest nontrivial absolute eigenvalue of the adjacency matrix. Brouwer had also observed that many families of graphs (in particular, those achieving equality in the Hoffman ratio bound for the independence number) have toughness exactly ℓ/λ. Cioabă and Wong confirmed Brouwer's observation for several families of graphs, including Kneser graphs K(n,2) and their complements, with the exception of the Petersen graph K(5,2). In this paper, we extend these results and determine the toughness of Kneser graphs K(n,k) when k∈{3,4} and n≥2k+1 as well as for k≥5 and sufficiently large n (in terms of k). In all these cases, the toughness is attained by the complement of a maximum independent set and we conjecture that this is the case for any k≥5 and n≥2k+1.
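
For readers outside spectral graph theory, the quantities in the abstract are standard; the following recap (standard definitions, not taken from the paper) states what toughness is and the bound in question, with c(G - S) denoting the number of connected components of G - S:

    % Toughness of a connected, non-complete graph G, and the eigenvalue
    % lower bound for connected l-regular G as stated in the abstract.
    \[
      t(G) \;=\; \min_{\substack{S \subseteq V(G) \\ c(G - S) \,\ge\, 2}} \frac{|S|}{c(G - S)},
      \qquad
      t(G) \;\ge\; \frac{\ell}{\lambda} - 1 .
    \]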


              @article{PARK2021112484,
                title = {The toughness of Kneser graphs},
                journal = {Discrete Mathematics},
                volume = {344},
                number = {9},
                pages = {112484},
                year = {2021},
                issn = {0012-365X},
                doi = {https://doi.org/10.1016/j.disc.2021.112484},
                url = {https://www.sciencedirect.com/science/article/pii/S0012365X21001977},
                author = {Davin Park and Anthony Ostuni and Nathan Hayes and
                         Amartya Banerjee and Tanay Wakhare and Wiseley Wong 
                         and Sebastian Cioabă},
                keywords = {Kneser graph, Graph connectivity, Toughness},
              }
Professional Experience
Selected Awards
  • John D. Gannon Endowed Scholarship (2018): the only international student to receive this merit-based scholarship that academic year.
  • Now-A-Terp Scholarship (2017)
  • GMU Merit Scholarship (2016)