Maheep Chaudhary
# RESEARCHER_PROFILE

name:        "Maheep Chaudhary"
role:        "Machine Learning Engineer"
affiliation: "Algoverse AI Research" [link](https://algoverseairesearch.org)
email:       maheepchaudhary.research@gmail.com
links:       [GitHub](https://github.com/MaheepChaudhary) [Twitter](https://twitter.com/MaheepChaudhary) [Scholar](https://scholar.google.com/citations?user=YOUR_ID)
## EDUCATION

- Aug 2023 — May 2024
  Nanyang Technological University, Singapore
  M.Sc. Artificial Intelligence, Computer Science

- Aug 2018 — May 2022
  Bundelkhand Institute of Engineering and Technology, India
  B.E. Electronics and Communication Engineering
## EXPERIENCE

- Jun 2025 — Present  Algoverse
  → Machine Learning Engineer

- Jan 2025 — May 2025  AI Safety Camp
  → Machine Learning Researcher

- Nov 2024 — May 2025  University of Oxford
  → Research Intern (Supervisor: Fazl Barez)

- Jul 2024 — Sep 2024  WhiteBox Research
  → Research Mentor

- Oct 2023 — Sep 2024  Pr(Ai)²R Group, Stanford
  → Alignment Research Engineer (Supervisor: Atticus Geiger)

- Sep 2021 — Jan 2024  UIUC
  → Research Collaboration (Supervisor: Haohan Wang)

- Oct 2019 — Mar 2021  IIT Indore
  → Research Collaboration (Supervisor: Chandresh Maurya)
## RESEARCH_TAXONOMY

primary_focus:
  [AI Safety] [Mechanistic Interpretability]

secondary:
  [Chain-of-Thought] [Causal Abstraction]
  [Deception Detection] [Attention Mechanisms]

methods:
  [White-box Analysis] [Activation Steering]
  [Probing] [Anomaly Detection] [Causal Intervention]

models:
  [Llama] [Gemma] [Qwen] [Mistral] [GPT-2]
## METRICS

journal_conference:  5
workshop_papers:     7
preprints:           6
as_project_lead:     8
## PUBLICATIONS

### Journal & Conference

1. "Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability"
   A. Geiger, D. Ibeling, A. Zur, Maheep Chaudhary, S. Chauhan, J. Huang, A. Arora, Z. Wu, N. Goodman, C. Potts, T. Icard
   [JMLR 2024] [paper](http://jmlr.org/papers/v26/23-0058.html)

2. "MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification"
   S. B. Shah, S. Shiwakoti, Maheep Chaudhary, H. Wang
   [EMNLP 2024] [paper](https://aclanthology.org/2024.emnlp-main.959/)

3. "An Intelligent Recommendation cum Reminder System"
   R. Saxena, Maheep Chaudhary, C.K. Maurya, S. Prasad
   [ACM CODS-COMAD 2022]

4. "CQFaRAD: Collaborative Query-Answering Framework for a Research Article Dataspace"
   M. Singh, S. Pandey, R. Saxena, Maheep Chaudhary, N. Lal
   [Springer IJIT 2021]
### Workshop Papers (NeurIPS 2025)

1. "Evaluation Awareness Scales Predictably in Open-Weights LLMs"
   Maheep Chaudhary†, I. Su, N. Hooda, N. Shankar, J. Tan, K. Zhu, A. Panda, R. Lagasse, V. Sharma
   [Responsible FM] LEAD

2. "FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness"
   A. Swaroop, A. Nallani, S. Uboweja, A. Uzdenova, M. Nguyen, K. Zhu, S. Dev, A. Panda, V. Sharma, Maheep Chaudhary†
   [FoRLM] LEAD

3. "Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis"
   A. More, A. Zhang, N. Bonilla, A. Vivekan, K. Zhu, P. Sharafoleslami, Maheep Chaudhary†
   [Responsible FM] LEAD

4. "Alignment-Constrained Dynamic Pruning for LLMs"
   D. Patel, G. Gervacio, D. Raimi, K. Zhu, R. Lagasse, G. Grand, A. Panda, Maheep Chaudhary†
   [Responsible FM] LEAD

5. "SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought"
   S. Batra, P. Tillman, S. Gaggar, S. Kesineni, S. Dev, K. Zhu, A. Panda, Maheep Chaudhary†
   [Responsible FM] LEAD

6. "Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization"
   N. Egbuna, S. Gaur, S. Dev, A. Panda, Maheep Chaudhary†
   [Efficient Reasoning] LEAD

7. "Hydra: A Modular Architecture for Efficient Long-Context Reasoning"
   S. Chaudhary, D. Patel, Maheep Chaudhary, B. Browning
   [Efficient Reasoning]
### Preprints

1. "SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors"
   Maheep Chaudhary, F. Barez
   LEAD [arXiv](https://arxiv.org/abs/2505.14300)

2. "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small"
   Maheep Chaudhary, A. Geiger
   LEAD [arXiv](https://arxiv.org/abs/2409.04478)

3. "PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases"
   S. V. Vuddanti, A. Shah, S. K. Chittiprolu, T. Song, S. Dev, K. Zhu, Maheep Chaudhary†
   LEAD

4. "Modular Training of Neural Networks aids Interpretability"
   S. Golechha, Maheep Chaudhary, J. Velja, A. Abate, N. Schoots
   [arXiv](https://arxiv.org/abs/2502.02470)

5. "Punctuation and Predicates in Language Models"
   S. Chauhan, Maheep Chaudhary, K. Choy, S. Nellessen, N. Schoots
   [arXiv](https://arxiv.org/abs/2504.05110)

6. "Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives"
   Maheep Chaudhary*, H. Liu*, H. Wang
   Co-first [arXiv](https://arxiv.org/abs/2307.16851)
## KEY_FINDINGS

[1] Evaluation awareness scales with model size
    → standard evaluations may fail for AGI-level systems

[2] Harmful content produces distinct attention signatures
    → ~95% detection accuracy

[3] Information flows left→right to last token
    → punctuation/predicates serve as storage anchors

[4] Obfuscation shifts information from non-linear to linear spaces
    → white-box monitoring can detect this
## SERVICE

reviewer:
  - ICML 2025 (Actionable Interpretability)
  - NeurIPS 2024 (MATH+AI, CALM)

mentoring:
  - UNESCO-India-Africa Program (20+ countries)
  - Research Mentor at Algoverse
## AWARDS

- Winner — Smart India Hackathon (200K+ participants)
  Face recognition for Bureau of Police R&D

- Team Leader — ASEAN-India Hackathon (10+ countries)
  Marine Species Detection

- Selected Mentor — UNESCO-India-Africa Program (20+ countries)
  Voice-assisted system for farmers
## CONTACT

email:   maheepchaudhary.research@gmail.com
github:  [GitHub](https://github.com/MaheepChaudhary)
twitter: [Twitter](https://twitter.com/MaheepChaudhary)
scholar: [Google Scholar](https://scholar.google.com/citations?user=YOUR_ID)
## STRUCTURED_DATA

{
  "@type": "Person",
  "name": "Maheep Chaudhary",
  "role": "Machine Learning Engineer",
  "affiliation": "Algoverse AI Research",
  "education": ["M.Sc. AI (NTU)", "B.E. ECE (BIET)"],
  "research_focus": ["AI Safety", "Mechanistic Interpretability"],
  "publications": {
    "journal_conference": 5,
    "workshop": 7,
    "preprints": 6,
    "as_lead": 8
  },
  "venues": ["JMLR", "EMNLP", "NeurIPS", "ACM", "Springer"]
}