# RESEARCHER_PROFILE

name:        "Maheep Chaudhary"
role:        "Machine Learning Engineer"
affiliation: "Algoverse AI Research" [link](https://algoverseairesearch.org)
email:       maheepchaudhary.research@gmail.com
links:       [GitHub](https://github.com/MaheepChaudhary) [Twitter](https://twitter.com/MaheepChaudhary) [Scholar](https://scholar.google.com/citations?user=YOUR_ID)

## EDUCATION

- Aug 2023 — May 2024
  Nanyang Technological University, Singapore
  M.Sc. Artificial Intelligence, Computer Science

- Aug 2018 — May 2022
  Bundelkhand Institute of Engineering and Technology, India
  B.E. Electronics and Communication Engineering

## EXPERIENCE

- Jun 2025 — Present  Algoverse
  → Machine Learning Engineer

- Jan 2025 — May 2025  AI Safety Camp
  → Machine Learning Researcher

- Nov 2024 — May 2025  University of Oxford
  → Research Intern (Supervisor: Fazl Barez)

- Jul 2024 — Sep 2024  WhiteBox Research
  → Research Mentor

- Oct 2023 — Sep 2024  Pr(Ai)²R Group, Stanford
  → Alignment Research Engineer (Supervisor: Atticus Geiger)

- Sep 2021 — Jan 2024  UIUC
  → Research Collaboration (Supervisor: Haohan Wang)

- Oct 2019 — Mar 2021  IIT Indore
  → Research Collaboration (Supervisor: Chandresh Maurya)

## RESEARCH_TAXONOMY

primary_focus:
  [AI Safety] [Mechanistic Interpretability]

secondary:
  [Chain-of-Thought] [Causal Abstraction]
  [Deception Detection] [Attention Mechanisms]

methods:
  [White-box Analysis] [Activation Steering]
  [Probing] [Anomaly Detection] [Causal Intervention]

models:
  [Llama] [Gemma] [Qwen] [Mistral] [GPT-2]

## METRICS

journal_conference:  5
workshop_papers:     7
preprints:           6
as_project_lead:     8

## PUBLICATIONS

### Journal & Conference

1. "Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability"
   A. Geiger, D. Ibeling, A. Zur, Maheep Chaudhary, S. Chauhan, J. Huang, A. Arora, Z. Wu, N. Goodman, C. Potts, T. Icard
   [JMLR 2025] [paper](http://jmlr.org/papers/v26/23-0058.html)

2. "MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification"
   S. B. Shah, S. Shiwakoti, Maheep Chaudhary, H. Wang
   [EMNLP 2024] [paper](https://aclanthology.org/2024.emnlp-main.959/)

3. "An Intelligent Recommendation cum Reminder System"
   R. Saxena, Maheep Chaudhary, C.K. Maurya, S. Prasad
   [ACM CODS-COMAD 2022]

4. "CQFaRAD: Collaborative Query-Answering Framework for a Research Article Dataspace"
   M. Singh, S. Pandey, R. Saxena, Maheep Chaudhary, N. Lal
   [Springer IJIT 2021]

### Workshop Papers (NeurIPS 2025)

1. "Evaluation Awareness Scales Predictably in Open-Weights LLMs"
   Maheep Chaudhary†, I. Su, N. Hooda, N. Shankar, J. Tan, K. Zhu, A. Panda, R. Lagasse, V. Sharma
   [Responsible FM] LEAD

2. "FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness"
   A. Swaroop, A. Nallani, S. Uboweja, A. Uzdenova, M. Nguyen, K. Zhu, S. Dev, A. Panda, V. Sharma, Maheep Chaudhary†
   [FoRLM] LEAD

3. "Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis"
   A. More, A. Zhang, N. Bonilla, A. Vivekan, K. Zhu, P. Sharafoleslami, Maheep Chaudhary†
   [Responsible FM] LEAD

4. "Alignment-Constrained Dynamic Pruning for LLMs"
   D. Patel, G. Gervacio, D. Raimi, K. Zhu, R. Lagasse, G. Grand, A. Panda, Maheep Chaudhary†
   [Responsible FM] LEAD

5. "SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought"
   S. Batra, P. Tillman, S. Gaggar, S. Kesineni, S. Dev, K. Zhu, A. Panda, Maheep Chaudhary†
   [Responsible FM] LEAD

6. "Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization"
   N. Egbuna, S. Gaur, S. Dev, A. Panda, Maheep Chaudhary†
   [Efficient Reasoning] LEAD

7. "Hydra: A Modular Architecture for Efficient Long-Context Reasoning"
   S. Chaudhary, D. Patel, Maheep Chaudhary, B. Browning
   [Efficient Reasoning]

### Preprints

1. "SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors"
   Maheep Chaudhary, F. Barez
   LEAD [arXiv](https://arxiv.org/abs/2505.14300)

2. "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small"
   Maheep Chaudhary, A. Geiger
   LEAD [arXiv](https://arxiv.org/abs/2409.04478)

3. "PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases"
   S. V. Vuddanti, A. Shah, S. K. Chittiprolu, T. Song, S. Dev, K. Zhu, Maheep Chaudhary†
   LEAD

4. "Modular Training of Neural Networks aids Interpretability"
   S. Golechha, Maheep Chaudhary, J. Velja, A. Abate, N. Schoots
   [arXiv](https://arxiv.org/abs/2502.02470)

5. "Punctuation and Predicates in Language Models"
   S. Chauhan, Maheep Chaudhary, K. Choy, S. Nellessen, N. Schoots
   [arXiv](https://arxiv.org/abs/2504.05110)

6. "Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives"
   Maheep Chaudhary*, H. Liu*, H. Wang
   Co-first [arXiv](https://arxiv.org/abs/2307.16851)

## KEY_FINDINGS

[1] Evaluation awareness scales with model size
    → standard eval may fail for AGI-level systems

[2] Harmful content produces distinct attention signatures
    → ~95% detection accuracy

[3] Information flows left→right to last token
    → punctuation/predicates serve as storage anchors

[4] Obfuscation shifts info from non-linear to linear spaces
    → white-box monitoring can detect this

## SERVICE

reviewer:
  - ICML 2025 (Actionable Interpretability)
  - NeurIPS 2024 (MATH+AI, CALM)

mentoring:
  - UNESCO-India-Africa Program (20+ countries)
  - Research Mentor at Algoverse

## AWARDS

- Winner — Smart India Hackathon (200K+ participants)
  Face recognition for Bureau of Police R&D

- Team Leader — ASEAN-India Hackathon (10+ countries)
  Marine Species Detection

- Selected Mentor — UNESCO-India-Africa Program (20+ countries)
  Voice-assisted system for farmers

## CONTACT

email:   maheepchaudhary.research@gmail.com
github:  [GitHub](https://github.com/MaheepChaudhary)
twitter: [Twitter](https://twitter.com/MaheepChaudhary)
scholar: [Google Scholar](https://scholar.google.com/citations?user=YOUR_ID)

## STRUCTURED_DATA

{
  "@type": "Person",
  "name": "Maheep Chaudhary",
  "role": "Machine Learning Engineer",
  "affiliation": "Algoverse AI Research",
  "education": ["M.Sc. AI (NTU)", "B.E. ECE (BIET)"],
  "research_focus": ["AI Safety", "Mechanistic Interpretability"],
  "publications": {
    "journal_conference": 5,
    "workshop": 7,
    "preprints": 6,
    "as_lead": 8
  },
  "venues": ["JMLR", "EMNLP", "NeurIPS", "ACM", "Springer"]
}
