
# RESEARCHER_PROFILE
name: "Maheep Chaudhary"
role: "Machine Learning Engineer"
affiliation: "Algoverse AI Research" [link](https://algoverseairesearch.org)
email: maheepchaudhary.research@gmail.com
links: [GitHub](https://github.com/MaheepChaudhary) [Twitter](https://twitter.com/MaheepChaudhary) [Scholar](https://scholar.google.com/citations?user=YOUR_ID)
## EDUCATION
- Aug 2023 — May 2024 Nanyang Technological University, Singapore: M.Sc. Artificial Intelligence, Computer Science
- Aug 2018 — May 2022 Bundelkhand Institute of Engineering and Technology, India: B.E. Electronics and Communication Engineering
## EXPERIENCE
- Jun 2025 — Present Algoverse → Machine Learning Engineer
- Jan 2025 — May 2025 AI Safety Camp → Machine Learning Researcher
- Nov 2024 — May 2025 University of Oxford → Research Intern (Supervisor: Fazl Barez)
- Jul 2024 — Sep 2024 WhiteBox Research → Research Mentor
- Oct 2023 — Sep 2024 Pr(Ai)²R Group, Stanford → Alignment Research Engineer (Supervisor: Atticus Geiger)
- Sep 2021 — Jan 2024 UIUC → Research Collaboration (Supervisor: Haohan Wang)
- Oct 2019 — Mar 2021 IIT Indore → Research Collaboration (Supervisor: Chandresh Maurya)
## RESEARCH_TAXONOMY
primary_focus: [AI Safety] [Mechanistic Interpretability]
secondary: [Chain-of-Thought] [Causal Abstraction] [Deception Detection] [Attention Mechanisms]
methods: [White-box Analysis] [Activation Steering] [Probing] [Anomaly Detection] [Causal Intervention]
models: [Llama] [Gemma] [Qwen] [Mistral] [GPT-2]
## METRICS
journal_conference: 5
workshop_papers: 7
preprints: 6
as_project_lead: 8
## PUBLICATIONS

### Journal & Conference
1. "Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability" A. Geiger, D. Ibeling, A. Zur, Maheep Chaudhary, et al. [JMLR 2024] [paper](http://jmlr.org/papers/v26/23-0058.html)
2. "MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification" S. B. Shah, S. Shiwakoti, Maheep Chaudhary, H. Wang [EMNLP 2024] [paper](https://aclanthology.org/2024.emnlp-main.959/)
3. "An Intelligent Recommendation cum Reminder System" R. Saxena, Maheep Chaudhary, C.K. Maurya, S. Prasad [ACM CODS-COMAD 2022]
4. "CQFaRAD: Collaborative Query-Answering Framework" M. Singh, S. Pandey, R. Saxena, Maheep Chaudhary, N. Lal [Springer IJIT 2021]
### Workshop Papers (NeurIPS 2025)
1. "Evaluation Awareness Scales Predictably in Open-Weights LLMs" Maheep Chaudhary†, I. Su, N. Hooda, et al. [Responsible FM] LEAD
2. "FRIT: Using Causal Importance to Improve CoT Faithfulness" A. Swaroop, et al., Maheep Chaudhary† [FoRLM] LEAD
3. "Optimizing CoT Confidence via Topological and Dirichlet Risk Analysis" A. More, et al., Maheep Chaudhary† [Responsible FM] LEAD
4. "Alignment-Constrained Dynamic Pruning for LLMs" D. Patel, et al., Maheep Chaudhary† [Responsible FM] LEAD
5. "SALT: Steering Activations towards Leakage-free Thinking" S. Batra, et al., Maheep Chaudhary† [Responsible FM] LEAD
6. "Amortized Latent Steering: Low-Cost Test-Time Optimization" N. Egbuna, et al., Maheep Chaudhary† [Efficient Reasoning] LEAD
7. "Hydra: A Modular Architecture for Efficient Long-Context Reasoning" S. Chaudhary, D. Patel, Maheep Chaudhary, B. Browning [Efficient Reasoning]
### Preprints
1. "SafetyNet: Detecting Harmful Outputs in LLMs" Maheep Chaudhary, F. Barez LEAD [arXiv](https://arxiv.org/abs/2505.14300)
2. "Evaluating Sparse Autoencoders on Disentangling Factual Knowledge" Maheep Chaudhary, A. Geiger LEAD [arXiv](https://arxiv.org/abs/2409.04478)
3. "PALADIN: Self-Correcting LM Agents to Cure Tool-Failure Cases" S. V. Vuddanti, et al., Maheep Chaudhary† LEAD
4. "Modular Training of Neural Networks aids Interpretability" S. Golechha, Maheep Chaudhary, et al. [arXiv](https://arxiv.org/abs/2502.02470)
5. "Punctuation and Predicates in Language Models" S. Chauhan, Maheep Chaudhary, et al. [arXiv](https://arxiv.org/abs/2504.05110)
6. "Towards Trustworthy ML: A Data-centric Survey with Causality Perspectives" Maheep Chaudhary*, H. Liu*, H. Wang Co-first [arXiv](https://arxiv.org/abs/2307.16851)
## KEY_FINDINGS
[1] Evaluation awareness scales with model size
→ standard eval may fail for AGI-level systems
[2] Harmful content produces distinct attention signatures
→ ~95% detection accuracy
[3] Information flows left→right to last token
→ punctuation/predicates serve as storage anchors
[4] Obfuscation shifts info from non-linear to linear spaces
→ white-box monitoring can detect this
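Findings [2] and [4] rely on white-box monitoring: reading a model's internal activations and checking whether a simple (often linear) classifier can separate harmful from benign content. As a minimal sketch of that idea, the toy probe below fits a logistic-regression classifier on synthetic activation vectors. The data, dimensions, and `train_linear_probe` helper are all hypothetical illustrations, not the actual experimental setup from the papers above.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, epochs=200):
    """Fit a logistic-regression probe on activation vectors X (n, d)
    with binary labels y via plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on weights
        b -= lr * np.mean(p - y)                 # gradient step on bias
    return w, b

def probe_accuracy(w, b, X, y):
    """Fraction of examples the probe classifies correctly."""
    return np.mean(((X @ w + b) > 0) == y)

# Toy demo: two Gaussian clusters standing in for "benign" vs "harmful"
# hidden-state activations (synthetic data, for illustration only).
rng = np.random.default_rng(42)
d = 16
benign = rng.normal(0.0, 1.0, size=(200, d))
harmful = rng.normal(1.5, 1.0, size=(200, d))
X = np.vstack([benign, harmful])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = train_linear_probe(X, y)
acc = probe_accuracy(w, b, X, y)
```

If the two classes are linearly separable in activation space, a probe like this reaches high accuracy; finding [4] suggests obfuscation can push information into such linearly decodable subspaces, which is exactly where this kind of monitor works.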
## SERVICE
reviewer:
- ICML 2025 (Actionable Interpretability)
- NeurIPS 2024 (MATH+AI, CALM)
mentoring:
- UNESCO-India-Africa Program (20+ countries)
- Research Mentor at Algoverse
## AWARDS
- Winner — Smart India Hackathon (200K+ participants): Face recognition for Bureau of Police R&D
- Team Leader — ASEAN-India Hackathon (10+ countries): Marine Species Detection
- Selected Mentor — UNESCO-India-Africa Program (20+ countries): Voice-assisted system for farmers
## CONTACT
email: maheepchaudhary.research@gmail.com
github: [GitHub](https://github.com/MaheepChaudhary)
twitter: [Twitter](https://twitter.com/MaheepChaudhary)
scholar: [Google Scholar](https://scholar.google.com/citations?user=YOUR_ID)
## STRUCTURED_DATA
{
  "@type": "Person",
  "name": "Maheep Chaudhary",
  "role": "Machine Learning Engineer",
  "affiliation": "Algoverse AI Research",
  "education": ["M.Sc. AI (NTU)", "B.E. ECE (BIET)"],
  "research_focus": ["AI Safety", "Mechanistic Interpretability"],
  "publications": {
    "journal_conference": 5,
    "workshop": 7,
    "preprints": 6,
    "as_lead": 8
  },
  "venues": ["JMLR", "EMNLP", "NeurIPS", "ACM", "Springer"]
}
