🔬 Current Research
Working with Fazl Barez at the University of Oxford on white-box monitoring of LLMs, and mentoring research projects at Algoverse.
Previously, I collaborated with Atticus Geiger and Nandi Schoots (University of Oxford) on mechanistic (causal) interpretability and with Haohan Wang at UIUC on trustworthy machine learning, and I completed my master's at NTU Singapore. Before diving deep into AI safety research, I won the Smart India Hackathon, a competition with 200K+ participants, and led international teams solving real-world AI problems.
In my life's causal DAG, my mentors are parent nodes of each success 🙏🏻.
💬 Let's Collaborate
✉️ Get in Touch
Open to collaborations in: model eval awareness, mechanistic interpretability, and AI safety.
🔬 Research
I research AI safety and interpretability to ensure advanced AI systems are aligned with human values. This involves mechanistic interpretability, deception detection, and causal methods for understanding neural networks.
📑 Literature Surveys
📚 Additional Publications
🏆 Background & Recognition
🥇 Smart India Hackathon Winner
The world's largest hackathon, with 200K+ participants. Developed facial recognition systems for criminal identification.
🌏 ASEAN-India Hackathon Leader
Led international teams across 10+ countries, focusing on marine species detection using AI.
Beyond research, I'm passionate about nurturing the next generation of AI researchers:
- Mentored 40+ students across multiple research programs and institutions.
- Selected as a mentor for the UNESCO-India-Africa Program spanning 20+ countries.
- Served as a reviewer for an ICML 2025 workshop and multiple NeurIPS 2024 workshops.
💡 My Research Philosophy
I believe powerful AI systems must be interpretable and aligned with human values. My work focuses on understanding the internal mechanisms of neural networks and developing practical methods to detect when they behave in unintended ways. From theoretical foundations to deployed safety systems, I work across the full spectrum of AI alignment research.