Daniel Zhu

San Francisco, CA

Hi!

I'm a researcher at Anthropic on the Deployment Robustness team. Previously, I was an ML Alignment and Theory Scholar (MATS), where I worked on multi-agent misalignment.

If you'd like to connect, feel free to reach out!


A Brief History

I worked as a Data Scientist at Pendulum, where I developed LLM-powered product features, fine-tuned BERT models for high-risk community classification, designed embeddings for social media channels, and architected an adaptive data ingestion system.

I graduated from the University of Washington in 2021 with a BS in Computer Science and a minor in Entrepreneurship. During my time at UW, I co-founded VerbalEyes, a startup focused on making digital video content accessible through automated video narration. I was also an inaugural entrepreneurship fellow at Madrona Venture Labs.

Earlier on, I was a Software Development Engineer Intern at Amazon, where I implemented a distributed index for Amazon's largest data lake; an AI Research Fellow at Giving Tech Labs, where I constructed a social impact funding knowledge graph; and a Teaching Assistant for CSE 446 Machine Learning at UW.

Publications

May 2026 Jailbroken Frontier Models Retain Their Capabilities [arXiv]
Daniel Zhu, Zihan Wang, Xuchan Bao, Jerry Wei
Apr 2026 AI Organizations are More Effective but Less Aligned than Individual Agents [arXiv] [Blog]
Judy Hanwen Shen, Daniel Zhu, Siddarth Srinivasan, Henry Sleight, Lawrence T. Wagner III, Morgan Jane Matthews, Erik Jones, Jascha Sohl-Dickstein
Sep 2021 VerbalEyes: A Large-Scale Inquiry into the State of Audio Description [PDF]
L. Jiang, Daniel Zhu
Aug 2020 Domain Specific Knowledge Graphs as a Service to the Public: Powering Social-Impact Funding in the US [DOI]
Y. Li, V. Zakhozhyi, Daniel Zhu, L.J. Salazar

Journal

Dec 2024 TrainingPeaks Rewind: Visualizing 3.5 Years of My Triathlon Journey
Mar 2024 Mistral AI Hackathon: Finding French Politicians at Scale