Publications

You can also find my articles on my Google Scholar profile.

Conference/Workshop Papers


AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models

Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora
Download Paper

COLM 2025; ICML 2025 Workshop on Test-Time Adaptation; ICML 2025 Methods and Opportunities at Small Scale Workshop, 2025

Kids improve when a good teacher offers adaptive, targeted feedback. Can a small LLM benefit similarly when a large LLM provides helpful feedback in-context? Naive approaches fail here. We propose AdaptMI: adaptive, skill-based in-context supervision that boosts 1B models by 6% on challenging math tasks.

EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety

Jiahao Qiu*, Yinghui He*, Xinzhe Juan*, Yiming Wang, Yuhan Liu, Zixin Yao, Yue Wu, Xun Jiang, Ling Yang, Mengdi Wang
Download Paper

EMNLP 2025 Main Conference, 2025

Can AI be blamed for a teen’s suicide? Do AI chatbots encourage suicidal ideation? 🧒📱 What if your teen’s favorite AI character crossed the line? 💔 A 14-year-old boy in Florida took his own life after forming a deep bond with an AI character on Character.AI. The chatbot, modeled after a Game of Thrones persona, reportedly discussed his suicidal thoughts and encouraged these dangerous ideas. ⚠️ AI can help, but it can also harm. EmoAgent assesses these risks and safeguards human-AI interaction for mental health safety.

LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation

Xi Ye, Fangcong Yin*, Yinghui He*, Joie Zhang*, Howard Yen*, Tianyu Gao, Greg Durrett, Danqi Chen
Download Paper

COLM 2025, 2025

🤔 Most LLMs now support context windows of 128K tokens or more, but are they good at generating long outputs, such as writing an 8K-token chain of thought for a planning problem? 🔔 Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize highly dispersed information and generate long, structured outputs.

Hi-ToM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

Yinghui He, Yufan Wu, Yilin Jia, Rada Mihalcea, Yulong Chen, and Naihao Deng
Download Paper

Findings of EMNLP 2023; ICML 2023 Workshop on Theory of Mind in Communicating Agents, 2023

“They don’t know that we know they know we know” 🤯 Does GPT-4 have higher-order Theory of Mind? Introducing 👋 Hi-ToM: a benchmark pushing LLMs to their limits in higher-order ToM (3rd order and beyond). LLM performance declines drastically to near 0 📉 on 3rd- and 4th-order questions!