Publications

You can also find my articles on my Google Scholar profile.

Conference/Workshop Papers


LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation

Xi Ye, Fangcong Yin*, Yinghui He*, Joie Zhang*, Howard Yen*, Tianyu Gao, Greg Durrett, Danqi Chen

arXiv preprint, 2025

🤔 Now most LLMs have >= 128K context sizes, but are they good at generating long outputs, such as writing an 8K-token chain-of-thought for a planning problem? 🔔 Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize highly dispersed information and generate long, structured outputs.

Hi-ToM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

Yinghui He, Yufan Wu, Yilin Jia, Rada Mihalcea, Yulong Chen, and Naihao Deng

Findings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

“They don’t know that we know they know we know” 🤯 Does GPT-4 have Higher-Order Theory of Mind? Introducing 👋 Hi-ToM: a benchmark pushing LLMs to their limits in higher-order ToM (3rd order & beyond). LLMs’ performance declines drastically to near 0 📉 on 3rd and 4th order!