Hi-ToM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models
Yinghui He, Yufan Wu, Yilin Jia, Rada Mihalcea, Yulong Chen, and Naihao Deng
Findings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
“They don’t know that we know they know we know” 🤯 — Does GPT-4 have Higher-Order Theory of Mind? Introducing 👋 Hi-ToM: a benchmark pushing LLMs to their limits in higher-order ToM (3rd order and beyond). LLM performance declines drastically to near 0 📉 on 3rd- and 4th-order questions!