Jiaxin Liu, Peiyi Tu, Wenyu Chen, Yihong Zhuang, Xinxia Ling, Anji Zhou, Chenxi Wang, Zhuo Han, Zhengkai Yang, Junbo Zhao, Zenan Huang, Yuanyuan Wang
HeartBench is a new framework for evaluating the anthropomorphic intelligence of Chinese LLMs, revealing significant limitations in their ability to handle complex social, emotional, and ethical nuances.
Large Language Models (LLMs) have made great strides in understanding and reasoning, yet they still struggle with complex social, emotional, and ethical situations. This gap is especially pronounced in the Chinese context, where suitable evaluation tools and data are scarce. HeartBench is a new framework for assessing these abilities in Chinese LLMs, built from real psychological counseling scenarios in collaboration with domain experts. Evaluation shows that even advanced models reach only about 60% of ideal performance, struggling in particular with subtle emotional and ethical challenges.