TextAtari: 100K Frames Game Playing with Language Agents

Wenhao Li, Wenwu Li, Chuyun Shen, Junjie Sheng, Zixiao Huang, Di Wu, Yun Hua, Wei Yin, Xiangfeng Wang, Hongyuan Zha, Bo Jin

cs.AI | Jun 4, 2025

One-line Summary

TextAtari is a benchmark for testing language agents on long-horizon decision-making tasks using textual descriptions of Atari games, revealing significant performance gaps compared to human players.

Plain-language Overview

TextAtari is a new benchmark that tests how well language-based AI agents can play classic Atari games when each game is described in text rather than shown as images. It covers nearly 100 different games and requires agents to make decisions over very long horizons, up to 100,000 steps. The study evaluated several large language models under different agent strategies in these text-based games. The agents performed far worse than human players, especially on tasks that require complex planning and keeping track of the game state over many moves.

Technical Details