CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation

Source: arXiv

Hao Li, Shuai Yang, Yilun Chen, Yang Tian, Xiaoda Yang, Xinyi Chen, Hanqing Wang, Tai Wang, Feng Zhao, Dahua Lin, Jiangmiao Pang

cs.CV | Jun 24, 2025

One-line Summary

CronusVLA extends vision-language-action models from single-frame to multi-frame prediction by efficiently aggregating latent motion features from past observations, achieving state-of-the-art performance on manipulation tasks.

Plain-language Overview

CronusVLA is a new approach that improves how robots perceive and act in their environment by using multiple frames of video rather than just one. Models that combine vision, language, and action have typically relied on a single frame at a time, because reprocessing every past frame at each step is computationally expensive. CronusVLA addresses this by efficiently extracting and reusing motion information from earlier frames, which leads to better performance on tasks such as object manipulation. The approach improves success rates in simulated benchmarks and also shows strong results in real-world experiments.
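The general idea described above — encoding each frame once and reusing its cached latent feature instead of reprocessing the whole history at every step — can be illustrated with a small sketch. The code below is a hypothetical simplification, not the authors' architecture: the `MultiFrameAggregator` module, the feature dimensions, and the attention-based fusion are all illustrative assumptions.

```python
# Illustrative sketch only: NOT the CronusVLA implementation, just a
# minimal example of caching per-frame latent features and aggregating
# them across time to condition action prediction.
import torch
import torch.nn as nn
from collections import deque


class MultiFrameAggregator(nn.Module):
    """Hypothetical module: keeps a bounded cache of latent features
    from past frames and fuses them with the current frame's feature
    via attention, so old frames are never re-encoded."""

    def __init__(self, feat_dim=256, action_dim=7, max_frames=8):
        super().__init__()
        self.cache = deque(maxlen=max_frames)  # cached past-frame features
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.action_head = nn.Linear(feat_dim, action_dim)

    def forward(self, current_feat):
        # current_feat: (batch, feat_dim) latent feature of the newest frame.
        # Detached before caching so stored history is reused, not retrained
        # through (a simplification appropriate for this sketch).
        self.cache.append(current_feat.detach())
        history = torch.stack(list(self.cache), dim=1)  # (batch, T, feat_dim)
        query = current_feat.unsqueeze(1)               # (batch, 1, feat_dim)
        fused, _ = self.attn(query, history, history)   # attend over history
        return self.action_head(fused.squeeze(1))       # predicted action


if __name__ == "__main__":
    agg = MultiFrameAggregator()
    for _ in range(3):  # simulate a short stream of observations
        frame_feat = torch.randn(1, 256)  # stand-in for a frame encoding
        action = agg(frame_feat)
    print(action.shape)  # torch.Size([1, 7])
```

The efficiency point the sketch captures is that each frame is encoded exactly once and its feature is thereafter read from the cache, so the per-step cost stays roughly constant as the history grows (up to the cache bound), rather than scaling with the number of past frames.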

Technical Details