[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
[NeurIPS 2025] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
[CVPR 2025] The official implementation of "Universal Actions for Enhanced Embodied Foundation Models"
✨ [NeurIPS 2025] Official implementation of BridgeVLA
🔥 A curated list of research accompanying "A Survey on Efficient Vision-Language-Action Models". We will continue to maintain and update the repository, so follow us to keep up with the latest developments!
Release of the code, datasets, and model for our work TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials
Official repo for AGNOSTOS, a cross-task manipulation benchmark, and X-ICM, a cross-task in-context manipulation (VLA) method
Official repo for From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models
🌐 A curated collection of vision-language-action (VLA) models for autonomous driving applications
Track 2: Social Navigation
[NeurIPS 2025] AGI-Elo: How Far Are We From Mastering A Task?
PickAgent: OpenVLA-powered Pick-and-Place Agent | Gradio & Simulation | Vision-Language-Action Model
VLAGen: Automated Data Collection for Generalizing Robotic Policies