Differentiable GPU-Parallelized Task and Motion Planning
William Shen1,2, Caelan Garrett2, Nishanth Kumar1,2, Ankit Goyal2, Tucker Hermans2,3,
Leslie Pack Kaelbling1, Tomás Lozano-Pérez1, Fabio Ramos2,4
1MIT CSAIL, 2NVIDIA Research, 3University of Utah, 4University of Sydney
Robotics: Science and Systems (RSS), 2025
Prerequisites:
- cuTAMP depends on cuRobo, which has specific hardware requirements
- GPU: NVIDIA GPU with Volta or newer architecture
- Python: 3.10+ (we've only tested with Python 3.10)
- PyTorch: 2.0+ is recommended
# We use conda, but feel free to use your favorite Python environment manager
conda create --name cutamp python=3.10 -y
conda activate cutamp
You must install a PyTorch version that is built for a CUDA version less than or equal to your CUDA Toolkit version. This ensures that cuRobo compiles in the next step. To check your CUDA Toolkit version, run:
# Look for something like "release 12.6"
nvcc --version
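If you just want the release number, it can be parsed out of that output. A minimal sketch, using a sample line here so it runs anywhere; in practice, pipe the output of nvcc --version instead:

```shell
# Sketch: extract the toolkit release number from nvcc's version output.
# "sample" stands in for the real output of `nvcc --version`.
sample='Cuda compilation tools, release 12.6, V12.6.20'
ver=$(printf '%s\n' "$sample" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')
echo "$ver"
```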
If you don't have the CUDA Toolkit installed, install it via the official NVIDIA installer: https://developer.nvidia.com/cuda-downloads
✅ Pick a version that's supported by your GPU drivers.
⚠️ Don't overwrite your drivers unless you're sure, as it could cause issues with your system.
Then, install PyTorch using the command provided on the PyTorch website. You can also install an older version of PyTorch if it better matches your CUDA version:
# Example for latest PyTorch with CUDA 12.6
pip install torch torchvision torchaudio
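To double-check the pairing after installing, you can compare the two version strings with sort -V. This is a sketch with placeholder versions; substitute the values you get from python -c "import torch; print(torch.version.cuda)" and from nvcc --version:

```shell
# Sketch: check that PyTorch's CUDA build is not newer than your toolkit.
# These two versions are hypothetical placeholders; fill in your own.
torch_cuda="12.4"
toolkit="12.6"
# sort -V orders version strings numerically; the last line is the highest.
highest=$(printf '%s\n%s\n' "$torch_cuda" "$toolkit" | sort -V | tail -n1)
if [ "$highest" = "$toolkit" ]; then
  echo "OK: PyTorch CUDA $torch_cuda <= toolkit $toolkit"
else
  echo "WARNING: PyTorch CUDA $torch_cuda is newer than toolkit $toolkit"
fi
```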
git clone https://github.com/NVlabs/cuTAMP.git
cd cuTAMP
pip install -e .
Before cloning cuRobo, make sure git-lfs is installed (it is used for pulling large assets):
sudo apt install git-lfs
git lfs install
Then clone and install cuRobo:
git clone https://github.com/NVlabs/curobo.git
cd curobo
# This can take up to 20 minutes to install
pip install -e . --no-build-isolation
# Optional: Verify that all unit tests pass
pip install pytest
python -m pytest .
cd ..
For full cuRobo installation instructions, see: https://curobo.org/get_started/1_install_instructions.html
Once installed, you can run the default demo using:
cutamp-demo
This runs the cutamp/scripts/run_cutamp.py script with the default parameters on the Tetris environment with 3 blocks. We use Rerun to visualize the optimization and plan.
- If you're on a machine with a display, you're now good to go!
- If you're on a remote or headless machine (i.e., no display), see the instructions below on how to run with or without the visualizer.
Toggle between different timelines in the Rerun visualizer to see different aspects of the optimization and planning. For a general guide on how to use Rerun, see the official Rerun documentation.
If you're running cuTAMP on a remote server without a display, you have two options:
- Forward the Rerun visualizer port to your local machine via SSH (see below), or
- Disable the visualizer using the --disable_visualizer flag:
cutamp-demo --disable_visualizer
Reverse Tunnel Setup
- On your local machine (e.g., laptop), install Rerun and start the viewer:
pip install rerun-sdk
rerun
Note the TCP port shown in the top right of the viewer (usually 9876).
💡 Try to match the rerun-sdk version on your local and remote machines to avoid compatibility issues. You can check the version on the remote machine with:
pip show rerun-sdk
- Create a reverse SSH tunnel from your local machine to the remote server:
ssh -R 9876:127.0.0.1:9876 username@server
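To unpack the -R syntax: ssh -R <remote_port>:127.0.0.1:<local_port> makes the remote server listen on <remote_port> and forward those connections to your local machine, so cuTAMP on the server reaches the viewer on your laptop. A sketch that assembles the command from the port you noted in the viewer (username and server are hypothetical placeholders):

```shell
# Sketch: build the reverse-tunnel command from the viewer's TCP port.
# -R <remote_port>:127.0.0.1:<local_port> forwards the server's port
# back to the Rerun viewer running on your local machine.
port=9876
user="username"   # placeholder: your SSH user
host="server"     # placeholder: your remote host
cmd="ssh -R ${port}:127.0.0.1:${port} ${user}@${host}"
echo "$cmd"
```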
- On the remote machine, run the demo:
cutamp-demo
The visualizer will connect and stream to your local Rerun viewer through the tunnel!
The cutamp-demo command runs the cutamp/scripts/run_cutamp.py script with a number of useful options.
⚠️ This script exposes only a subset of the functionality of cuTAMP. For more advanced usage, please refer to the source code directly.
To view the available options, run:
cutamp-demo -h
The Tetris domain has 1, 2, 3, and 5 block variants named tetris_{1,2,3,5}. The tetris_5 variant is the most challenging and benefits from cost tuning and an increased number of particles.
# Tetris packing with 3 blocks and motion planning after cuTAMP solve
# All plan skeletons are downward refinable, so 1 initial plan is sufficient
cutamp-demo --env tetris_3 --num_initial_plans 1 --motion_plan
# Tetris packing with 5 blocks and more particles and optimization steps
cutamp-demo --env tetris_5 --num_particles 2048 --num_opt_steps 2000 \
--num_initial_plans 1 --motion_plan
# Tetris packing with 5 blocks and tuned cost weights
# You can try the --tuned_tetris_weights flag on other problems too (it works)!
cutamp-demo --env tetris_5 --num_particles 2048 --num_opt_steps 2000 \
--num_initial_plans 1 --motion_plan --tuned_tetris_weights
# Minimize the distance between the objects for 10 seconds
cutamp-demo --env blocks --optimize_soft_cost --soft_cost min_obj_dist --max_duration 10
# Maximize the distance between the objects for 10 seconds
cutamp-demo --env blocks --optimize_soft_cost --soft_cost max_obj_dist --max_duration 10
In the Stick Button domain, enabling subgraph caching speeds up particle initialization across plan skeletons.
# Stick button domain with Franka Panda
cutamp-demo --env stick_button --robot panda --num_initial_plans 100 --cache_subgraphs
# Stick button domain with UR5. The UR5 doesn't need to use the stick.
# Cross-embodiment generalization!
cutamp-demo --env stick_button --robot ur5 --num_initial_plans 100 --cache_subgraphs
Useful visualization flags:
- --disable_visualizer: disable the Rerun visualizer. This is useful for benchmarking or headless runs.
- --viz_interval: control how often the visualizer updates (the default is 10). Increase it to reduce visualization overhead and network bandwidth usage.
- --disable_robot_mesh: skip robot mesh rendering (saves bandwidth and load time when visualizing remotely).
If you encounter any issues not covered below, please open an issue. Make sure to describe your setup and detail the problem you're facing.
torch.OutOfMemoryError: CUDA out of memory...
Try reducing the number of particles (the default is 1024):
cutamp-demo --num_particles 256
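If 256 still runs out of memory, or you want the largest count that fits, you can step down by powers of two. This sketch only prints a fallback schedule to try with --num_particles; the floor of 128 is an arbitrary choice:

```shell
# Sketch: print a halving schedule of particle counts to try, from the
# default of 1024 down to an arbitrary floor of 128.
n=1024
schedule=""
while [ "$n" -ge 128 ]; do
  schedule="$schedule $n"
  n=$((n / 2))
done
echo "try in order:$schedule"
```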
On some systems, especially older Linux distros, the rerun-sdk wheel may not be available on PyPI.
Solution: try installing rerun-sdk from the conda-forge channel. See the instructions here: https://rerun.io/docs/getting-started/installing-viewer#python
If you see an error like:
A module that was compiled using NumPy 1.x cannot be run in NumPy 2.2.6...
This is a known compatibility issue between NumPy 2.x and extensions compiled against the NumPy 1.x ABI.
Fix: Downgrade numpy to a 1.x version:
pip install "numpy<2"
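A quick way to check whether you are affected before reinstalling. This sketch uses a sample version string; in practice read it from python -c "import numpy; print(numpy.__version__)":

```shell
# Sketch: detect a NumPy 2.x install from its version string.
# "ver" is a sample value standing in for your actual installed version.
ver="2.2.6"
major=${ver%%.*}   # take everything before the first dot
if [ "$major" -ge 2 ]; then
  echo "NumPy $ver detected: run pip install 'numpy<2'"
fi
```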
If you see an error about a missing GLIBCXX version, you can install a newer libstdc++ via conda:
conda install -c conda-forge libstdcxx-ng -y
If you've created a new placement surface, make sure you set the tolerance for that surface name appropriately in cutamp/scripts/utils.py.
Additionally, check the logs and analyze which constraints have been violated. Try loosening the threshold for those constraints to debug.
We thank Balakumar Sundaralingam for his extensive support with using and debugging cuRobo.
If you use cuTAMP in your research, please consider citing our paper:
@inproceedings{shen2025cutamp,
title={Differentiable GPU-Parallelized Task and Motion Planning},
author={Shen, William and Garrett, Caelan and Kumar, Nishanth and Goyal, Ankit and Hermans, Tucker and Kaelbling, Leslie Pack and Lozano-P{\'e}rez, Tom{\'a}s and Ramos, Fabio},
booktitle={Robotics: Science and Systems},
year={2025}
}