NOD-TAMP: Generalizable Long-Horizon Planning with Neural Object Descriptors

Shuo Cheng1    Caelan Garrett*2    Ajay Mandlekar*2    Danfei Xu1   

1Georgia Institute of Technology       2NVIDIA Corporation


Real-world Results

NOD-TAMP solves long-horizon tasks using just 1 demonstration per skill, and generalizes to diverse object shapes and spatial configurations zero-shot.

With just 1 demonstration, NOD-TAMP solves fine-grained tasks (slot diameter < 1 cm).

NOD-TAMP solves long-horizon tasks that require fine-grained motions (e.g., inserting a coffee pod) with just 1 demonstration of the full task.

NOD-TAMP repurposes skills from other tasks to solve new problems (e.g., reusing the place-mug skill from the Make Coffee task).

NOD-TAMP performs geometric reasoning to select among skill variants (e.g., grasping the mug by its handle or its rim) to reach different goals.

NOD-TAMP reasons geometrically about grasping strategies (e.g., picking up a tool by its junction or its handle) to enable different tool uses.



Method Overview

[Figure: overview of the NOD-TAMP pipeline]
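NOD-TAMP transfers each demonstrated skill to new object instances by matching neural object descriptors rather than replaying raw poses. Below is a minimal sketch of that transfer step, assuming a pretrained descriptor field; toy_descriptor_field, transfer_pose, and the query-point setup are illustrative stand-ins (the toy field just uses point-to-point distances), not the authors' released code.

import torch

def axis_angle_to_matrix(v):
    """Rodrigues' formula: differentiable rotation matrix from an axis-angle vector."""
    theta = v.norm() + 1e-8
    k = v / theta
    zero = torch.zeros((), dtype=v.dtype)
    K = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    return torch.eye(3, dtype=v.dtype) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def toy_descriptor_field(queries, pcd):
    """Stand-in for a trained neural descriptor field: each query's descriptor is
    its vector of distances to the (fixed-size) object point cloud. A real field
    is a learned network whose outputs are consistent across object instances."""
    return torch.cdist(queries, pcd)

def transfer_pose(field, demo_pcd, test_pcd, demo_T, query_pts, iters=300, lr=5e-2):
    """Recover a gripper pose on a new object by optimizing a 6-DoF transform so
    that query points rigidly attached to the gripper reproduce the descriptors
    recorded at the demonstrated pose."""
    demo_world = query_pts @ demo_T[:3, :3].T + demo_T[:3, 3]
    with torch.no_grad():
        target = field(demo_world, demo_pcd)           # descriptors at the demo pose
    xyz = torch.zeros(3, requires_grad=True)           # translation to optimize
    rotvec = (1e-3 * torch.randn(3)).requires_grad_()  # rotation (axis-angle)
    opt = torch.optim.Adam([xyz, rotvec], lr=lr)
    for _ in range(iters):
        R = axis_angle_to_matrix(rotvec)
        world = query_pts @ R.T + xyz
        loss = (field(world, test_pcd) - target).abs().mean()  # L1 descriptor distance
        opt.zero_grad(); loss.backward(); opt.step()
    T = torch.eye(4)
    T[:3, :3] = axis_angle_to_matrix(rotvec).detach()
    T[:3, 3] = xyz.detach()
    return T

# Sanity check: the "test" object is the demo cloud shifted by +10 cm in x, so
# the recovered pose should shift by roughly the same amount.
demo_pcd = torch.rand(64, 3)
test_pcd = demo_pcd + torch.tensor([0.10, 0.0, 0.0])
query_pts = 0.05 * torch.rand(8, 3)
print(transfer_pose(toy_descriptor_field, demo_pcd, test_pcd, torch.eye(4), query_pts))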


Skill Reasoning Visualization

[Figure: visualization of skill reasoning]
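The planner performs this variant selection symbolically before executing anything: a grasp that occupies an object feature needed by a later skill (e.g., hanging the mug by its handle) is pruned. Below is a toy illustration of that discrete feasibility check; the skill dictionary, feature names, and constraint format are invented for illustration and are far simpler than the paper's actual representation.

from itertools import product

# Hypothetical skill library: each variant records the object feature it binds
# (the grasped part) or the feature it needs left free. Purely illustrative.
SKILLS = {
    "grasp_mug": [{"binds": "handle"}, {"binds": "rim"}],
    "hang_mug": [{"requires_free": "handle"}],
}

def feasible_plans(skeleton, skills=SKILLS):
    """Enumerate variant assignments for a fixed skill sequence, keeping those
    where no skill requires a feature that an earlier grasp still occupies."""
    for combo in product(*(skills[name] for name in skeleton)):
        held = None
        ok = True
        for variant in combo:
            held = variant.get("binds", held)
            req = variant.get("requires_free")
            if req is not None and req == held:
                ok = False   # e.g., cannot hang by the handle while gripping it
                break
        if ok:
            yield list(zip(skeleton, combo))

print(next(feasible_plans(["grasp_mug", "hang_mug"])))
# -> the rim grasp is chosen, since the handle must stay free for hanging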


Simulation Results

By extracting EIGHT skills from FOUR demonstrations that each manipulate just ONE mug, frame, and tool instance, NOD-TAMP solves hundreds of tasks with diverse object shapes, spatial configurations, and task goals.

[Figure: the extracted skill library]

Visualization of planning and task execution.


BibTeX

@misc{cheng2023nodtamp,
  title={NOD-TAMP: Generalizable Long-Horizon Planning with Neural Object Descriptors},
  author={Shuo Cheng and Caelan Garrett and Ajay Mandlekar and Danfei Xu},
  year={2023},
  eprint={2311.01530},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}