PA3FF — Part-Aware 3D Feature Field ICLR 2026

Learning Part-Aware Dense 3D Feature Field for Generalizable Articulated Object Manipulation

Yue Chen1*, Muqing Jiang1*, Kaifeng Zheng2*, Jiaqi Liang1, Chenrui Tie3, Haoran Lu1, Ruihai Wu1†, Hao Dong1†
1 Peking University  ·  2 Beijing Institute of Technology  ·  3 National University of Singapore
* Equal contribution  ·  † Corresponding author
Accepted to ICLR 2026
§ 01 Abstract

Teaching robots to see parts, not just shapes.

Articulated object manipulation is essential for real-world robotic tasks, yet generalizing across diverse objects remains challenging. The key lies in understanding functional parts (e.g., handles, knobs) that indicate where and how to manipulate across diverse categories and shapes.

Previous approaches that lift 2D foundation features into 3D face critical limitations: long runtimes, inconsistency across views, and low spatial resolution with insufficient geometric detail.

We propose Part-Aware 3D Feature Field (PA3FF), a novel dense 3D representation with part awareness for generalizable manipulation. PA3FF is trained via contrastive learning on 3D part proposals from large-scale datasets. Given point clouds as input, it predicts continuous 3D feature fields in a feedforward manner, where feature proximity reflects functional part relationships.

Building on PA3FF, we introduce Part-Aware Diffusion Policy (PADP) for enhanced sample efficiency and generalization. PADP significantly outperforms existing 2D and 3D representations (CLIP, DINOv2, Grounded-SAM), achieving state-of-the-art performance on both simulated and real-world tasks.

§ 02 Overview

A feedforward pipeline from point clouds to manipulation policy.

PA3FF Overview
Figure 01
PA3FF Framework. We propose a feedforward model that predicts part-aware 3D feature fields, enabling generalizable manipulation across unseen objects. Our Part-Aware Diffusion Policy (PADP) achieves significant performance improvements with only a 6.25% performance drop on unseen objects. PA3FF exhibits consistency across shapes, enabling downstream applications including correspondence learning and segmentation.
Four contributions.
  1. We introduce PA3FF, a 3D-native representation that encodes dense, semantic, and functional part-aware features directly from point clouds.
  2. We develop PADP, a diffusion policy that leverages PA3FF for generalizable manipulation with strong sample efficiency.
  3. PA3FF enables diverse downstream methods including correspondence learning and segmentation, making it a versatile foundation for robotic manipulation.
  4. We validate on 16 PartInstruct tasks and 8 real-world tasks, outperforming prior 2D and 3D representations (CLIP, DINOv2, Grounded-SAM) by +15% and +16.5%, respectively.
§ 03 Methodology

Three stages of training, from geometry to policy.

Method Pipeline
Figure 02
Three-Stage Training Framework. (1) Geometric Pre-training — leverage 3D geometric priors from large-scale datasets. (2) Part-Aware Contrastive Learning — learn dense 3D feature fields that enhance part-level consistency. (3) Policy Learning — integrate refined features into a diffusion policy for generalizable manipulation.
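Stage (2) can be illustrated with a minimal sketch of part-aware contrastive learning: an InfoNCE-style loss that pulls per-point features on the same part together and pushes features from different parts apart. This is a hypothetical NumPy illustration under assumed names (`features`, `part_ids`, `temperature`), not the paper's implementation.

```python
import numpy as np

def part_contrastive_loss(features, part_ids, temperature=0.07):
    """InfoNCE-style loss over per-point features: points on the same
    part are positives, points on other parts are negatives (sketch)."""
    # L2-normalize so dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature                  # (N, N) similarity logits
    np.fill_diagonal(sim, -np.inf)               # exclude self-pairs
    same_part = part_ids[:, None] == part_ids[None, :]
    np.fill_diagonal(same_part, False)
    log_denom = np.log(np.exp(sim).sum(axis=1))  # log-sum over all pairs
    losses = []
    for i in range(len(f)):
        pos = sim[i][same_part[i]]
        if pos.size:                             # skip points with no positive
            losses.append(log_denom[i] - np.log(np.exp(pos).sum()))
    return float(np.mean(losses))
```

Features that cluster by part yield a low loss; features that ignore part structure yield a high one, which is exactly the pressure that makes feature proximity reflect part relationships.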
01 · 3D-Native

Point-cloud first.

Directly processes point clouds, avoiding the inconsistencies of 2D multi-view lifting.

02 · Dense features

Continuous fields.

Predicts per-point features that capture fine-grained geometric details across the full surface.

03 · Efficient

Single pass, real-time.

One feedforward inference — suitable for real-time robotic control with minimal latency.

§ 04 Why PA3FF

Smooth, view-invariant, and discriminative where it matters.

Feature Comparison
Figure 03
Visual Comparison. PA3FF generates smooth, semantically consistent 3D feature fields. In contrast, 2D-lifting methods (like DINOv2) suffer from noise and inconsistency across views, while other 3D baselines (Sonata) lack discriminative detail for functional parts.
— 2D Lifting Limits
  • Inconsistency. Features fluctuate across different viewpoints.
  • Resolution. Small/thin parts (handles) are often lost in 2D renderings.
  • Latency. Multi-view fusion is computationally expensive.
+ PA3FF Advantages
  • Consistency. 3D-native prediction ensures viewpoint invariance.
  • Precision. Dense per-point features capture fine geometric details.
  • Speed. Feedforward network enables real-time interaction.
§ 05 Results

Experiments — simulation, real-world, ablation.

Real-World Results
Table 02
Real-world manipulation success rates across 8 diverse tasks.
State-of-the-art

PADP achieves a 58.8% success rate on unseen objects, outperforming the strongest baseline (GenDP, 35.0%) by +23.8 pts and demonstrating strong sim-to-real transfer.

Success · Unseen
58.8%
Baseline · GenDP
35.0%
Δ Absolute improvement
+23.8pts
Simulation Results
Figure 04
Success rates on PartInstruct benchmark across 5 generalization levels.
Key insight

PADP averages 28.8% success vs. GenDP's 19.4%, a margin of +9.4 pts. The largest gain comes on novel object categories.

Protocol
OS · Object States (pose / rotation)
OI · Object Instances (same category)
TP · Task Parts (new parts)
TC · Task Categories (new tasks)
OC · Object Categories (unseen class)

Component analysis.

Full · PADP
62%
w/o Feature refine
46%
Sonata + DP3
39%
Ablation takeaway

Feature refinement via contrastive learning provides the largest performance gain (+16 pts) — confirming that part-aware representation learning is the critical component.

§ 06 Demos

Eight real-world tasks, one policy.

Task Illustrations
Figure 05
Task illustrations. Eight articulated-object manipulation tasks spanning pulling, opening, closing, pressing, and placement.
Pulling Lid of Pot
Opening Drawer
Closing Box
Closing Laptop
Opening Microwave
Opening Bottle
Placing Lid on Kettle
Pressing Dispenser
§ 07 Applications

Beyond policies — correspondence & segmentation.

Downstream Applications
Figure 06
3D Shape Correspondences. PA3FF enables precise cross-shape correspondences using Functional Maps, robust to topology changes.
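As a simplified stand-in for the Functional Maps pipeline used in the paper, consistent features already support a naive correspondence baseline: match each point on shape A to its nearest neighbor in feature space on shape B. A minimal NumPy sketch, assuming `feats_a` and `feats_b` hold PA3FF-style per-point features (hypothetical names):

```python
import numpy as np

def feature_correspondence(feats_a, feats_b):
    """For each point on shape A, return the index of the point on
    shape B with the highest cosine similarity in feature space."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)  # (Na,) indices into shape B
```

Functional Maps adds spectral regularization on top of such pointwise matches, which is what gives the paper's correspondences robustness to topology changes.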
Semantic Heatmaps
Figure 07
Instruction Attention. Heatmaps show cosine similarity between text instructions and learned 3D features.
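The attention heatmaps above reduce to a single operation: cosine similarity between one text embedding and every per-point feature. A minimal sketch, assuming pre-computed embeddings (`text_emb` and `point_feats` are hypothetical names; the text encoder itself is not specified here):

```python
import numpy as np

def similarity_heatmap(text_emb, point_feats):
    """Cosine similarity between a text embedding (D,) and per-point
    features (N, D); returns one score per point in [-1, 1]."""
    t = text_emb / np.linalg.norm(text_emb)
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    return p @ t
```

Points whose features align with the instruction embedding score near 1 and render hot in the heatmap; unrelated regions score near 0 or below.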
Segmentation Results
Figure 08
Part Segmentation. Superior zero-shot segmentation performance (mAP50 70.6%) compared to PartSlip++.