Hongyu Wu, Pengwan Yang, Yuki M. Asano, Cees G. M. Snoek
The paper introduces a new dataset and a framework for segmenting any 3D part of a scene from a natural language description. It addresses the data and annotation challenges of part-level 3D understanding and demonstrates superior performance in open-vocabulary 3D scene understanding.
This research tackles the challenge of identifying specific parts of 3D scenes based on natural language descriptions. Traditional methods focus on whole objects, whereas this approach segments individual parts, enabling finer-grained analysis. The authors created a new dataset, 3D-PU, which provides detailed part-level annotations for 3D scenes. They also developed a framework, OpenPart3D, that improves part-level understanding and segmentation of 3D scenes and shows strong results across different datasets.