Understanding 3D Object Articulation in Internet Videos

Shengyi Qian

Linyi Jin

Chris Rockwell

Siyi Chen

David F. Fouhey

University of Michigan

CVPR 2022

[pdf]

[code]

[video]

Given an ordinary video, our system produces a 3D planar representation of the observed articulation. The 3D renderings illustrate how the microwave (in pink) can be articulated in 3D space. The predicted rotation axis is shown as a blue arrow.

We investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos. While seemingly easy for humans, this problem poses many challenges for computers. We approach it by combining a top-down detection system that finds planes that can be articulated with an optimization approach that solves for a 3D plane that can explain a sequence of observed articulations. We show that this system can be trained on a combination of videos and 3D scan datasets. When tested on a dataset of challenging Internet videos and the Charades dataset, our approach obtains strong performance.
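To make the notion of a plane articulating about a rotation axis concrete, here is a minimal geometric sketch. It is not the detection or optimization method described above; it only applies Rodrigues' rotation formula to points on a plane. The function name `rotate_about_axis` and the toy door geometry are assumptions chosen for illustration.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rotate_about_axis(
    point: Sequence[float],
    axis_origin: Sequence[float],
    axis_dir: Sequence[float],
    angle_rad: float,
) -> Vec3:
    """Rotate `point` by `angle_rad` about the line through `axis_origin`
    with direction `axis_dir`, using Rodrigues' rotation formula."""
    norm = math.sqrt(dot(axis_dir, axis_dir))
    if norm == 0.0:
        raise ValueError("axis_dir must be non-zero")
    k = tuple(c / norm for c in axis_dir)  # unit axis direction
    v = tuple(p - o for p, o in zip(point, axis_origin))  # point relative to axis
    k_cross_v = cross(k, v)
    k_dot_v = dot(k, v)
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    rotated = tuple(
        v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c) for i in range(3)
    )
    return tuple(r + o for r, o in zip(rotated, axis_origin))


# Toy example (hypothetical geometry): a door hinged along the z axis.
hinge_origin = (0.0, 0.0, 0.0)
hinge_dir = (0.0, 0.0, 1.0)
door_corners = [(1.0, 0.0, 0.0), (1.0, 0.0, 2.0)]

# Open the door by 90 degrees: each corner swings from the x axis to the y axis.
opened = [
    rotate_about_axis(p, hinge_origin, hinge_dir, math.pi / 2) for p in door_corners
]
```

Points lying on the axis itself are left unchanged by the rotation, which is why a hinge edge stays fixed while the rest of the plane sweeps through space.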

Dataset

Internet videos

Category	Links	Details
Video Clips	pos_clips.tar.gz	Articulation video clips. Each clip lasts 3 seconds.
Negative Clips	neg_clips.tar.gz	For each positive video clip, we try to sample a negative clip (no articulation) from the same scene that still contains hand motion. These clips are used for the recognition benchmark.
Frames	articulation_frames_v1.tar.gz	Key frames pre-extracted for the dataset. We extract 9 key frames for each video clip, which has 90 frames (fps=30).
Annotations	articulation_annotations_v1.tar.gz	Articulation annotations. Surface normals are only available in the test split. The annotations have been preprocessed into COCO format.
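The clip figures in the table are consistent with each other: 90 frames at 30 fps is 3 seconds. The sketch below checks that arithmetic and shows one plausible way to pick 9 key frames from a 90-frame clip. The evenly spaced sampling rule and the function names are assumptions; the page does not state how the released key frames were actually selected.

```python
from typing import List


def clip_duration_seconds(num_frames: int = 90, fps: float = 30.0) -> float:
    """Duration of a clip in seconds, given its frame count and frame rate."""
    return num_frames / fps


def key_frame_indices(num_frames: int = 90, num_keys: int = 9) -> List[int]:
    """Evenly spaced 0-based frame indices, including the first and last frame.

    Assumption: uniform spacing. The dataset's real selection rule may differ.
    """
    if num_keys < 2 or num_keys > num_frames:
        raise ValueError("need 2 <= num_keys <= num_frames")
    return [(i * (num_frames - 1)) // (num_keys - 1) for i in range(num_keys)]


duration = clip_duration_seconds()  # 90 frames / 30 fps
indices = key_frame_indices()       # 9 indices spanning frames 0..89
```

Because the pre-extracted frames are already provided in the archive, a sketch like this is only needed when re-extracting frames from the raw clips.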

ScanNet

Category	Links	Details
Annotations	scannet_annotations.tar.gz	ScanNet plane annotations. It is preprocessed by SparsePlanes.
SURREAL images	scannet_surreal_imgs.tar.gz	We render synthetic humans onto around 98k ScanNet images. You can extract the archive into the ScanNet folder.
SURREAL annotations	scannet_surreal_annotations.tar.gz	The same plane annotations, but with the image paths changed to point to the SURREAL images.

Acknowledgements

This work was supported by the DARPA Machine Common Sense Program and Toyota Research Institute. Toyota Research Institute (“TRI”) provided funds to assist the authors with their research but this article solely reflects the opinions and conclusions of its authors and not TRI or any other Toyota entity.