I'm a final-year Ph.D. student in Computer Science and Engineering at the University of Michigan, working with David Fouhey. My primary research interest lies in large-scale vision-language models, especially for 3D scenes and downstream robotics applications.
I'm currently interning at Nvidia Robotics Lab in Seattle, WA, and I'm actively looking for full-time positions in computer vision. Please feel free to get in touch if there are any opportunities!
- [2023/07] Both "Understanding 3D Object Interaction from a Single Image" and "Sound Localization from Motion" have been accepted to ICCV 2023!
- [2023/06] SpotTarget has been accepted to the 19th International Workshop on Mining and Learning with Graphs.
- [2023/06] I'm a recipient of the 2023 Rackham Doctoral Intern Fellowship.
- [2023/05] I started my internship at Amazon Web Services in Seattle, WA.
- [2023/05] We have released the demo of "Chat with NeRF: Grounding 3D Objects in Neural Radiance Field through Dialog". Try the demo here!
- [2023/04] "Understanding 3D Object Interaction from a Single Image" will be presented at the CVPR 2023 Workshop on 3D Vision and Robotics.
We detect potential 3D object interactions from a single image and a set of query points. Building on Segment-Anything, our model predicts whether an object is movable and whether it is rigid, along with its 3D location, affordance, articulation, and other properties.
We jointly learn to localize sound sources from audio and to estimate camera rotations from images. Our method is entirely self-supervised.
We address several common pitfalls in training graph neural networks for link prediction.
We propose ViewSeg, which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories.
We present Associative3D, which addresses 3D volumetric reconstruction from two views of a scene with an unknown camera by simultaneously reconstructing objects and inferring the relationships between them.
We present Open Annotations of Single Image Surfaces (OASIS), a dataset for single-image 3D in the wild consisting of dense annotations of detailed 3D geometry for Internet images.