Xumin Yu
I am a fifth-year Ph.D. student in the Department of Automation at Tsinghua University, advised by Prof. Jiwen Lu. In 2020, I obtained my B.Eng. from the Department of Electronic Engineering, Tsinghua University.
I am broadly interested in computer vision and deep learning. My current research focuses on 3D vision and video analysis.
Email / Google Scholar / GitHub
News
2022-09: One paper (P2P) is accepted to NeurIPS 2022.
2022-03: 3 papers on 3D vision and video understanding are accepted to CVPR 2022.
2021-10: Our solution based on PoinTr won 1st place in the MVP Completion Challenge (ICCV 2021 Workshop).
2021-07: 2 papers (including 1 oral) on 3D vision and video understanding are accepted to ICCV 2021.
Publications
* indicates equal contribution
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
Ziyi Wang*, Xumin Yu*, Yongming Rao*, Jie Zhou, Jiwen Lu
Conference on Neural Information Processing Systems (NeurIPS), 2022
Spotlight
[arXiv] [Code] [Project Page] [Chinese write-up]
P2P is a framework to leverage large-scale pre-trained image models for 3D point cloud analysis.
Point-BERT: Pre-Training 3D Point Cloud Transformers with Masked Point Modeling
Xumin Yu*, Lulu Tang*, Yongming Rao*, Tiejun Huang, Jie Zhou, Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[arXiv] [Code] [Project Page] [Chinese write-up]
Point-BERT is a new paradigm for learning Transformers in an unsupervised manner by generalizing the concept of BERT to 3D point cloud data.
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
Xumin Yu*, Yongming Rao*, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou
IEEE International Conference on Computer Vision (ICCV), 2021
Oral Presentation
[arXiv] [supp] [Code] [Chinese write-up (by CVer)]
PoinTr is a transformer-based framework that reformulates point cloud completion as a set-to-set translation problem.
Group-aware Contrastive Regression for Action Quality Assessment
Xumin Yu*, Yongming Rao*, Wenliang Zhao, Jiwen Lu, Jie Zhou
IEEE International Conference on Computer Vision (ICCV), 2021
[arXiv] [Code]
We propose a new contrastive regression (CoRe) framework that learns relative scores by pair-wise comparison, which highlights the differences between videos and guides the model to learn the key hints for assessment.
Graph Interaction Networks for Relation Transfer in Human Activity Videos
Yansong Tang, Yi Wei, Xumin Yu, Jiwen Lu, Jie Zhou
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2020
[Paper]
We propose a graph interaction networks (GINs) model for transferring relation knowledge across two graphs in two different scenarios for video analysis, including a newly proposed setting for unsupervised skeleton-based action recognition across different datasets, and supervised group activity recognition with multi-modal inputs.
Learning fine-grained estimation of physiological states from coarse-grained labels by distribution restoration
Zengyi Qin, Jiansheng Chen, Zhenyu Jiang, Xumin Yu, Chunhua Hu, Yu Ma, Suhua Miao and Rongsong Zhou
Scientific Reports, 2020
[Paper] [Code]
Our method allows machine learning algorithms to perform fine-grained estimation of physiological states (e.g., sleep depth) even if the training labels are coarse-grained.
Honors and Awards
Excellent Undergraduate, Tsinghua University, 2020
First Prize, Microsoft Imagine Cup, China Finals, 2018