Tong He
Email: tonghe90[at]gmail[dot]com
I am now a Research Fellow at Shanghai AI Lab, working with Prof. Ouyang Wanli and Prof. Qiao Yu. Previously, I was a Research Fellow at the Australian Institute for Machine Learning (AIML), the University of Adelaide, working with Prof. Chunhua Shen and Prof. Anton van den Hengel.

(Google scholar)

I received my PhD in computer science from the University of Adelaide, where I was supervised by Chunhua Shen. I was a visiting student at MMLAB of the Chinese University of Hong Kong at Shenzhen under the supervision of Dr. Weilin Huang and Prof. Yu Qiao. We are looking for self-motivated PhD students (joint PhD programs with SJTU, FDU, ZJU, USTC, etc.) and interns. If you are interested in joining us, please feel free to contact me with your CV!


  • Mar, 2024: Four papers have been accepted by CVPR 2024.
  • July, 2023: One paper on long-tail object recognition has been accepted by T-PAMI.
  • July, 2023: One paper on point cloud pretraining (Ponder) has been accepted by ICCV 2023.
  • Mar, 2023: One paper on point cloud pretraining (CP3) has been accepted by T-PAMI.
  • Mar, 2023: Four papers have been accepted by CVPR 2023.
  • Oct, 2022: One paper has been accepted by SIGGRAPH Asia.
  • Oct, 2022: The extended version of DyCo3D has been accepted by T-PAMI.
  • July, 2022: One paper has been accepted by ECCV 2022.
  • April, 2022: Check out our latest paper on instance segmentation for 3D point clouds.
  • March, 2021: One T-PAMI paper has been accepted.
  • March, 2021: One IJCV paper has been accepted.
  • March, 2021: Two CVPR papers have been accepted.
  • Nov, 2020: Received my Ph.D. degree; my thesis was awarded the Dean’s Commendation for Doctoral Thesis Excellence.
  • Oct, 2020: The extended version of FCOS has been accepted by T-PAMI.
  • July, 2020: Two ECCV papers have been accepted.
  • March, 2020: One CVPR paper has been accepted.


GVGEN: Text-to-3D Generation with Volumetric Representation
X. He, J. Chen, S. Peng, D. Huang, Y. Li, X. Huang, C. Yuan, W. Ouyang and T. He*.
arxiv 2024, [PDF] [code]
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
S. Zhang, D. Huang, J. Deng, S. Tang, W. Ouyang, T. He and Y. Zhang.
arxiv 2024, [PDF] [code]
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Y. Wu, Y. Wang, S. Tang, W. Wu, T. He, W. Ouyang, J. Wu and P. Torr.
arxiv 2024, [PDF] [code]
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Z. Zhang, W. Hu, Y. Liao, T. He and H. Zhao.
arxiv 2024, [PDF] [code]
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
H. Zhu, Y. Wang, D. Huang, W. Ye, W. Ouyang and T. He*.
arxiv 2024, [PDF] [code]
UniPad: A Universal Pre-Training Paradigm For Autonomous Driving
H. Yang, S. Zhang, D. Huang, X. Wu, H. Zhu, T. He*, S. Tang, H. Zhao, Q. Qiu, B. Lin, X. He and W. Ouyang.
CVPR 2024, [PDF] [code]
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Y. Yang, Y. Huang, X. Wu, Y. Guo, S. Zhang, H. Zhao, T. He and X. Liu.
CVPR 2024, [PDF] [code]
Point Transformer V3: Simpler, Faster, Stronger
X. Wu, L. Jiang, P. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He and H. Zhao.
CVPR 2024, [PDF] [code]
Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers
P. Ye, Y. Huang, C. Tu, M. Li, T. Chen, T. He and W. Ouyang.
arxiv 2023, [PDF] [code]
Frozen CLIP Model is An Efficient Point Cloud Backbone
X. Huang, S. Li, W. Qu, T. He*, Y. Zuo and W. Ouyang.
AAAI 2024, [PDF] [code]
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Y. Wang, Y. Wu, S. Tang, W. He, X. Guo, F. Zhu, L. Bai, R. Zhao, J. Wu, T. He and W. Ouyang.
arxiv 2023, [PDF] [code]
GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection
Y. Lu, X. Ma, L. Yang, T. Zhang, Y. Liu, Q. Chu, T. He*, Y. Li and W. Ouyang.
arxiv 2023, [PDF] [code]
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
H. Zhu, H. Yang, X. Wu, D. Huang, S. Zhang, X. He, T. He*, H. Zhao, C. Shen, Y. Qiao and W. Ouyang.
arxiv 2023, [PDF] [code]
Boosting Residual Networks with Group Knowledge
S. Tang, P. Ye, B. Li, W. Lin, T. Chen, T. He, C. Yu and W. Ouyang.
arxiv 2023, [PDF] [code]
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Y. Huang, P. Ye, X. Huang, S. Li, T. Chen, T. He and W. Ouyang.
arxiv 2023, [PDF] [code]
The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition
J. Tan, B. Li, X. Lu, Y. Yao, F. Yu, T. He, W. Ouyang.
T-PAMI 2023 [PDF] [code]
Ponder: Point Cloud Pre-training via Neural Rendering
D. Huang, S. Peng, T. He*, X. Zhou and W. Ouyang.
ICCV 2023, [PDF] [code]
SAM3D: Segment Anything in 3D Scenes
Y. Yang, X. Wu, T. He, H. Zhao and X. Liu.
[PDF] [code]
PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer
H. Yang, W. Wang, M. Chen, B. Lin*, T. He*, H. Chen, X. He and W. Ouyang.
CVPR 2023, [PDF] [code]
Stimulative Training++: Go Beyond The Performance Limits of Residual Networks
P. Ye, T. He, S. Tang, B. Li, T. Chen, L. Bai and W. Ouyang.
[PDF] [code]
CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm
M. Xu, Y. Wang, Y. Liu, T. He, and Y. Qiao.
T-PAMI 2023, [PDF] [code]
GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds
H. Yang, T. He, J. Liu, H. Chen, B. Wu, B. Lin, X. He and W. Ouyang.
CVPR 2023, [PDF] [code]
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency
M. Xu, M. Xu, T. He, W. Ouyang, Y. Wang, X. Han, and Y. Qiao.
CVPR 2023, [PDF]
Crossing the Gap: Domain Generalization for Image Captioning
Y. Ren, Z. Mao, S. Fang, Y. Lu, T. He, H. Du, Y. Zhang, and W. Ouyang.
CVPR 2023, [PDF]
β-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search
P. Ye, T. He, B. Li, T. Chen, L. Bai and W. Ouyang.
arxiv 2023, [PDF] [code]
OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection
C. Huang, T. He, H. Ren, W. Wang, B. Lin, and D. Cai.
arxiv 2022, [PDF]
3D-QueryIS: A Query-based Framework for 3D Instance Segmentation
J. Liu, T. He, H. Yang, R. Su, J. Tian, J. Wu, H. Guo, K. Xu and W. Ouyang.
arxiv 2022, [PDF] [code]
Reconstructing Hand-Held Objects from Monocular Video
D. Huang, X. Ji, X. He, J. Sun, T. He, Q. Shuai, W. Ouyang, and X. Zhou.
SIGGRAPH Asia 2022, [PDF] [Project Page] [code]
Dynamic Convolution for 3D Point Cloud Instance Segmentation
T. He, C. Shen and A. van den Hengel
T-PAMI, 2022 [PDF] [code]
PointInst3D: Segmenting 3D Instances by Points
T. He, W. Yin, C. Shen, A. van den Hengel
ECCV, 2022 [PDF] [code]
ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting
Y. Liu, C. Shen, L. Jin, T. He, P. Chen, C. Liu and H. Chen
T-PAMI, 2021 [PDF] [code]
Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection
Y. Liu, T. He, H. Chen, X. Wang, C. Luo, S. Zhang, C. Shen and L. Jin
IJCV, 2021 [PDF] [code]
HCRF-Flow: Scene Flow from Point Clouds with Continuous High-order CRFs and Position-aware Flow Embedding
R. Li, G. Lin, T. He, F. Liu and C. Shen
CVPR, 2021 [PDF]
DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution
T. He, C. Shen, and A. van den Hengel
CVPR, 2021 [PDF] [Code]
Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation
T. He, D. Gong, Z. Tian and C. Shen
ECCV, 2020 [PDF]
Instance-Aware Embedding for Point Cloud Instance Segmentation
T. He, Y. Liu, C. Shen, X. Wang and C. Sun
ECCV, 2020 [PDF]
FCOS: A Simple and Strong Anchor-free Object Detector
Z. Tian, C. Shen, H. Chen, T. He
T-PAMI, 2020. [PDF] [Code]
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Y. Liu, H. Chen, C. Shen, T. He, L. Jin, L. Wang
CVPR 2020 [PDF] [Code]
FCOS: Fully Convolutional One-Stage Object Detection
Z. Tian, C. Shen, H. Chen, T. He
ICCV, 2019 [PDF] [Code]
Knowledge Translation and Adaptation for Efficient Semantic Segmentation
T. He, C. Shen, Z. Tian, D. Gong, C. Sun, Y. Yan
CVPR, 2019 [PDF] [Results On Cityscapes Test Set]
Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation
Z. Tian, T. He, C. Shen, Y. You
CVPR, 2019 [PDF]
An End-to-End TextSpotter with Explicit Alignment and Attention
T. He, Z. Tian, W. Huang, C. Shen, Y. Qiao, C. Sun
CVPR, 2018 [PDF] [code]
Single Shot Text Detector with Regional Attention
P. He, W. Huang, T. He, Q. Zhu, Y. Qiao, X. Li
ICCV, 2017 [PDF] [code]
Orientation-Aware Text Proposals Network for Scene Text Detection
H. Huang, Z. Tian, T. He, W. Huang, Y. Qiao
CCBR, 2017
Detecting Text in Natural Image with Connectionist Text Proposal Network
Z. Tian, W. Huang, T. He, P. He, Y. Qiao
ECCV, 2016 [demo] [code]
Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network
T. He, W. Huang, Y. Qiao and J. Yao
arxiv [arxiv 1510.03283]
Text-Attentional Convolutional Neural Networks for Scene Text Detection
T. He, W. Huang, Y. Qiao and J. Yao
T-IP 2016 [arxiv 1510.03283]
An efficient method for text detection from indoor panorama images using extremal regions
Y. Liu, K. Zhang, J. Yao, T. He, Y. Liu and J. Tu
ICIA, 2015.
Accurate Multi-Scale License Plate Localization Via Image Saliency
T. He, J. Yao, K. Zhang, Y. Hou and S. Han
ITSC, 2014. [oral]

Professional activities


    Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)

    International Journal of Computer Vision (IJCV)

    IEEE Transactions on Image Processing (TIP)

    Pattern Recognition (PR)

    IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)



Last Updated on 26th Aug, 2019
