Tong He

Publications

Sekai: A Video Dataset towards World Exploration
Z. Li, C. Li, ...T. He, J. Pang, Y. Qiao, Y. Jia, K. Zhang.
arXiv 2025, [PDF] [code] [project]

DeepVerse: 4D Autoregressive Video Generation as a World Model
J. Chen, H. Zhu, X. He, Y. Wang, J. Zhou, W. Chang, Y. Zhou, Z. Li, Z. Fu, J. Pang and T. He*.
arXiv 2025, [PDF] [code] [project]

Aether: Geometric-Aware Unified World Modeling
H. Zhu*, Y. Wang*, J. Zhou*, W. Chang*, Y. Zhou*, Z. Li*, J. Chen*, C. Shen, J. Pang and T. He**.
arXiv 2025, [PDF] [code] [project]

Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Y. Wang, Y. Wu, S. Tang, W. He, X. Guo, F. Zhu, L. Bai, R. Zhao, J. Wu, T. He and W. Ouyang.
TPAMI 2025, [PDF] [code]

Stimulative Training++: Go Beyond The Performance Limits of Residual Networks
P. Ye, T. He, S. Tang, B. Li, T. Chen, L. Bai and W. Ouyang.
TPAMI 2025, [PDF] [code]

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
H. Zhu, H. Yang, X. Wu, D. Huang, S. Zhang, X. He, T. He*, H. Zhao, C. Shen, Y. Qiao and W. Ouyang.
TPAMI 2025, [PDF] [code]

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving
Z. Xing, X. Zhang, Y. Hu, B. Jiang, T. He, Q. Zhang, X. Long and W. Yin.
CVPR 2025, [PDF] [code]

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
J. Yang, H. Zhu, Y. Wang, G. Wu, T. He, L. Wang.
CVPR 2025, [PDF] [code]

Depth Any Video with Scalable Synthetic Data
H. Yang, D. Huang, W. Yin, C. Shen, H. Liu, X. He, B. Lin, W. Ouyang and T. He*.
ICLR 2025, [PDF] [code]

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
H. Zhu, H. Yang, Y. Wang, J. Yang, L. Wang and T. He*.
ICLR 2025, [PDF] [code]

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
J. Chen, D. Huang, W. Ye, W. Ouyang and T. He*.
ICLR 2025, [PDF] [code]

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
P. Gao, ... and T. He....
ICLR 2025, [PDF] [code]

ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction
Z. Tang, W. Ye, Y. Wang, D. Huang, H. Bao, T. He* and G. Zhang*.
ICLR 2025, [PDF] [code]

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Y. Chen, T. He*, D. Huang, W. Ye, S. Chen, J. Tang, Z. Cai, L. Yang, G. Yu, G. Lin and C. Zhang.
ICLR 2025, [PDF] [code]

NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction
Y. Wang, D. Huang, W. Ye*, G. Zhang, W. Ouyang and T. He*.
NeurIPS 2024, [PDF] [code]

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
H. Zhu, Y. Wang, D. Huang, W. Ye, W. Ouyang and T. He*.
NeurIPS 2024, [PDF] [code]

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion
W. Ye, C. Ji, Z. Chen, J. Gao, X. Huang, S. Zhang, W. Ouyang, T. He*, C. Zhao*, G. Zhang*.
NeurIPS 2024, [PDF] [code]

EMR-MERGING: Tuning-Free High-Performance Model Merging
C. Huang, P. Ye, T. Chen, T. He, X. Yue, W. Ouyang.
NeurIPS 2024, [PDF] [code]

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection
Y. Lu, X. Ma, L. Yang, T. Zhang, Y. Liu, Q. Chu, T. He*, Y. Li and W. Ouyang.
T-PAMI 2024, [PDF] [code]

GVGEN: Text-to-3D Generation with Volumetric Representation
X. He, J. Chen, S. Peng, D. Huang, Y. Li, X. Huang, C. Yuan, W. Ouyang and T. He*.
ECCV 2024, [PDF] [code]

Agent3D-Zero: An Agent for Zero-shot 3D Understanding
S. Zhang, D. Huang, J. Deng, S. Tang, W. Ouyang, T. He* and Y. Zhang*.
ECCV 2024, [PDF] [code]

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Y. Wu, Y. Wang, S. Tang, W. Wu, T. He, W. Ouyang, J. Wu and P. Torr.
ECCV 2024, [PDF] [code]

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Z. Zhang, W. Hu, Y. Liao, T. He and H. Zhao.
ECCV 2024, [PDF] [code]

PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
Z. Wang, Z. Lu, D. Huang, T. He, X. Liu, W. Ouyang and L. Bai.
ECCV 2024, [PDF] [code]

Boosting Residual Networks with Group Knowledge
S. Tang, P. Ye, B. Li, W. Lin, T. Chen, T. He, C. Yu and W. Ouyang.
AAAI 2024, [PDF] [code]

UniPad: A Universal Pre-Training Paradigm For Autonomous Driving
H. Yang, S. Zhang, D. Huang, X. Wu, H. Zhu, T. He*, S. Tang, H. Zhao, Q. Qiu, B. Lin, X. He and W. Ouyang.
CVPR 2024, [PDF] [code]

TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
X. Wu, Y. Hou, X. Huang, B. Lin, T. He, X. Zhu, Y. Ma, B. Wu, H. Liu, D. Cai, W. Ouyang.
CVPR 2024, [PDF] [code]

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Y. Yang, Y. Huang, X. Wu, Y. Guo, S. Zhang, H. Zhao, T. He and X. Liu.
CVPR 2024, [PDF] [code]

Point Transformer V3: Simpler, Faster, Stronger
X. Wu, L. Jiang, P. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He* and H. Zhao*.
CVPR 2024, [PDF] [code]

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers
P. Ye, Y. Huang, C. Tu, M. Li, T. Chen, T. He and W. Ouyang.
arXiv 2023, [PDF] [code]

Frozen CLIP Model is An Efficient Point Cloud Backbone
X. Huang, S. Li, W. Qu, T. He*, Y. Zuo and W. Ouyang.
AAAI 2024, [PDF] [code]

Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Y. Huang, P. Ye, X. Huang, S. Li, T. Chen, T. He and W. Ouyang.
arXiv 2023, [PDF] [code]

The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition
J. Tan, B. Li, X. Lu, Y. Yao, F. Yu, T. He, W. Ouyang.
T-PAMI 2023, [PDF] [code]

Ponder: Point Cloud Pre-training via Neural Rendering
D. Huang, S. Peng, T. He*, X. Zhou and W. Ouyang.
ICCV 2023, [PDF] [code]

SAM3D: Segment Anything in 3D Scenes
Y. Yang, X. Wu, T. He, H. Zhao and X. Liu.
[PDF] [code]

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer
H. Yang, W. Wang, M. Chen, B. Lin*, T. He*, H. Chen, X. He and W. Ouyang.
CVPR 2023, [PDF] [code]

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm
M. Xu, Y. Wang, Y. Liu, T. He, and Y. Qiao.
T-PAMI 2023, [PDF] [code]

GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds
H. Yang, T. He, J. Liu, H. Chen, B. Wu, B. Lin, X. He and W. Ouyang.
CVPR 2023, [PDF] [code]

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency
M. Xu, M. Xu, T. He, W. Ouyang, Y. Wang, X. Han, and Y. Qiao.
CVPR 2023, [PDF]

Crossing the Gap: Domain Generalization for Image Captioning
Y. Ren, Z. Mao, S. Fang, Y. Lu, T. He, H. Du, Y. Zhang, and W. Ouyang.
CVPR 2023, [PDF]

β-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search
P. Ye, T. He, B. Li, T. Chen, L. Bai and W. Ouyang.
arXiv 2023, [PDF] [code]

OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection
C. Huang, T. He, H. Ren, W. Wang, B. Lin, and D. Cai.
TIP 2022, [PDF]

3D-QueryIS: A Query-based Framework for 3D Instance Segmentation
J. Liu, T. He, H. Yang, R. Su, J. Tian, J. Wu, H. Guo, K. Xu and W. Ouyang.
arXiv 2022, [PDF] [code]

Reconstructing Hand-Held Objects from Monocular Video
D. Huang, X. Ji, X. He, J. Sun, T. He, Q. Shuai, W. Ouyang, and X. Zhou.
SIGGRAPH Asia 2022, [PDF] [Project Page] [code]

Dynamic Convolution for 3D Point Cloud Instance Segmentation
T. He, C. Shen and A. Hengel
T-PAMI, 2022 [PDF] [code]

PointInst3D: Segmenting 3D Instances by Points
T. He, W. Yin, C. Shen, A. Hengel
ECCV, 2022 [PDF] [code]

ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting
Y. Liu, C. Shen, L. Jin, T. He, P. Chen, C. Liu and H. Chen
T-PAMI, 2021 [PDF] [code]

Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection
Y. Liu, T. He, H. Chen, X. Wang, C. Luo, S. Zhang, C. Shen and L. Jin
IJCV, 2021 [PDF] [code]

HCRF-Flow: Scene Flow from Point Clouds with Continuous High-order CRFs and Position-aware Flow Embedding
R. Li, G. Lin, T. He, F. Liu and C. Shen
CVPR, 2021 [PDF]

DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution
T. He, C. Shen, and A. Hengel
CVPR, 2021 [PDF] [Code]

Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation
T. He, D. Gong, Z. Tian and C. Shen
ECCV, 2020 [PDF]

Instance-Aware Embedding for Point Cloud Instance Segmentation
T. He, Y. Liu, C. Shen, X. Wang and C. Sun
ECCV, 2020 [PDF]

FCOS: A Simple and Strong Anchor-free Object Detector
Z. Tian, C. Shen, H. Chen, T. He
T-PAMI, 2020. [PDF] [Code]

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Y. Liu, H. Chen, C. Shen, T. He, L. Jin, L. Wang
CVPR 2020 [PDF] [Code]

FCOS: Fully Convolutional One-Stage Object Detection
Z. Tian, C. Shen, H. Chen, T. He
ICCV, 2019 [PDF] [Code]

Knowledge Translation and Adaptation for Efficient Semantic Segmentation
T. He, C. Shen, Z. Tian, D. Gong, C. Sun, Y. Yan
CVPR, 2019 [PDF] [Results On Cityscapes Test Set]

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation
Z. Tian, T. He, C. Shen, Y. You
CVPR, 2019 [PDF]

An End-to-End TextSpotter with Explicit Alignment and Attention
T. He, Z. Tian, W. Huang, C. Shen, Y. Qiao, C. Sun
CVPR, 2018 [PDF] [code]

Single Shot Text Detector with Regional Attention
P. He, W. Huang, T. He, Q. Zhu, Y. Qiao, X. Li
ICCV, 2017 [PDF] [code]

Orientation-Aware Text Proposals Network for Scene Text Detection
H. Huang, Z. Tian, T. He, W. Huang, Y. Qiao
CCBR, 2017

Detecting Text in Natural Image with Connectionist Text Proposal Network
Z. Tian, W. Huang, T. He, P. He, Y. Qiao
ECCV, 2016 [demo] [code]

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network
T. He, W. Huang, Y. Qiao and J. Yao
arXiv [arXiv 1510.03283]

Text-Attentional Convolutional Neural Networks for Scene Text Detection
T. He, W. Huang, Y. Qiao and J. Yao
T-IP 2016 [arXiv 1510.03283]

An efficient method for text detection from indoor panorama images using extremal regions
Y. Liu, K. Zhang, J. Yao, T. He, Y. Liu and J. Tu
ICIA, 2015.

Accurate Multi-Scale License Plate Localization Via Image Saliency
T. He, J. Yao, K. Zhang, Y. Hou and S. Han
ITSC, 2014. [oral]

(Google scholar)