Tong He

News

Apr, 2026: Our paper π3 has been accepted by ICLR 2026 and was selected as the Best Paper at the China3DV conference.
Sep, 2025: Our paper Aether has been accepted by ICCV 2025 and was selected as an Outstanding Paper at the RIWM workshop.
Mar, 2025: Three papers have been accepted by ICCV 2025.
Mar, 2025: We released our world model AETHER. Try it here.
Feb, 2025: Two papers have been accepted by CVPR 2025.
Jan, 2025: Six papers have been accepted by ICLR 2025.
Oct, 2024: Four papers have been accepted by NIPS 2024.
Oct, 2024: One paper has been accepted by T-PAMI.
Ranked among the Worldwide Top 2% Scientists by Stanford University (2024.10).
June, 2024: Five papers have been accepted by ECCV 2024.
Ranked among the Worldwide Top 2% Scientists by Stanford University (2023.10).

Selected Papers on World Models

Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video
Y. Wang and T. He.
arXiv 2026, [PDF] [code] [project]

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
H. Zhu, H. Liu, Y. Zhao, T. Ye, J. Chen, J. Yu, T. He, S. Han, E. Xie.
arXiv 2026, [PDF] [code] [project]

VInO: A Unified Visual Generator with Interleaved OmniModal Context
J. Chen, T. He, Z. Fu, P. Wan, K. Gai, W. Ye.
arXiv 2025, [PDF] [code] [project]

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Y. Zhou, Y. Wang, J. Zhou, W. Chang, H. Guo, Z. Li, K. Ma, X. Li, Y. Wang, H. Zhu, M. Liu, D. Liu, J. Yang, Z. Fu, J. Chen, C. Shen, J. Pang, K. Zhang and T. He*.
ICLR 2026 [PDF] [code] [project]

DeepVerse: 4D Autoregressive Video Generation as a World Model
J. Chen, H. Zhu, X. He, Y. Wang, J. Zhou, W. Chang, Y. Zhou, Z. Li, Z. Fu, J. Pang and T. He*.
arXiv 2025, [PDF] [code] [project]

Aether: Geometric-Aware Unified World Modeling
H. Zhu*, Y. Wang*, J. Zhou*, W. Chang*, Y. Zhou*, Z. Li*, J. Chen*, C. Shen, J. Pang and T. He**.
ICCV 2025 & [Best Paper] at the ICCV 2025 workshop on Reliable and Interactive World Models (RIWM), [PDF] [code] [project]

Sekai: A Video Dataset towards World Exploration
Z. Li, C. Li, ..., T. He, J. Pang, Y. Qiao, Y. Jia, K. Zhang.
ICLR 2026, [PDF] [code] [project]

Yume1.5: A Text-Controlled Interactive World Generation Model
X. Mao, Z. Li, C. Li, X. Xu, K. Ying, T. He, J. Pang, Y. Qiao and K. Zhang.
arXiv 2025, [PDF] [code] [project]

Yume: An Interactive World Generation Model
X. Mao, S. Lin, Z. Li, C. Li, W. Peng, T. He, J. Pang, M. Chi, Y. Qiao and K. Zhang.
arXiv 2025, [PDF] [code] [project]

Selected Papers on Embodied AI

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
Y. Wang, H. Zhu, M. Liu, J. Yang, H. Fang and T. He*.
ICCV 2025, [PDF] [code] [project]

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
J. Yang, H. Zhu, Y. Wang, G. Wu, T. He, L. Wang.
CVPR 2025, [PDF] [code]

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
H. Zhu, Y. Wang, D. Huang, W. Ye, W. Ouyang and T. He*.
NIPS 2024, [PDF] [code]

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
H. Zhu, H. Yang, Y. Wang, J. Yang, L. Wang and T. He*.
ICLR 2025, [PDF] [code]

Selected Papers on 3D Vision

π3: Scalable Permutation-Equivariant Visual Geometry Learning
Y. Wang, J. Zhou, H. Zhu, W. Chang, Y. Zhou, Z. Li, J. Chen, J. Pang, C. Shen and T. He*.
ICLR 2026, [PDF] [code] [project]

NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction
Y. Wang, D. Huang, W. Ye, G. Zhang, W. Ouyang and T. He*.
NIPS 2024, [PDF] [code]

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Y. Chen, T. He*, D. Huang, W. Ye, S. Chen, J. Tang, Z. Cai, L. Yang, G. Yu, G. Lin and C. Zhang.
ICLR 2025, [PDF] [code]

Point Transformer V3: Simpler, Faster, Stronger
X. Wu, L. Jiang, P. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He* and H. Zhao*.
CVPR 2024, [PDF] [code]

Professional activities

Journals

Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)

International Journal of Computer Vision (IJCV)

IEEE Transactions on Image Processing (TIP)

Pattern Recognition (PR)

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

Conferences

CVPR, ICCV, ECCV, NIPS, ICLR, AAAI, etc.

(Google scholar)
