Yong-Lu Li

Tenure-Track Associate Professor

Email: yonglu_li[at]sjtu[dot]edu[dot]cn

Shanghai Jiao Tong University [上海交通大学]

Shanghai Innovation Institute[上海创智学院]

Team Website: RHOS

[School of Artificial Intelligence (SAI)] [人工智能学院]

[Zhiyuan Honors Program] [致远学院]

[Google Scholar] [Github] [LinkedIn] [ORCID]

[ResearchGate] [dblp] [Semantic Scholar]

About

I'm a tenure-track associate professor at Shanghai Jiao Tong University (SJTU), the PI of the RHOS Lab. I study Physical Reasoning, Embodied AI, and Human Activity Understanding. We are building a reasoning-driven system that enables intelligent agents to perceive, reason, and interact with the physical world. Our open-source projects have garnered over 13,000 stars on GitHub.

I received the ICRA 2025 Best Paper Award (HRI, sole corresponding author), AI100 Young Pioneers (MIT review), Baidu Scholarship, WAIC Yunfan Award (twice), Shanghai Overseas High-Level Talent, Wu Wenjun AI Science and Technology Award for Excellent Doctoral Dissertation, Outstanding Reviewer of NeurIPS’20/21. I serve as the Area Chair for NeurIPS’24/25, a lecturer for the "Computer Vision" course at the ACM Honor Class of SJTU, VALSE EACC, and Deputy Secretary-General of the EAI Committee under the Chinese Association for AI.

I am also a member of the Machine Vision and Intelligence Group (MVIG), working closely with Prof. Cewu Lu. Before joining SJTU, I worked closely with IEEE Fellow Prof. Chi Keung Tang and Yu-Wing Tai at the Hong Kong University of Science and Technology (HKUST) (2021-2022). I received a Ph.D. degree (2017-2021) in Computer Science from Shanghai Jiao Tong University (SJTU), under the supervision of Prof. Cewu Lu. Prior to that, I worked and studied at the Institute of Automation, Chinese Academy of Sciences (CASIA) (2014-2017) under the supervision of Prof. Yiping Yang and A/Prof. Yinghao Cai.

Research interests: Human-Robot-Scene

(S) Embodied AI: how to make agents learn skills from humans and interact with the physical world.

(S-1) Physical Reasoning: how to mine, capture, and embed the logics, causal relations, and laws from physical phenomenons.

(S-2) General Multi-Modal Foundation Models: MLLM, VLA, pure 3D/4D large models.

(S-3) Activity Understanding and Generation: how to learn, ground, and generate complex/ambiguous activity concepts (body motion, body-object/body/scene interaction) and object concepts from multi-modal information (2D-3D-4D).

(E) Human-Robot Interaction: Teaching, Joint Learning, Cooperation, Healthcare, etc.

Recruitment: Actively looking for self-motivated students (master/PhD, 2026 spring & fall), interns/engineers/visitors (CV/ML/ROB/NLP background, always welcome) to join us in Machine Vision and Intelligence Group (MVIG). If you share same/similar interests, feel free to drop me an email with your resume. Click Eng or 中 for more details.

News and Olds

2025.08: Our exUMI will appear at CoRL 2025!

2025.06: Dense Policy will appear at ICCV 2025!

2025.06: Our paper on robot data selection will appear at IROS 2025! See u in Hangzhou~

2025.05: Our paper on human-robot joint learning has won the ICRA 2025 Best Paper Award on Human-Robot Interaction!

2025.04: Our paper on human-robot joint learning has been selected as an ICRA 2025 Best Paper Award Finalist.

2025.03: Recieved AI100 Youth Pioneers (AI100青年先锋, MIT Reviews, DeepTech) from MIT Technology Review China.

2025.02: Our works on 3D HOI reconstruction, motion dynamics, garment generation/reconstruction, and dynamic object segmentation will appear at CVPR 2025!

2025.02: Pleasure to be an area chair of NeurIPS 2025!

2025.01: Our work on efficient robot teleoperation will appear at ICRA 2025.

2025.01: Two works on association ability of LLM and human motion will appear at ICLR 2025.

2024.09: Two works on articulated object image manipulation, humanoid-object interaction will appear at NeurIPS 2024.

2024.07: Five works on 4D human motion, dataset distillation, embodied AI, and visual reasoning will appear at ECCV 2024.

2024.06: Our work Visual-Text Dataset Distillation will appear at ICML 2024.

2024.03: Pleasure to be an area chair of NeurIPS 2024!

2024.02: Our work Pangea and Video Distillation will appear at CVPR 2024.

2023.09: The advanced HAKE reasoning engine based on LLM (Symbol-LLM) will appear at NeurIPS 2023!

2023.07: Our works on ego-centric video understanding and object concept learning will appear at ICCV 2023!

2023.07: The upgrade version of DCR will appear at IJCV!

2023.07: Recieved Yunfan Award: Shining Star (10 Chinese AI experts under age of 35) from WAIC 2023.

2023.03: Recieved Wu Wenjun Artificial Intelligence Science and Technology Award 2022, Excellent Doctoral Dissertation from Chinese Society for Artificial Intelligence.

2023.01: HAKE is accepted by TPAMI!

2022.11: We release the human body part states and interactive object bounding box annotations upon AVA (2.1 & 2.2): [HAKE-AVA], and a CLIP-based human part state & verb recognizer: [CLIP-Activity2Vec].

2022.11: AlphaPose will appear at TPAMI!

2022.10: Honored to be a top reviewer in NeurIPS'22!

2022.09: Joined SJTU as a tenure-track assistant professor.

2022.07: Two papers on longtailed learning, HOI detection are accepted by ECCV'22, arXivs and code are coming soon.

2022.03: Five papers on HOI detection/prediction, trajection prediction, 3D detection/keypoints are accepted by CVPR'22, papers and code are coming soon.

2022.02: We release the human body part state labels based on AVA: HAKE-AVA and HAKE 2.0.

2021.10: Recieved Outstanding Reviewer Award from NeurIPS'21.

2021.10: Learning Single/Multi-Attribute of Object with Symmetry and Group is accepted by TPAMI!.

2021.09: Our work Localization with Sampling-Argmax will appear at NeurIPS'21!

2021.05: Selected as the Chinese AI New Star Top-100 (Machine Learning).

2021.02: Upgraded HAKE-Activity2Vec is released! Images/Videos --> human box + ID + skeleton + part states + action + representation. [Demo] [Description]

2021.01: TIN (Transferable Interactiveness Network) is accepted by TPAMI!

2021.01: Recieved Baidu Scholarship (10 recipients globally).

2020.09: Our work HOI Analysis will appear at NeurIPS 2020.

2020.07: Fortunate to recieve WAIC YunFan Award and be among the 2nd A-Class Project.

2020.06: The larger HAKE-Large (>120K images with activity and part state labels) is released!

2020.02: Three papers Image-based HAKE: PaSta-Net, 2D-3D Joint HOI Learning, Symmetry-based Attribute-Object Learning are accepted in CVPR'20! Papers and corresponding resources (code, data) will be released soon.

2019.07: Our paper InstaBoost is accepted in ICCV'19.

2019.06: The Part I of our HAKE : HAKE-HICO which contains the image-level part-state annotations is released!

2019.04: Our project HAKE: Human Activity Knowledge Engine begins trial operation!

2019.02: Our paper on Interactiveness is accepted in CVPR'19.

2018.07: Our paper on GAN & Annotation Generation is accepted in ECCV'18.

2018.05: Presentation (Kaibot Team) in TIDY UP MY ROOM CHALLENGE | ICRA'18.

2018.02: Our paper on Object Part States is accepted in CVPR'18.

Publications

equal contribution: *

corresponding author: *

exUMI: Extensible System for Robot Teaching with Precise Proprioception and Multi-Modalities

Yue Xu, Litao Wei, Pengyu An, Qingyu Zhang, Yong-Lu Li*.

CoRL 2025 [arXiv] [PDF] [Project] [Code]

Dense Policy: Bidirectional Autoregressive Learning of Actions

Yue Su, Xinyu Zhan, Hongjie Fang, Han Xue, Hao-Shu Fang, Yong-Lu Li, Cewu Lu, Lixin Yang.

ICCV 2025 [arXiv] [PDF] [Project] [Code]

SIME: Enhancing Policy Self-Improvement with Modal-level Exploration

Yang Jin, Jun Lv, Wenye Yu, Hongjie Fang, Yong-Lu Li, Cewu Lu.

IROS 2025 [arXiv] [PDF] [Project] [Code]

Motion Before Action: Diffusing Object Motion as Manipulation Condition

Yue Su, Xinyu Zhan, Hongjie Fang, Yong-Lu Li, Cewu Lu, Lixin Yang.

RA-L 2025 [arXiv] [PDF] [Project] [Code]

Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions

Boran Wen, Dingbang Huang, Zichen Zhang, Jiahong Zhou, Jianbin Deng, Jingyu Gong, Yulong Chen*, Lizhuang Ma*, Yong-Lu Li*.

CVPR 2025 [arXiv] [PDF] [Project] [Code]

GaPT-DAR: Category-level Garments Pose Tracking via Integrated 2D Deformation and 3D Reconstruction

Li Zhang, Mingliang Xu, Jianan Wang, Qiaojun Yu, Lixin Yang, Yong-Lu Li, Cewu Lu, RujingWang, Liu Liu.

CVPR 2025 [arXiv] [PDF] [Project] [Code]

Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Feng Zhou, Ruiyang Liu, Chen Liu, Gaofeng He, Yong-Lu Li, Xiaogang Jin, Huamin Wang.

CVPR 2025 [arXiv] [PDF] [Project] [Code]

M3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation

Zixuan Chen*, Jiaxin Li*, Liming Tan, Yejie Guo, Junxuan Liang, Cewu Lu, Yong-Lu Li*.

CVPR 2025 [arXiv] [PDF] [Project] [Code] [Annotation Tool]

Homogeneous Dynamics Space for Heterogeneous Humans

Xinpeng Liu, Junxuan Liang, Chenshuo Zhang, Zixuan Cai, Cewu Lu*, Yong-Lu Li*.

CVPR 2025 [arXiv] [PDF] [Project] [Code]

Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition

Shengcheng Luo*, Quanquan Peng*, Jun Lv*, Kaiwen Hong, Katherine Rose Driggs-Campbell, Cewu Lu, Yong-Lu Li*.

ICRA 2025 Best Paper Award on Human-Robot Interaction [PDF] [Project]

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs

Hong Li, Nanxi Li, Yuanjie Chen, Jianbin Zhu, Qinlu Guo, Cewu Lu, Yong-Lu Li*.

ICLR 2025 [arXiv] [PDF] [Project] [Code]

ImDy: Human Inverse Dynamics from Imitated Observations

Xinpeng Liu, Junxuan Liang, Zili Lin, Haowen Hou, Yong-Lu Li*, Cewu Lu*.

ICLR 2025 [arXiv] [PDF] [Project] [Code]

Interacted Object Grounding in Spatio-Temporal Human-Object Interactions

Xiaoyang Liu*, Boran Wen*, Xinpeng Liu*, Zizheng Zhou, Hongwei Fan, Cewu Lu, Lizhuang Ma, Yulong Chen*, Yong-Lu Li*.

AAAI 2025 [arXiv] [PDF] [Project] [Code]

Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models

Zehao Wang, Xinpeng Liu, Xiaoqian Wu, Yudonglin Zhang, Zhou Fang, Yifan Fang, Junfu Pu, Cewu Lu*, Yong-Lu Li*.

arXiv 2024 [arXiv] [PDF] [Project] [Code]

General Articulated Objects Manipulation in Real Images via Part-Aware Diffusion Process

Zhou Fang, Yong-Lu Li*, Lixin Yang. Cewu Lu*.

NeurIPS 2024 [arXiv] [PDF] [Project]

HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu.

NeurIPS 2024 [arXiv] [PDF] [Code]

Take A Step Back: Rethinking the Two Stages in Visual Reasoning

Mingyu Zhang, Jiting Cai, Mingyu Liu, Yue Xu, Cewu Lu, Yong-Lu Li*.

ECCV 2024 [arXiv] [PDF] [Project] [Code]

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

Yue Xu, Yong-Lu Li*, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi Keung Tang.

ECCV 2024 [arXiv] [PDF] [Project] [Code]

Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases

Xinpeng Liu, Yong-Lu Li*, Ailing Zeng, Zizheng Zhou, Yang You, Cewu Lu*.

ECCV 2024 [arXiv] [PDF] [Project] [Code]

Revisit Human-Scene Interaction via Space Occupancy

Xinpeng Liu*, Haowen Hou*, Yanchao Yang, Yong-Lu Li*, Cewu Lu.

ECCV 2024 [arXiv] [PDF] [Project] [Code]

DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control

Xinyu Xu, Shengcheng Luo, Yanchao Yang, Yong-Lu Li*, Cewu Lu*.

ECCV 2024 [arXiv] [PDF] [Code]

Low-Rank Similarity Mining for Multimodal Dataset Distillation

Yue Xu, Zhilin Lin, Yusong Qiu, Cewu Lu, Yong-Lu Li*.

ICML 2024 [arXiv] [PDF] [Project] [Code]

From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Yong-Lu Li*, Xiaoqian Wu*, Xinpeng Liu, Yiming Dou, Yikun Ji, Junyi Zhang, Yixing Li, Xudong Lu, Jingru Tan, Cewu Lu.

CVPR 2024 Highlight [arXiv] [PDF] [Project] [Code]

Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement

Ziyu Wang*, Yue Xu*, Cewu Lu, Yong-Lu Li*.

CVPR 2024 [arXiv] [PDF] [Project] [Code]

Primitive-based 3D Human-Object Interaction Modelling and Programming

Siqi Liu, Yong-Lu Li*, Zhou Fang, Xinpeng Liu, Yang You, Cewu Lu*.

AAAI 2024 [arXiv] [PDF] [Project] [Code]

Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning

Xiaoqian Wu, Yong-Lu Li*, Jianhua Sun, Cewu Lu*.

NeurIPS 2023 [arXiv] [PDF] [Project] [Code]

Beyond Object Recognition: A New Benchmark towards Object Concept Learning

Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Yuan Yao, Siqi Liu, Cewu Lu.

ICCV 2023 [arXiv] [PDF] [Project] [Data] [Code]

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

Yue Xu, Yong-Lu Li*, Zhemin Huang, Michael Xu LIU, Cewu Lu, Yu-Wing Tai, Chi Keung Tang.

ICCV 2023 [arXiv] [PDF] [Project] [Code]

Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks

Xinyu Xu, Yong-Lu Li*, Cewu Lu*.

IJCV 2023 [arXiv] [PDF] [Code]

Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions

Yong-Lu Li*, Hongwei Fan*, Zuoyu Qiu, Yiming Dou, Liang Xu, Hao-Shu Fang, Peiyang Guo, Haisheng Su, Dongliang Wang, Wei Wu, Cewu Lu.

Tech Report A part of the HAKE Project [arXiv] [PDF] [Code & Data]

AlphaTracker: A Multi-Animal Tracking and Behavioral Analysis Tool

Ruihan Zhang, Hao-Shu Fang, Zexin Chen, Yu E Zhang, Aneesh Bal, Haowen Zhou, Rachel R Rock, Nancy Padilla-Coreano, Laurel R Keyes, Haoyi Zhu, Yong-Lu Li, Takaki Komiyama, Kay M Tye, Cewu Lu.

Frontiers in Behavioral Neuroscience - Individual and Social Behaviors 2023 [arXiv] [PDF] [Project]

HAKE: A Knowledge Engine Foundation for Human Activity Understanding

Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Yizhuo Li, Zuoyu Qiu, Liang Xu, Yue Xu, Hao-Shu Fang, Cewu Lu.

TPAMI 2023 HAKE 2.0 [arXiv] [PDF] [Project] [Code] [Press]

AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time

Hao-Shu Fang*, Jiefeng Li*, Hongyang Tang, Chao Xu, Haoyi Zhu, Yuliang Xiu, Yong-Lu Li, Cewu Lu.

TPAMI 2023 [arXiv] [PDF] [Code]

Constructing Balance from Imbalance for Long-tailed Image Recognition

Yue Xu*, Yong-Lu Li*, Jiefeng Li, Cewu Lu.

ECCV 2022 [arXiv] [PDF] [Code]

Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection

Xiaoqian Wu*, Yong-Lu Li*, Xinpeng Liu, Junyi Zhang, Yuzhe Wu, Cewu Lu.

ECCV 2022 [arXiv] [PDF] [Code]

Interactiveness Field of Human-Object Interactions

Xinpeng Liu*, Yong-Lu Li*, Xiaoqian Wu, Yu-Wing Tai, Cewu Lu, Chi Keung Tang.

CVPR 2022 [arXiv] [PDF] [Code]

Human Trajectory Prediction with Momentary Observation

Jianhua Sun, Yuxuan Li, Liang Chai, Hao-Shu Fang, Yong-Lu Li, Cewu Lu.

CVPR 2022 [PDF]

Learn to Anticipate Future with Dynamic Context Removal

Xinyu Xu, Yong-Lu Li, Cewu Lu.

CVPR 2022 [arXiv] [PDF] [Code]

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

Yang You, Zelin Ye, Yujing Lou, Chengkun Li, Yong-Lu Li, Lizhuang Ma, Weiming Wang, Cewu Lu.

CVPR 2022 [arXiv] [PDF] [Code]

UKPGAN: Unsupervised KeyPoint GANeration

Yang You, Wenhai Liu, Yong-Lu Li, Weiming Wang, Cewu Lu.

CVPR 2022 [arXiv] [PDF] [Code]

Highlighting Object Category Immunity for the Generalization of Human-Object Interaction Detection

Xinpeng Liu*, Yong-Lu Li*, Cewu Lu.

AAAI 2022 [arXiv] [PDF] [Code]

Learning Single/Multi-Attribute of Object with Symmetry and Group

Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Cewu Lu.

TPAMI 2022 [arXiv] [PDF] [Code]

An extension of our CVPR 2020 work (Symmetry and Group in Attribute-Object Compositions, SymNet).

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Xijie Huang, Liang Xu, Cewu Lu.

TPAMI 2022 [arXiv] [PDF] [Code]

An extension of our CVPR 2019 work (Transferable Interactiveness Network, TIN).

Localization with Sampling-Argmax

Jiefeng Li, Tong Chen, Ruiqi Shi, Yujing Lou, Yong-Lu Li, Cewu Lu.

NeurIPS 2021 [arXiv] [PDF] [Code]

DecAug: Augmenting HOI Detection via Decomposition

Yichen Xie, Hao-Shu Fang, Dian Shao, Yong-Lu Li, Cewu Lu.

AAAI 2021 [arXiv] [PDF]

HOI Analysis: Integrating and Decomposing Human-Object Interaction

Yong-Lu Li*, Xinpeng Liu*, Xiaoqian Wu, Yizhuo Li, Cewu Lu.

NeurIPS 2020 [arXiv] [PDF] [Code] [Project: HAKE-Action-Torch]

PaStaNet: Toward Human Activity Knowledge Engine

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Shiyi Wang, Hao-Shu Fang, Ze Ma, Mingyang Chen, Cewu Lu.

CVPR 2020 [arXiv] [PDF] [Video] [Slides] [Data] [Code]

Oral Talk, Compositionality in Computer Vision in CVPR 2020.

Detailed 2D-3D Joint Representation for Human-Object Interaction

Yong-Lu Li, Xinpeng Liu, Han Lu, Shiyi Wang, Junqi Liu, Jiefeng Li, Cewu Lu.

CVPR 2020 [arXiv] [PDF] [Video] [Slides] [Benchmark: Ambiguous-HOI] [Code]

Symmetry and Group in Attribute-Object Compositions

Yong-Lu Li, Yue Xu, Xiaohan Mao, Cewu Lu.

CVPR 2020 [arXiv] [PDF] [Video] [Slides] [Code]

HAKE: Human Activity Knowledge Engine

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Mingyang Chen, Ze Ma, Shiyi Wang, Hao-Shu Fang, Cewu Lu.

Tech Report HAKE 1.0 [arXiv] [PDF] [Project] [Code]

Main Repo:

Sub-repos: Torch TF HAKE-AVA Halpe List

InstaBoost: Boosting Instance Segmentation Via Probability Map Guided Copy-Pasting

Hao-Shu Fang*, Jianhua Sun*, Runzhong Wang*, Minghao Gou, Yong-Lu Li, Cewu Lu.

ICCV 2019 [arXiv] [PDF] [Code]

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

Yong-Lu Li, Siyuan Zhou, Xijie Huang, Liang Xu, Ze Ma, Hao-Shu Fang, Yan-Feng Wang, Cewu Lu.

CVPR 2019 [arXiv] [PDF] [Code]

SRDA: Generating Instance Segmentation Annotation via Scanning, Reasoning and Domain Adaptation

Wenqiang Xu*, Yong-Lu Li*, Cewu Lu.

ECCV 2018 [arXiv] [PDF] [Dataset](Instance-60k & 3D Object Models) [Code]

Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

Cewu Lu, Hao Su, Yong-Lu Li, Yongyi Lu, Li Yi, Chi-Keung Tang, Leonidas J. Guibas.

CVPR 2018 [PDF]

Optimization of Radial Distortion Self-Calibration for Structure from Motion from Uncalibrated UAV Images

Yong-Lu Li, Yinghao Cai, Dayong Wen, Yiping Yang.

ICPR 2016 [PDF]

Public Services

Area Chair, NeurIPS'24/25.

Contest Chair, PRCV'25.

Organizing Chair, CEAI'25.

Publicity Chair, CEAI'24.

Executive Area Chairs, Vision And Learning SEminar, VALSE, 2024-Now.

Executive Member, Committee on CAD&CG, China Computer Federation (CCF), 2024-Now.

Executive Member and member of secretariat, Committee on Embodied Intelligence, China Association for Artificial Intelligence (CAAI), 2024-Now.

Organizer and host, Embodied Intelligence Forum, The third Conference on Machine Learning Algorithms and Natural Language Processing (MLNLP 2024).

Organizer, CAAI Embodied Intelligence Young Scholars Seminar, 5th.

Program Committee Member, Compositionality in Computer Vision, CVPR 2020.

Reviewer

Conference: CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, CoRL, IROS, Siggraph Asia, AAAI, MM.

Journal: TPAMI, IJCV, TVCG, ACM Computing Surveys, TCSVT, Neurocomputing, Pattern Recognition, JVCI, IMAVIS, IoT, Science China Information Sciences, Automation in Construction, IET Image Processing, Neural Networks, Computer Science Review.

International Program Committee (IPC): CAD/Graphics 2025

Competition Judge

The Yangtze River Delta Youth Artificial Intelligence Olympic Challenge.
The 4th International Artificial Intelligence Fair, IAIF.

The 2nd Yangtze River Delta Youth Artificial Intelligence Olympic Challenge.

Teaching

Teacher

AI3604: Computer Vision, Shanghai Jiao Tong University, 23-24, 24-25 (Fall)

ACM Class

CS7352: Advanced Neural Network Theory and Applications, 23-24. 24-25 (Spring)

Computer Science and Technology

AI3618: Virtual Reality, Shanghai Jiao Tong University, 22-23, 23-24. 24-25 (Spring) [Website]

Artificial Intelligence Class

Lecturer

CS348: Computer Vision, Shanghai Jiao Tong University, 2022-2023 (Fall)

ACM Class

Guided Porjects --> Papers:

[1] CymNet: CLIP boosted SymNet for Compositional Zero-shot Learning.

Jiaming Shan, Zhaozi Wang, Mingshu Zhai. ICIPMC 2023, [Porject]

[2] DiffAnnot: Improved Neural Annotator with Denoising Diffusion Model.

Chanfan Lin, Tianyuan Qiu, Hanchong Yan, Muzi Tao. ICIPMC 2023, [Porject]

[3] CPaStaNet: A CLIP-based Human Activity Knowledge Engine.

Yijia Hong, Haotian Luo, Xialin He, Yaoqi Ye. ICIPMC 2023, [Porject]

Assistant

AI1602: Problem Solving and Practice of Artificial Intelligence, Shanghai Jiao Tong University, 2020-2021 (Spring)

2nd Artificial Intelligence Class

Mentor

AI1602: Problem Solving and Practice of Artificial Intelligence, Shanghai Jiao Tong University, 2019-2020 (Spring)

1st Artificial Intelligence Class

Guided Porject --> Paper:

Rb-PaStaNet: A Few-Shot Human-Object Interaction Detection Based on Rules and Part States

Shenyu Zhang, Zichen Zhu, Qingquan Bao (freshmen)

IMVIP 2020, Press Coverage: SEIEE of SJTU

Counseling

具身智能：人型机器人与大模型共同进化，为外脑提供“躯体”, 腾讯研究院大模型十大趋势报告, 2024.

ChatGPT与加强国际话语治理的多维路径, 中国社会科学（内部文稿）, 2023, 8(4).

Talks

2025.5.29: 机器人推理近期进展. 2025张江具身智能开发者大会暨国际人形机器人技能大赛

2025.1.16: 具身视角下的数据与物理推理. 大模型交叉与同和嘉年华

2024.10.10: 具身智能与视觉推理. Chinagraph 2024第十五届中国计算机图形学大会

2024.9.23: 具身智能与视觉推理. CAA云讲座

2024.9.12: 浅谈具身智能Scaling Law. 嘉程创业流水席第232期

2024.4.2: 视觉推理与具身智能. 智猩猩

2023.12.20: RHOS: Robot, Human, Object, and Scene. CS3317 Artificial Intelligence (B), SJTU. Thank Prof. Panpan Cai for hosting.

2023.10.12: RHOS: Robot, Human, Object, and Scene. HKUST AI Seminar series, HKUST (Guangzhou). Thank Prof. Junwei Liang for hosting.

2023.2.24: RHOS: Robot, Human, Object, and Scene. Forum on Trends in cutting-edge Technology for YunFan Award AI Scholars, GAIDC, Global AI Developer Conference

2022.12.7: Knowledge-driven Action Reasoning. HarmonyOS Technology Innovation Salon

2022.8.17: Three Stages in Human-Object Interaction Detection. VALSE Webinar

2022.6.1: Recent progress in Human Activity Knowledge Engine

IDEA. Thank Dr. Ailing Zeng for hosting.

2021.12.9: Knowledge-driven Activity Understanding

CUMT "Image Analysis and Understanding" Frontier Forum. Thank Prof. Zhiwen Shao for hosting.

2021.8.21: HAKE and Human-Object Interaction (HOI) Detection

CoLab. Thank Prof. Si Liu for hosting.

2021.3.05: Human Activity Knowledge Engine (updated)

SJTU Computer Science Global Lunch Series. [Video]

2020.8.23: Knowledge Driven Human Activity Understanding

The 3rd International conference on Image, Video Processing and Artificial Intelligence, IVPAI 2020.

2020.7.12: Human Activity Knowledge Engine

Student Forum on Frontiers of AI, SFFAI. [Slides]

2020.6.16: PaStaNet: Toward Human Activity Knowledge Engine

Compositionality in Computer Vision in CVPR 2020 Virtual. [Video] [Slides]

Honors

ICRA 2025 Best Paper Award on Human-Robot Interaction

ICRA 2025 Best Paper Award Finalist

AI100 Youth Pioneers (AI100青年先锋), Deeptech and MIT Technology Review China.

Leading Technology Award at the World Internet Conference 2024, Visual Understanding Common Technologies and Applications for Intelligent Social Governance.