I'm a tenure-track assistant professor at Shanghai Jiao Tong University (SJTU), affiliated with the Qing Yuan Research Institute. I am a member of the Machine Vision and Intelligence Group (MVIG), working closely with Prof. Cewu Lu. I study Human Activity Understanding, Visual Reasoning, and Embodied AI. We are building HAKE, a knowledge-driven system that enables intelligent agents to perceive human activities, reason about the logic of human behavior, learn skills from human activities, and interact with the environment. Check out the HAKE site for more information.
Before joining SJTU, I worked closely with IEEE Fellow Prof. Chi Keung Tang and Yu-Wing Tai at the Hong Kong University of Science and Technology (HKUST) (2021-2022). I received a Ph.D. degree (2017-2021) in Computer Science from Shanghai Jiao Tong University (SJTU), under the supervision of Prof. Cewu Lu. Prior to that, I worked and studied at the Institute of Automation, Chinese Academy of Sciences (CASIA) (2014-2017) under the supervision of Prof. Yiping Yang and A/Prof. Yinghao Cai.
Research interests: Human-Robot-Scene
(S) Embodied AI: how to make agents learn skills from humans and interact with humans.
(S-1) Human Activity Understanding: how to learn and ground complex/ambiguous human activity concepts (body motion, human-object/human/scene interaction) and object concepts from multi-modal information (2D-3D-4D).
(S-2) Visual Reasoning: how to mine, capture, and embed the logics and causal relations from human activities.
(S-3) General Multi-Modal Foundation Models: especially for human-centric perception tasks.
(S-4) Activity Understanding from a Cognitive Perspective: work with multidisciplinary researchers to study how the brain perceives activities.
(E) Human-Robot Interaction for Smart Hospital: work with the healthcare team (doctors and engineers) at SJTU and Ruijin Hospital to develop intelligent robots to help people.
Recruitment: Actively looking for self-motivated students (master/PhD, 2023 fall) and interns/engineers/visitors (CV/ML/ROB/NLP background, always welcome) to join us in the Machine Vision and Intelligence Group (MVIG). If you share the same or similar interests, feel free to drop me an email with your resume. Click here for more details.
Tech Report A part of the HAKE Project [arXiv] [PDF] [Code & Data]
Tech Report HAKE 1.0 [arXiv] [PDF] [Project] [Code]
Sub-repos: Torch, TF, HAKE-AVA, Halpe, List
CVPR 2022 [PDF]
TPAMI 2022 [arXiv] [PDF] [Code]
An extension of our CVPR 2019 work (Transferable Interactiveness Network, TIN).
NeurIPS 2020 [arXiv] [PDF] [Code] [Project: HAKE-Action-Torch]
CVPR 2020 [arXiv] [PDF] [Video] [Slides] [Data] [Code]
Oral Talk, Compositionality in Computer Vision in CVPR 2020.
CVPR 2020 [arXiv] [PDF] [Video] [Slides] [Benchmark: Ambiguous-HOI] [Code]
ECCV 2018 [arXiv] [PDF] [Dataset] (Instance-60k & 3D Object Models) [Code]
CVPR 2018 [PDF]
ICPR 2016 [PDF]
1) HAKE-Image (CVPR'18/20): Human body part state (PaSta) labels in images. HAKE-HICO, HAKE-HICO-DET, HAKE-Large, Extra-40-verbs.
2) HAKE-AVA: Human body part state (PaSta) labels in videos from the AVA dataset. HAKE-AVA.
3) HAKE-Action-TF, HAKE-Action-Torch (CVPR'18/19/22, NeurIPS'20, TPAMI'22/23, ECCV'22, AAAI'22): SOTA action understanding methods and the corresponding HAKE-enhanced versions (TIN, IDN, IF, ParMap).
4) HAKE-3D (CVPR'20): 3D human-object representation for action understanding (DJ-RN).
5) HAKE-Object (CVPR'20, TPAMI'21): object knowledge learner to advance action understanding (SymNet).
6) HAKE-A2V (CVPR'20): Activity2Vec, a general activity feature extractor based on HAKE data that converts a human (box) into a fixed-size vector together with PaSta and action scores.
7) Halpe: a joint project under AlphaPose and HAKE that provides full-body human keypoints (body, face, hand, 136 points) for 50,000 HOI images.
8) HOI Learning List: a list of recent HOI (Human-Object Interaction) papers, code, datasets and leaderboard on widely-used benchmarks.