I'm a tenure-track assistant professor at Shanghai Jiao Tong University (SJTU), affiliated with the Qing Yuan Research Institute. I am a member of the Machine Vision and Intelligence Group (MVIG), working closely with Prof. Cewu Lu. I study Human Activity Understanding, Visual Reasoning, and Embodied AI. We are building HAKE, a knowledge-driven system that enables intelligent agents to perceive human activities, reason human behavior logics, learn skills from human activities, and interact with environment. Check out the HAKE site for more information.
Before joining SJTU, I worked closely with IEEE Fellow Prof. Chi Keung Tang and Yu-Wing Tai at the Hong Kong University of Science and Technology (HKUST) (2021-2022). I received a Ph.D. degree (2017-2021) in Computer Science from Shanghai Jiao Tong University (SJTU), under the supervision of Prof. Cewu Lu. Prior to that, I worked and studied at the Institute of Automation, Chinese Academy of Sciences (CASIA) (2014-2017) under the supervision of Prof. Yiping Yang and A/Prof. Yinghao Cai.
Research interests: Human-Robot-Scene
(S) Embodied AI: how to make agents learn skills from humans and interact with humans.
(S-1) Human Activity Understanding: how to learn and ground complex/ambiguous human activity concepts (body motion, human-object/human/scene interaction) and object concepts from multi-modal information (2D-3D-4D).
(S-2) Visual Reasoning: how to mine, capture, and embed the logics and causal relations from human activities.
(S-3) General Multi-Modal Foundation Models: especially for human-centric perception tasks.
(S-4) Activity Understanding from A Cognitive Perspective: work with multidisciplinary researchers to study how the brain perceives activities.
(E) Human-Robot Interaction for Smart Hospital: work with the healthcare team (doctors and engineers) in SJTU and Ruijin Hospital to develop intelligent robots to help people.
Recruitment: Actively looking for self-motivated students (master/PhD, 2024 spring & fall), interns/engineers/visitors (CV/ML/ROB/NLP background, always welcome) to join us in Machine Vision and Intelligence Group (MVIG). If you share same/similar interests, feel free to drop me an email with your resume. Click here for more details.
2) HAKE-AVA: Human body part state (PaSta) labels in videos from AVA dataset. HAKE-AVA.
4) HAKE-3D (CVPR'20): 3D human-object representation for action understanding (DJ-RN).
5) HAKE-Object (CVPR'20, TPAMI'21): object knowledge learner to advance action understanding (SymNet).
6) HAKE-A2V (CVPR'20): Activity2Vec, a general activity feature extractor based on HAKE data, converts a human (box) to a fixed-size vector, PaSta and action scores.
8) HOI Learning List: a list of recent HOI (Human-Object Interaction) papers, code, datasets and leaderboard on widely-used benchmarks.