Now I'm working closely with Prof. Chi Keung Tang and Yu-Wing Tai at the Hong Kong University of Science and Technology (HKUST) as a postdoc fellow (2021-present). I received a Ph.D. degree (2017-2021) in Computer Science from the Shanghai Jiao Tong University (SJTU), under the supervision of Prof. Cewu Lu, in Machine Vision and Intelligence Group (MVIG). Prior to that, I worked and studied at the Institute of Automation, Chinese Academy of Sciences (CASIA) under the supervision of Prof. Yiping Yang and A/Prof. Yinghao Cai. My primary research interests are Machine Learning, Computer Vision, and Intelligent Robot. Now we are building HAKE, a knowledge-driven system that enables intelligent agents to perceive human activities, reason human behavior logics, learn skills from human activities and interact with objects and environments. Check out the HAKE site for more information.
Research interests:
(1) Embodied AI: how to make agents learn skills from humans and interact with humans.
(2) Human Activity Understanding: how to learn and ground complex/ambiguous human activity concepts (body motion, human-object/human/scene interaction) and object concepts from multi-modal information (2D-3D-4D).
(3) Visual Reasoning: how to mine, capture, and embed the logics and causal relations from human activities.
(4) Activity Understanding from A Cognitive Perspective: work with multidisciplinary researchers to study how the brain perceives activities.
(5) General Visual Fundamental Model: especially for human-centric perception tasks.
Recruitment: I am actively looking for self-motivated interns, researchers, and engineers (with CV/ML/ROB/NLP background) to join our team (onsite or remote). If you share same/similar interests, feel free to drop me an email with your resume.
Preprint HAKE 1.0 [arXiv] [PDF] [Project] [Code]
Main Repo:
Sub-repos: Torch TF Halpe List
TPAMI 2021 [arXiv] [PDF] [Code]
An extension of our CVPR 2019 work (Transferable Interactiveness Network, TIN).
NeurIPS 2020 [arXiv] [PDF] [Code] [Project: HAKE-Action-Torch]
CVPR 2020 [arXiv] [PDF] [Video] [Slides] [Data] [Code]
Oral Talk, Compositionality in Computer Vision in CVPR 2020.
CVPR 2020 [arXiv] [PDF] [Video] [Slides] [Benchmark: Ambiguous-HOI] [Code]
ECCV 2018 [arXiv] [PDF] [Dataset](Instance-60k & 3D Object Models) [Code]
CVPR 2018 [PDF]
ICPR 2016 [PDF]
Contents:
1) HAKE-Image (CVPR'18/20): Human body part state (PaSta) labels in images. HAKE-HICO, HAKE-HICO-DET, HAKE-Large, Extra-40-verbs.
2) HAKE-AVA: Human body part state (PaSta) labels in videos from AVA dataset. HAKE-AVA.
3) HAKE-Action-TF, HAKE-Action-Torch (CVPR'18/19/20, NeurIPS'20, TPAMI'21): SOTA action understanding methods and the corresponding HAKE-enhanced versions (TIN, IDN).
4) HAKE-3D (CVPR'20): 3D human-object representation for action understanding (DJ-RN).
5) HAKE-Object (CVPR'20, TPAMI'21): object knowledge learner to advance action understanding (SymNet).
6) HAKE-A2V (CVPR'20): Activity2Vec, a general activity feature extractor based on HAKE data, converts a human (box) to a fixed-size vector, PaSta and action scores.
7) Halpe: a joint project under Alphapose and HAKE, full-body human keypoints (body, face, hand, 136 points) of 50,000 HOI images.
8) HOI Learning List: a list of recent HOI (Human-Object Interaction) papers, code, datasets and leaderboard on widely-used benchmarks.