Publications

AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

[ arXiv ]

COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models

[ arXiv ]

Fast-UMI: A Scalable and Hardware-Independent Universal Manipulation Interface

[ arXiv ]

Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning

[ arXiv ]

HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation

[ CVPR 2024 ]

KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance

[ CoRL 2024 ]

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

[ ICML 2024 ]

Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning

[ ICML 2024 ]

Constrained Ensemble Exploration for Unsupervised Skill Discovery

[ ICML 2024 ]

Implicit Event-RGBD Neural SLAM

[ CVPR 2024 Highlight ]

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

[ CVPR 2024 Highlight ]

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer

[ AAAI 2024 ]

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

[ AAAI 2024 ]

Color Event Enhanced Single-Exposure HDR Imaging

[ AAAI 2024 ]

ASF-Transformer: neutralizing the impact of atmospheric turbulence on optical imaging through alternating learning in the spatial and frequency domains

[ Optics Express ]

AI-driven projection tomography with multicore fibre-optic cell rotation

[ Nature Communications ]

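Developed the world's first prototype microscopic tomography system based on fiber-optic optical manipulation. Achieved the first fiber-optic optical control of three-dimensional cancer cell rotation and completed a full 3D reconstruction of human leukemia cells, advancing cell-level early cancer diagnosis and treatment as well as targeted drug development.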

Calibration-free quantitative phase imaging in multi-core fiber endoscopes using end-to-end deep learning

[ Optics Letters ]

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

[ ECCV 2024 ]

Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning

[ arXiv ]

Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs

[ ICRA 2024 ]

Robust quadrupedal locomotion via risk-averse policy learning

[ ICRA 2024 ]

Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning

[ NeurIPS 2023 ]

Cross-Domain Policy Adaptation via Value-Guided Data Filtering

[ NeurIPS 2023 ]

Motion-Aware Video Frame Interpolation

[ arXiv ]

Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning

[ Artificial Intelligence ]

Vehicle Perception from Satellite

[ IEEE Transactions on Pattern Analysis and Machine Intelligence ]

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance

[ ICCV 2023 ]

Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction

[ ICCV 2023 ]

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

[ ICCV 2023 ]

Affordance-Driven Next-Best-View Planning for Robotic Grasping

[ CoRL 2023 ]

Bio-Inspired Audiovisual Multi-Representation Integration via Self-Supervised Learning

[ ACM MM 2023 ]

Behavior Contrastive Learning for Unsupervised Skill Discovery

[ ICML 2023 ]

On the Value of Myopic Behavior in Policy Reuse

[ arXiv ]

One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field

[ CVPR 2023 ]

Fully Self-Supervised Depth Estimation from Defocus Clue

[ CVPR 2023 ]

Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking

[ CVPR 2023 ]