Feel free to contact me via email at liujason [at] cmu [dot] edu!
Research
I am interested in building robotic systems capable of performing dexterous, robust, human-like skills that can generalize to unstructured and diverse environments. My research focuses on robot learning for manipulation and perception, with an emphasis on sim-to-real transfer.
Representative papers are highlighted.
Many contact-rich tasks humans perform, such as box pickup or rolling dough, rely on
force feedback for reliable execution. However, this force information, which is readily
available in most robot arms, is not commonly used in teleoperation and policy learning.
Consequently, robot behavior is often limited to quasi-static kinematic tasks that do not
require intricate force feedback. In this paper, we first present a low-cost, intuitive,
bilateral teleoperation setup that relays external forces of the follower arm back to
the teacher arm, facilitating data collection for complex, contact-rich tasks. We
then introduce FACTR, a policy learning method that employs a curriculum which corrupts
the visual input with decreasing intensity throughout training. The curriculum prevents
our transformer-based policy from overfitting to the visual input and guides the policy
to properly attend to the force modality. We demonstrate that by fully utilizing the
force information, our method significantly improves generalization to unseen objects
by 43% compared to baseline approaches without a curriculum.
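To give a rough sense of the curriculum idea (this is not the released FACTR code), the sketch below anneals the strength of a visual corruption, here hypothetical Gaussian pixel noise, from a high initial value down to zero over training, so the policy must lean on the force modality early on. The linear schedule, noise model, and names are assumptions.

import numpy as np

def corruption_scale(step, total_steps, start=1.0, end=0.0):
    # Linearly anneal corruption intensity from `start` to `end`;
    # the linear shape is an assumption, not the paper's exact schedule.
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac

def corrupt_images(images, scale, rng):
    # Gaussian pixel noise as a stand-in for the curriculum's visual corruption.
    return images + scale * rng.standard_normal(images.shape)

rng = np.random.default_rng(0)
total_steps = 1000
for step in range(total_steps):
    images = rng.random((4, 3, 64, 64))   # placeholder camera batch
    forces = rng.random((4, 6))           # placeholder wrench readings, never corrupted
    noisy_images = corrupt_images(images, corruption_scale(step, total_steps), rng)
    # policy_update(noisy_images, forces)  # transformer policy update would go here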
@article{liu2025factr,
title={FACTR: Force-Attending Curriculum Training for
Contact-Rich Policy Learning},
author={Jason Jingzhou Liu and Yulong Li and Kenneth Shaw
and Tony Tao and Ruslan Salakhutdinov and Deepak Pathak},
journal={arXiv preprint arXiv:2502.17432},
year={2025},
}
Vision-based object detectors are a crucial basis for robotics applications as they provide valuable information about object localisation in the environment. These need to ensure high reliability in different lighting conditions, occlusions, and visual artifacts, all while running in real-time. Collecting and annotating real-world data for these networks is prohibitively time-consuming and costly, especially for custom assets, such as industrial objects, making it untenable for generalization to in-the-wild scenarios. To this end, we present Synthetica, a method for large-scale synthetic data generation for training robust state estimators. This paper focuses on the task of object detection, an important problem which can serve as the front-end for most state estimation problems, such as pose estimation. Leveraging data from a photorealistic ray-tracing renderer, we scale up data generation, generating 2.7 million images, to train highly accurate real-time detection transformers. We present a collection of rendering randomization and training-time data augmentation techniques conducive to robust sim-to-real performance for vision tasks. We demonstrate state-of-the-art performance on the task of object detection while having detectors that run at 50–100 Hz, which is 9 times faster than the prior SOTA. We further demonstrate the usefulness of our training methodology for robotics applications by showcasing a pipeline for use in the real world with custom objects for which there do not exist prior datasets. Our work highlights the importance of scaling synthetic data generation for robust sim-to-real transfer while achieving the fastest real-time inference speeds.
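For intuition only, here is a minimal sketch of the kind of training-time image augmentation such a pipeline might apply on top of randomized renders; the specific transforms and magnitudes are assumptions rather than the paper's recipe.

import torch
from torchvision import transforms

# Hypothetical augmentation stack for rendered images; Synthetica's actual
# randomizations are more extensive and tuned for detection transformers.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.2)),  # crude occlusion stand-in
])

rendered = torch.rand(3, 480, 640)   # placeholder ray-traced render in [0, 1]
augmented = augment(rendered)        # what the detector would be trained on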
@article{singh2024syntheticalargescalesynthetic,
title={Synthetica: Large Scale Synthetic Data for
Robot Perception},
author={Ritvik Singh and Jingzhou Liu and Karl Van Wyk
and Yu-Wei Chao and Jean-Francois Lafleche and
Florian Shkurti and Nathan Ratliff and Ankur Handa},
journal={arXiv},
year={2024},
}
Physics-based simulations have accelerated
progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and
robust surgical simulation environment remains a challenge. In this paper, we present
ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic
rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci Research
Kit (dVRK) and Smart Tissue Autonomous Robot (STAR) which represent common subtasks in surgical
training. ORBIT-Surgical leverages GPU parallelization to train reinforcement learning and
imitation learning algorithms to facilitate study of robot learning to augment human surgical
skills. ORBIT-Surgical also facilitates realistic synthetic data generation for active
perception tasks. We demonstrate sim-to-real transfer of policies learned in ORBIT-Surgical onto a
physical dVRK robot.
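The GPU-parallel training loop can be pictured roughly as below; this is a generic vectorized-environment sketch, not the actual ORBIT-Surgical API, and all shapes and names are placeholders.

import torch

class VectorizedSurgicalEnv:
    # Stand-in for a GPU-parallel simulation: thousands of environments step
    # in lockstep as batched tensors on one device.
    def __init__(self, num_envs, obs_dim=32, act_dim=7, device="cpu"):
        self.num_envs, self.obs_dim, self.act_dim = num_envs, obs_dim, act_dim
        self.device = device

    def reset(self):
        return torch.zeros(self.num_envs, self.obs_dim, device=self.device)

    def step(self, actions):
        obs = torch.randn(self.num_envs, self.obs_dim, device=self.device)
        rew = -actions.square().sum(dim=-1)    # placeholder reward
        done = torch.zeros(self.num_envs, dtype=torch.bool, device=self.device)
        return obs, rew, done

env = VectorizedSurgicalEnv(num_envs=4096)
obs = env.reset()
for _ in range(8):  # short rollout
    actions = torch.zeros(env.num_envs, env.act_dim, device=env.device)  # policy(obs) in practice
    obs, rew, done = env.step(actions)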
@article{yu2024orbitsurgical,
title={ORBIT-Surgical: An Open-Simulation Framework
for Learning Surgical Augmented Dexterity},
author={Qinxi Yu and Masoud Moghani and
Karthik Dharmarajan and Vincent Schorp and
William Chung-Ho Panitch and Jingzhou Liu and
Kush Hari and Huang Huang and Mayank Mittal and
Ken Goldberg and Animesh Garg},
journal={arXiv},
year={2024},
}
Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes that leverages recent advances in differentiable physics and rendering. Our approach employs rendering priors to align with input images and segmentation masks, along with physics priors to mitigate penetration and relative sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which utilizes the differentiable priors as dynamics and observation models, executes faster. We demonstrate that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement. We also show that our approach generalizes to perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.
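As a highly simplified sketch of the optimization-based variant, the snippet below refines a pose by descending a weighted sum of a rendering-alignment loss and a penetration penalty; the actual priors, renderer, and weights used in HandyPriors are not reproduced here, and the toy losses are placeholders.

import torch

def estimate_pose(render_loss, penetration_loss, init_pose,
                  steps=200, lr=1e-2, w_render=1.0, w_physics=0.1):
    # Gradient-based refinement combining a differentiable rendering prior
    # with a differentiable physics (penetration) prior; both losses map a
    # pose tensor to a scalar.
    pose = init_pose.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = w_render * render_loss(pose) + w_physics * penetration_loss(pose)
        loss.backward()
        opt.step()
    return pose.detach()

# Toy stand-ins: pull the pose toward a "target" and keep one coordinate non-negative.
target = torch.tensor([0.1, 0.2, 0.3])
render_loss = lambda p: (p - target).square().sum()
penetration_loss = lambda p: torch.relu(-p[2]).square()
pose = estimate_pose(render_loss, penetration_loss, torch.zeros(3))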
@InProceedings{zhang2023handypriors,
title={HandyPriors: Physically Consistent Perception
of Hand-Object Interactions with Differentiable Priors},
author={Shutong Zhang and Yiling Qiao and Guanglei Zhu and
Eric Heiden and Dylan Turpin and Jingzhou Liu and
Ming Lin and Miles Macklin and Animesh Garg},
journal={arXiv},
year={2023},
}
DeXtreme: Transfer of Agile In-Hand Manipulation from Simulation to
Reality
Ankur Handa*,
Arthur Allshire*,
Viktor Makoviychuk*,
Aleksei Petrenko*,
Ritvik Singh*,
Jingzhou Liu*,
Denys Makoviichuk,
Karl Van Wyk,
Alexander Zhurkevich,
Balakumar Sundaralingam,
Yashraj Narang,
Jean-Francois Lafleche,
Dieter Fox,
Gavriel State
ICRA, 2023
Recent work has demonstrated the ability of deep
reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation,
including in the domain of multi-fingered manipulation. However, such models can be challenging
to transfer to the real world due to the gap between simulation and reality. In this paper, we
present our techniques to train a) a policy that can perform robust dexterous manipulation on an
anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable
real-time information on the state of the object being manipulated. Our policies are trained to
adapt to a wide range of conditions in simulation. Consequently, our vision-based policies
significantly outperform the best vision policies in the literature on the same reorientation
task and are competitive with policies that are given privileged state information via motion
capture systems. Our work reaffirms the possibilities of sim-to-real transfer for dexterous
manipulation in diverse kinds of hardware and simulator setups, and in our case, with the
Allegro Hand and Isaac Gym GPU-based simulation. Furthermore, it opens up possibilities for
researchers to achieve such results with commonly-available, affordable robot hands and
cameras.
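The adaptation to a wide range of simulated conditions comes from domain randomization; the toy sketch below resamples physics and observation parameters each episode, with made-up parameter names and ranges rather than the paper's actual randomization set.

import numpy as np

# Illustrative ranges only; DeXtreme randomizes many more quantities.
RANDOMIZATION_RANGES = {
    "object_mass_kg": (0.03, 0.25),
    "friction": (0.5, 1.5),
    "joint_damping": (0.01, 0.5),
    "observation_noise_std": (0.0, 0.02),
}

def sample_episode_params(rng):
    # Draw one set of simulation parameters per episode so the policy must
    # succeed across the whole distribution, not a single nominal setting.
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in RANDOMIZATION_RANGES.items()}

rng = np.random.default_rng(42)
episode_params = sample_episode_params(rng)   # applied to the simulator before reset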
@article{nvidia2022dextreme,
title={DeXtreme: Transfer of Agile In-Hand
Manipulation from Simulation to Reality},
author={Handa, Ankur and Allshire, Arthur and
Makoviychuk, Viktor and Petrenko, Aleksei and
Singh, Ritvik and Liu, Jingzhou and
Makoviichuk, Denys and Van Wyk, Karl and
Zhurkevich, Alexander and Sundaralingam, Balakumar
and Narang, Yashraj and Lafleche, Jean-Francois and
Fox, Dieter and State, Gavriel},
journal={arXiv},
year={2022},
}
Multi-finger grasping relies on high quality
training data, which is hard to obtain: human data is hard to transfer and synthetic data relies
on simplifying assumptions that reduce grasp quality. By making grasp simulation differentiable,
and contact dynamics amenable to gradient-based optimization, we accelerate the search for
high-quality grasps with fewer limiting assumptions. We present Grasp’D-1M: a large-scale
dataset for multi-finger robotic grasping, synthesized with Fast-Grasp’D, a novel
differentiable grasping simulator. Grasp’D-1M contains one million training examples for three
robotic hands (three, four and five-fingered), each with multimodal visual inputs
(RGB+depth+segmentation, available in mono and stereo). Grasp synthesis with Fast-Grasp’D is 10x
faster than GraspIt! and 20x faster than the prior Grasp’D differentiable simulator. Generated
grasps are more stable and contact-rich than GraspIt! grasps, regardless of the distance
threshold used for contact generation. We validate the usefulness of our dataset by retraining
an existing vision-based grasping pipeline on Grasp’D-1M, and showing a dramatic increase in
model performance, predicting grasps with 30% more contact, a 33% higher epsilon metric, and 35%
lower simulated displacement.
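A minimal sketch of what gradient-based grasp search looks like once the simulation is differentiable: hand joint angles are updated by descending a differentiable objective. The objective here is an arbitrary placeholder, not Fast-Grasp'D's contact model.

import torch

def refine_grasp(objective, init_joint_angles, steps=300, lr=5e-3):
    # Gradient-descend hand joint angles through a differentiable grasp
    # objective; in Fast-Grasp'D the gradients flow through differentiable
    # contact dynamics instead of this stand-in.
    q = init_joint_angles.clone().requires_grad_(True)
    opt = torch.optim.Adam([q], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = objective(q)
        loss.backward()
        opt.step()
    return q.detach()

# Toy objective: pull fingers toward a nominal closed pose while penalizing
# motion past an arbitrary joint limit.
closed_pose = torch.full((16,), 0.6)   # 16-DoF hand, made up for the demo
objective = lambda q: (q - closed_pose).square().sum() + torch.relu(q - 1.2).sum()
grasp = refine_grasp(objective, torch.zeros(16))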
@article{turpin2023fastgraspd,
title={Fast-Grasp'D: Dexterous Multi-finger Grasp
Generation Through Differentiable Simulation},
author={Dylan Turpin and Tao Zhong and Shutong Zhang and
Guanglei Zhu and Jingzhou Liu and Ritvik Singh and
Eric Heiden and Miles Macklin and Stavros Tsogkas and
Sven Dickinson and Animesh Garg},
journal={arXiv},
year={2023},
}
Many existing learning-based grasping approaches
concentrate on a single embodiment, provide limited generalization to higher DoF end-effectors
and cannot capture a diverse set of grasp modes. We tackle the problem of grasping using
multiple embodiments by learning rich geometric representations for both objects and
end-effectors using Graph Neural Networks. Our novel method - GeoMatch - applies supervised
learning on grasping data from multiple embodiments, learning end-to-end contact point
likelihood maps as well as conditional autoregressive predictions of grasps
keypoint-by-keypoint. We compare our method against baselines that support multiple embodiments.
Our approach performs better across three end-effectors, while also producing diverse
grasps.
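The keypoint-by-keypoint structure can be illustrated with a toy autoregressive head that scores candidate object points for the next contact, conditioned on the keypoints already chosen; the encoder, feature sizes, and greedy decoding are placeholders, not the GeoMatch architecture.

import torch
import torch.nn as nn

class AutoregressiveKeypointHead(nn.Module):
    # Toy predictor: pick contact keypoints one at a time, each conditioned
    # on a running summary of previously selected keypoints.
    def __init__(self, feat_dim=64, num_keypoints=5):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.score = nn.Sequential(
            nn.Linear(feat_dim * 2, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, obj_feats):
        # obj_feats: (num_points, feat_dim) per-point embeddings, e.g. from a GNN.
        chosen, context = [], torch.zeros(obj_feats.shape[1])
        for _ in range(self.num_keypoints):
            ctx = context.expand(obj_feats.shape[0], -1)
            logits = self.score(torch.cat([obj_feats, ctx], dim=-1)).squeeze(-1)
            idx = int(torch.argmax(logits))      # greedy decoding for the demo
            chosen.append(idx)
            context = context + obj_feats[idx]   # condition on picked keypoints
        return chosen

head = AutoregressiveKeypointHead()
keypoint_indices = head(torch.randn(256, 64))    # indices of 5 contact points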
@article{attarian2023geometry,
title={Geometry Matching for Multi-Embodiment Grasping},
author={Maria Attarian and Muhammad Adil Asif and
Jingzhou Liu and Ruthrash Hari and Animesh Garg and
Igor Gilitschenski and Jonathan Tompson},
journal={arXiv},
year={2023},
}
We present Orbit, a unified and modular framework for
robot learning powered by NVIDIA Isaac Sim. It offers a modular design to easily and efficiently
create robotic environments with photo-realistic scenes and high-fidelity rigid and deformable
body simulation. With Orbit, we provide a suite of benchmark tasks of varying difficulty -- from
single-stage cabinet opening and cloth folding to multi-stage tasks such as room reorganization.
To support working with diverse observations and action spaces, we include fixed-arm and mobile
manipulators with different physically-based sensors and motion generators. Orbit allows
training reinforcement learning policies and collecting large demonstration datasets from
hand-crafted or expert solutions in a matter of minutes by leveraging GPU-based parallelization.
In summary, we offer an open-sourced framework that readily comes with 16 robotic platforms, 4
sensor modalities, 10 motion generators, more than 20 benchmark tasks, and wrappers to 4
learning libraries. With this framework, we aim to support various research areas, including
representation learning, reinforcement learning, imitation learning, and task and motion
planning. We hope it helps establish interdisciplinary collaborations in these communities, and
its modularity makes it easily extensible for more tasks and applications in the future.
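The modular-composition idea can be pictured as assembling an environment from small configuration pieces; the classes and field names below are generic placeholders and do not reflect the actual Orbit API.

from dataclasses import dataclass, field

@dataclass
class EnvConfig:
    # Generic stand-in for a modular environment configuration.
    robot: str = "franka_panda"
    sensors: list = field(default_factory=lambda: ["rgb_camera", "joint_state"])
    task: str = "cabinet_opening"
    num_envs: int = 1024
    use_gpu_physics: bool = True

class ModularEnv:
    # Assembles a placeholder environment from the config above; a real
    # framework would spawn the scene, robot, and sensors in the simulator.
    def __init__(self, cfg: EnvConfig):
        self.cfg = cfg

    def reset(self):
        return {name: None for name in self.cfg.sensors}   # one entry per sensor

env = ModularEnv(EnvConfig(task="cloth_folding", num_envs=2048))
obs = env.reset()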
@article{mittal2023orbit,
title={Orbit: A Unified Simulation Framework for
Interactive Robot Learning Environments},
author={Mittal, Mayank and Yu, Calvin and
Yu, Qinxi and Liu, Jingzhou and Rudin, Nikita and
Hoeller, David and Yuan, Jia Lin and
Singh, Ritvik and Guo, Yunrong and
Mazhar, Hammad and Mandlekar, Ajay and
Babich, Buck and State, Gavriel and
Hutter, Marco and Garg, Animesh},
journal={IEEE Robotics and Automation Letters},
year={2023},
volume={8},
number={6},
pages={3740-3747},
doi={10.1109/LRA.2023.3270034}
}
Education
Ph.D. in Robotics
School of Computer Science
Carnegie Mellon University
2024 - Present
Bachelor of Applied Science and Engineering
Engineering Science, Robotics
Teleoperating a drone in a safe manner can be challenging, particularly in cluttered indoor
environments with an abundance of obstacles. We present a safe teleoperation system for
drones by performing automatic real-time dynamic obstacle avoidance, allowing us to expose a
suite of simplified high-level control primitives to the drone operator such as "fly
forward", "fly to the left", "fly up", "rotate", etc. This system reduces the complexity and
the extent of the manual controls required from drone operators to fly the drone safely. The
system accomplishes this by constructing a dynamic map of its environment in real-time and
continuously performing path-planning using the map in order to execute a collision-free
path to the desired user-specified position target.
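A stripped-down version of that loop might look like the following: each primitive maps to a relative position target, and every control step checks the next move against the continuously updated occupancy map. The primitive set, grid, and one-cell greedy step are simplifications; the real system runs a full path planner.

import numpy as np

# Each high-level primitive maps to a relative position target in metres;
# the set and step sizes here are illustrative placeholders.
PRIMITIVES = {
    "fly forward":     np.array([1.0, 0.0, 0.0]),
    "fly to the left": np.array([0.0, 1.0, 0.0]),
    "fly up":          np.array([0.0, 0.0, 1.0]),
}

def is_free(cell, occupancy):
    # Check a voxel against the (continuously updated) occupancy grid.
    return 0 <= min(cell) and all(c < s for c, s in zip(cell, occupancy.shape)) \
        and not occupancy[cell]

def plan_step(position, primitive, occupancy):
    # Greedy one-cell move toward the primitive's target; a real system would
    # replan a full collision-free path over the dynamic map each cycle.
    target = position + PRIMITIVES[primitive]
    direction = np.sign(target - position).astype(int)
    candidate = tuple((position + direction).astype(int))
    return np.array(candidate) if is_free(candidate, occupancy) else position

occupancy = np.zeros((20, 20, 10), dtype=bool)   # toy dynamic obstacle map
occupancy[5, 0, 0] = True                        # an obstacle appears mid-flight
pos = np.array([4, 0, 0])
pos = plan_step(pos, "fly forward", occupancy)   # blocked: the drone holds position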