Feel free to contact me via email at liujason [at] cmu [dot] edu!
Research
I am interested in building robotic systems capable of performing dexterous, robust, human-like skills that can generalize to unstructured and diverse environments. My research focuses on robot learning for manipulation and perception, with an emphasis on sim-to-real.
Vision-based object detectors are a crucial basis for robotics applications as they provide valuable information about object localisation in the environment. These need to ensure high reliability in different lighting conditions, occlusions, and visual artifacts, all while running in real-time. Collecting and annotating real-world data for these networks is prohibitively time consuming and costly, especially for custom assets, such as industrial objects, making it untenable for generalization to in-the-wild scenarios. To this end, we present Synthetica, a method for large-scale synthetic data generation for training robust state estimators. This paper focuses on the task of object detection, an important problem which can serve as the front-end for most state estimation problems, such as pose estimation. Leveraging data from a photorealistic ray-tracing renderer, we scale up data generation, generating 2.7 million images, to train highly accurate real-time detection transformers. We present a collection of rendering randomization and training-time data augmentation techniques conducive to robust sim-to-real performance for vision tasks. We demonstrate state-of-the-art performance on the task of object detection while having detectors that run at 50–100 Hz, which is 9 times faster than the prior SOTA. We further demonstrate the usefulness of our training methodology for robotics applications by showcasing a pipeline for use in the real world with custom objects for which there do not exist prior datasets. Our work highlights the importance of scaling synthetic data generation for robust sim-to-real transfer while achieving the fastest real-time inference speeds.
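For illustration, a training-time augmentation stack for synthetic renders might look like the sketch below; the specific transforms and parameters are assumptions for this example, not the exact Synthetica recipe.

# Hypothetical training-time augmentation stack for synthetic renders.
# The transform choices and parameters are illustrative assumptions.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0))], p=0.3),
    transforms.ToTensor(),
    # Random erasing crudely mimics the occlusions seen in real scenes.
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.2)),
])

def augment_batch(pil_images):
    """Apply the stack image-by-image and return a batched tensor."""
    return torch.stack([augment(img) for img in pil_images])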
Physics-based simulations have accelerated progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and robust surgical simulation environment remains a challenge. In this paper, we present ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci Research Kit (dVRK) and Smart Tissue Autonomous Robot (STAR) which represent common subtasks in surgical training. ORBIT-Surgical leverages GPU parallelization to train reinforcement learning and imitation learning algorithms to facilitate study of robot learning to augment human surgical skills. ORBIT-Surgical also facilitates realistic synthetic data generation for active perception tasks. We demonstrate ORBIT-Surgical sim-to-real transfer of learned policies onto a physical dVRK robot.
@article{yu2024orbitsurgical,
title={ORBIT-Surgical: An Open-Simulation Framework
for Learning Surgical Augmented Dexterity},
author={Qinxi Yu and Masoud Moghani and
Karthik Dharmarajan and Vincent Schorp and
William Chung-Ho Panitch and Jingzhou Liu and
Kush Hari and Huang Huang and Mayank Mittal and
Ken Goldberg and Animesh Garg},
journal={arXiv},
year={2024},
}
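GPU-parallel training in frameworks like this typically follows a batched rollout pattern, sketched below with a hypothetical vectorized environment object; this is not ORBIT-Surgical's actual API.

# Generic batched-rollout pattern for GPU-parallel RL.
# The `env` object and its reset/step signatures are hypothetical placeholders,
# not the actual ORBIT-Surgical interface.
import torch

def collect_rollout(env, policy, horizon=32):
    """Step num_envs simulated environments in lockstep on the GPU."""
    obs = env.reset()                      # shape: (num_envs, obs_dim)
    trajectory = []
    for _ in range(horizon):
        with torch.no_grad():
            actions = policy(obs)          # shape: (num_envs, act_dim)
        obs, rewards, dones, infos = env.step(actions)
        trajectory.append((obs, actions, rewards, dones))
    return trajectory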
Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes by leveraging recent advances in differentiable physics and rendering. Our approach employs rendering priors to align with input images and segmentation masks, along with physics priors to mitigate penetration and relative sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which utilizes the differentiable priors as dynamics and observation models, executes faster. We demonstrate that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement. We also show that our approach generalizes to perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.
@InProceedings{zhang2023handypriors,
title={HandyPriors: Physically Consistent Perception
of Hand-Object Interactions with Differentiable Priors},
author={Shutong Zhang and Yiling Qiao and Guanglei Zhu and
Eric Heiden and Dylan Turpin and Jingzhou Liu and
Ming Lin and Miles Macklin and Animesh Garg},
journal={arXiv},
year={2023},
}
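A minimal sketch of the optimization-based variant is shown below, assuming generic differentiable render_silhouette and penetration_depth functions as stand-ins for the rendering and physics priors; the weights and step counts are illustrative.

# Minimal sketch of optimization-based pose estimation with differentiable priors.
# `render_silhouette` and `penetration_depth` are hypothetical differentiable
# stand-ins for the rendering and physics modules described above.
import torch

def estimate_poses(obs_masks, hand_pose, obj_pose, steps=200, lr=1e-2,
                   w_render=1.0, w_pen=10.0):
    """Jointly refine hand and object poses against rendering and physics priors."""
    hand_pose = hand_pose.clone().requires_grad_(True)
    obj_pose = obj_pose.clone().requires_grad_(True)
    opt = torch.optim.Adam([hand_pose, obj_pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Rendering prior: rendered silhouettes should match the observed masks.
        render_loss = (render_silhouette(hand_pose, obj_pose) - obs_masks).abs().mean()
        # Physics prior: penalize hand-object interpenetration.
        pen_loss = penetration_depth(hand_pose, obj_pose).clamp(min=0).sum()
        loss = w_render * render_loss + w_pen * pen_loss
        loss.backward()
        opt.step()
    return hand_pose.detach(), obj_pose.detach()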
DeXtreme: Transfer of Agile In-Hand Manipulation from Simulation to Reality
Ankur Handa*,
Arthur Allshire*,
Viktor Makoviychuk*,
Aleksei Petrenko*,
Ritvik Singh*,
Jingzhou Liu*,
Denys Makoviichuk,
Karl Van Wyk,
Alexander Zhurkevich,
Balakumar Sundaralingam,
Yashraj Narang,
Jean-Francois Lafleche,
Dieter Fox,
Gavriel State
ICRA, 2023
Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable real-time information on the state of the object being manipulated. Our policies are trained to adapt to a wide range of conditions in simulation. Consequently, our vision-based policies significantly outperform the best vision policies in the literature on the same reorientation task and are competitive with policies that are given privileged state information via motion capture systems. Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups, and in our case, with the Allegro Hand and Isaac Gym GPU-based simulation. Furthermore, it opens up possibilities for researchers to achieve such results with commonly-available, affordable robot hands and cameras.
@article{nvidia2022dextreme,
title={DeXtreme: Transfer of Agile In-Hand
Manipulation from Simulation to Reality},
author={Handa, Ankur and Allshire, Arthur and
Makoviychuk, Viktor and Petrenko, Aleksei and
Singh, Ritvik and Liu, Jingzhou and
Makoviichuk, Denys and Van Wyk, Karl and
Zhurkevich, Alexander and Sundaralingam, Balakumar
and Narang, Yashraj and Lafleche, Jean-Francois and
Fox, Dieter and State, Gavriel},
journal={arXiv},
year={2022},
}
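A core ingredient of this kind of sim-to-real transfer is domain randomization; the sketch below illustrates the general idea of resampling physical and sensing parameters per episode. The parameter names and ranges are assumptions, not DeXtreme's actual configuration.

# Illustrative per-episode domain randomization for sim-to-real RL.
import random

RANDOMIZATION_RANGES = {
    "object_mass_scale": (0.7, 1.3),       # multiplicative scale on nominal mass
    "friction": (0.5, 1.5),                # contact friction coefficient
    "joint_damping_scale": (0.8, 1.2),     # multiplicative scale on joint damping
    "observation_noise_std": (0.0, 0.02),  # Gaussian noise added to observations
    "action_latency_steps": (0, 2),        # simulated control latency (discrete)
}

def sample_episode_params():
    """Draw one set of simulation parameters for the next training episode."""
    params = {}
    for name, (low, high) in RANDOMIZATION_RANGES.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = random.randint(low, high)   # discrete parameters
        else:
            params[name] = random.uniform(low, high)   # continuous parameters
    return params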
Multi-finger grasping relies on high quality training data, which is hard to obtain: human data is hard to transfer and synthetic data relies on simplifying assumptions that reduce grasp quality. By making grasp simulation differentiable, and contact dynamics amenable to gradient-based optimization, we accelerate the search for high-quality grasps with fewer limiting assumptions. We present Grasp'D-1M: a large-scale dataset for multi-finger robotic grasping, synthesized with Fast-Grasp'D, a novel differentiable grasping simulator. Grasp'D-1M contains one million training examples for three robotic hands (three, four and five-fingered), each with multimodal visual inputs (RGB+depth+segmentation, available in mono and stereo). Grasp synthesis with Fast-Grasp'D is 10x faster than GraspIt! and 20x faster than the prior Grasp'D differentiable simulator. Generated grasps are more stable and contact-rich than GraspIt! grasps, regardless of the distance threshold used for contact generation. We validate the usefulness of our dataset by retraining an existing vision-based grasping pipeline on Grasp'D-1M, and showing a dramatic increase in model performance, predicting grasps with 30% more contact, a 33% higher epsilon metric, and 35% lower simulated displacement.
@article{turpin2023fastgraspd,
title={Fast-Grasp'D: Dexterous Multi-finger Grasp
Generation Through Differentiable Simulation},
author={Dylan Turpin and Tao Zhong and Shutong Zhang and
Guanglei Zhu and Jingzhou Liu and Ritvik Singh and
Eric Heiden and Miles Macklin and Stavros Tsogkas and
Sven Dickinson and Animesh Garg},
journal={arXiv},
year={2023},
}
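The core idea of differentiable grasp synthesis can be sketched as a gradient-based refinement loop; contact_richness and penetration below are hypothetical differentiable stand-ins for simulator quantities, and the weights are assumptions rather than Fast-Grasp'D's actual objective.

# Sketch of gradient-based grasp refinement through a differentiable simulator.
# `contact_richness` and `penetration` are hypothetical differentiable stand-ins.
import torch

def refine_grasp(joint_angles, object_points, steps=300, lr=5e-3,
                 w_contact=1.0, w_pen=20.0):
    """Nudge hand joint angles toward contact-rich, penetration-free grasps."""
    q = joint_angles.clone().requires_grad_(True)
    opt = torch.optim.Adam([q], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Reward many finger-link contacts with the object surface...
        loss = -w_contact * contact_richness(q, object_points)
        # ...while penalizing interpenetration reported by the simulator.
        loss = loss + w_pen * penetration(q, object_points).clamp(min=0).sum()
        loss.backward()
        opt.step()
    return q.detach()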
Many existing learning-based grasping approaches concentrate on a single embodiment, provide limited generalization to higher DoF end-effectors and cannot capture a diverse set of grasp modes. We tackle the problem of grasping using multiple embodiments by learning rich geometric representations for both objects and end-effectors using Graph Neural Networks. Our novel method - GeoMatch - applies supervised learning on grasping data from multiple embodiments, learning end-to-end contact point likelihood maps as well as conditional autoregressive predictions of grasps keypoint-by-keypoint. We compare our method against baselines that support multiple embodiments. Our approach performs better across three end-effectors, while also producing diverse grasps.
@article{attarian2023geometry,
title={Geometry Matching for Multi-Embodiment Grasping},
author={Maria Attarian and Muhammad Adil Asif and
Jingzhou Liu and Ruthrash Hari and Animesh Garg and
Igor Gilitschenski and Jonathan Tompson},
journal={arXiv},
year={2023},
}
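A toy version of per-point contact-likelihood scoring with autoregressive keypoint selection is sketched below; GeoMatch itself uses Graph Neural Network embeddings of objects and end-effectors, whereas the MLP, feature dimensions, and greedy decoding here are simplifying assumptions.

# Toy sketch of contact-point scoring with autoregressive keypoint selection.
import torch
import torch.nn as nn

class ContactKeypointDecoder(nn.Module):
    def __init__(self, feat_dim=64, num_keypoints=5):
        super().__init__()
        self.num_keypoints = num_keypoints
        # Scores each object point, conditioned on a summary of prior keypoints.
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, point_feats, point_xyz):
        # point_feats: (N, feat_dim) per-point geometric features
        # point_xyz:   (N, 3) point coordinates
        chosen = []
        context = torch.zeros(3)
        for _ in range(self.num_keypoints):
            cond = context.expand(point_xyz.size(0), 3)
            logits = self.scorer(torch.cat([point_feats, cond], dim=-1)).squeeze(-1)
            idx = logits.argmax()                  # greedy pick of the next contact
            chosen.append(point_xyz[idx])
            context = torch.stack(chosen).mean(0)  # condition on keypoints so far
        return torch.stack(chosen)                 # (num_keypoints, 3)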
We present Orbit, a unified and modular framework for robot learning powered by NVIDIA Isaac Sim. It offers a modular design to easily and efficiently create robotic environments with photo-realistic scenes and high-fidelity rigid and deformable body simulation. With Orbit, we provide a suite of benchmark tasks of varying difficulty -- from single-stage cabinet opening and cloth folding to multi-stage tasks such as room reorganization. To support working with diverse observations and action spaces, we include fixed-arm and mobile manipulators with different physically-based sensors and motion generators. Orbit allows training reinforcement learning policies and collecting large demonstration datasets from hand-crafted or expert solutions in a matter of minutes by leveraging GPU-based parallelization. In summary, we offer an open-sourced framework that readily comes with 16 robotic platforms, 4 sensor modalities, 10 motion generators, more than 20 benchmark tasks, and wrappers to 4 learning libraries. With this framework, we aim to support various research areas, including representation learning, reinforcement learning, imitation learning, and task and motion planning. We hope it helps establish interdisciplinary collaborations in these communities, and its modularity makes it easily extensible for more tasks and applications in the future.
@article{mittal2023orbit,
title={Orbit: A Unified Simulation Framework for
Interactive Robot Learning Environments},
author={Mittal, Mayank and Yu, Calvin and
Yu, Qinxi and Liu, Jingzhou and Rudin, Nikita and
Hoeller, David and Yuan, Jia Lin and
Singh, Ritvik and Guo, Yunrong and
Mazhar, Hammad and Mandlekar, Ajay and
Babich, Buck and State, Gavriel and
Hutter, Marco and Garg, Animesh},
journal={IEEE Robotics and Automation Letters},
year={2023},
volume={8},
number={6},
pages={3740-3747},
doi={10.1109/LRA.2023.3270034}
}
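Since Orbit tasks register with the Gym API, usage roughly follows the standard make/reset/step pattern; the import path and task ID below follow that convention but are assumptions not verified here, and Orbit environments are vectorized, so the exact shapes of observations and returns differ from plain Gym.

# Hedged usage sketch of creating and stepping an Orbit task via the Gym API.
# The import path and task ID are assumptions based on the Gym-registration
# convention; consult the Orbit documentation for the exact interface.
import gym
import omni.isaac.orbit_envs  # noqa: F401  (assumed to register the Isaac-* task IDs)

env = gym.make("Isaac-Reach-Franka-v0")   # assumed benchmark task ID
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()    # replace with a trained policy
    obs, reward, done, info = env.step(action)
env.close()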
Education
Ph.D. in Robotics
School of Computer Science
Carnegie Mellon University
2024 - Present
Bachelor of Applied Science and Engineering
Engineering Science, Robotics
Teleoperating a drone in a safe manner can be challenging, particularly in cluttered indoor environments with an abundance of obstacles. We present a safe teleoperation system for drones by performing automatic real-time dynamic obstacle avoidance, allowing us to expose a suite of simplified high-level control primitives to the drone operator such as "fly forward", "fly to the left", "fly up", "rotate", etc. This system reduces the complexity and the extent of the manual controls required from drone operators to fly the drone safely. The system accomplishes this by constructing a dynamic map of its environment in real-time and continuously performing path-planning using the map in order to execute a collision-free path to the desired user-specified position target.
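For illustration, the mapping from high-level primitives to collision-checked position targets could look like the sketch below; the primitive offsets and the planner interface (plan_collision_free on an occupancy map) are hypothetical.

# Illustrative mapping from teleoperation primitives to collision-checked targets.
# `occupancy_map.plan_collision_free` is a hypothetical planner interface.
PRIMITIVES = {
    "fly forward": (1.0, 0.0, 0.0),
    "fly to the left": (0.0, 1.0, 0.0),
    "fly up": (0.0, 0.0, 1.0),
}

def handle_command(command, current_position, occupancy_map):
    """Turn an operator command into a collision-free path, or hover in place."""
    dx, dy, dz = PRIMITIVES[command]
    target = (current_position[0] + dx,
              current_position[1] + dy,
              current_position[2] + dz)
    # The real system re-plans continuously as the map updates; shown once here.
    path = occupancy_map.plan_collision_free(current_position, target)
    return path if path is not None else [current_position]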