Releases: mohammadzainabbas/Reinforcement-Learning-CS

Grasp - Pick-and-place with a robotic hand πŸ‘¨πŸ»β€πŸ’»

22 Jan 05:49
373c614

πŸ’‘ Grasp - Pick-and-place with a robotic hand πŸ‘¨πŸ»β€πŸ’»

You can see the live demo here.

Table of contents

1. πŸš€ Quickstart πŸ’»

Explore the project easily and quickly through the following colab notebooks:

2. πŸ’» Introduction πŸ‘¨πŸ»β€πŸ’»

The field of robotics has seen incredible advancements in recent years, with the development of increasingly sophisticated machines capable of performing a wide range of tasks. One area of particular interest is the ability for robots to manipulate objects in their environment, known as grasping. In this project, we have chosen to focus on a specific grasping task - training a robotic hand to pick up a moving ball object and place it in a specific target location using the Brax physics simulation engine.

Grasp – a robotic hand that picks up a moving ball and carries it to a specific target

The reason for choosing this project is twofold. Firstly, the ability for robots to grasp and manipulate objects is a fundamental skill that is crucial for many real-world applications, such as manufacturing, logistics, and service industries. Secondly, the use of a physics simulation engine allows us to train our robotic hand in a realistic and controlled environment, without the need for expensive hardware and the associated costs and safety concerns.

Reinforcement learning is a powerful tool for training robots to perform complex tasks, as it allows the robot to learn through trial and error. In this project, we will be using reinforcement learning techniques to train our robotic hand, and we hope to demonstrate the effectiveness of this approach in solving the grasping task.

3. 🌊 Physics Simulation Engines 🦿

The use of a physics simulation engine is essential for training a robotic hand to perform the grasping task, as it allows us to simulate the real-world physical interactions between the robot and the ball. Without a physics simulation engine, it would be difficult to accurately model the dynamics of the task, including the forces and torques required for the robotic hand to pick up the ball and move it to the target location.

In this project, we explored several different physics simulation engines, including:

Each of these engines has its own strengths and weaknesses, and we carefully considered the trade-offs between them before making a final decision.

Ultimately, we chose to use Brax due to its highly scalable and parallelizable architecture, which makes it well-suited for accelerated hardware (XLA backends such as GPUs and TPUs). This allows us to simulate the grasping task at a high level of realism and detail, while also taking advantage of the increased computational power of modern hardware to speed up the training process.
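The speed-up from batched simulation can be illustrated with a toy sketch. This is plain Python, not Brax's actual API: `step_single` and its 1-D "physics" are invented for illustration, whereas Brax vectorizes a real physics step across thousands of environment instances via XLA on GPUs/TPUs.

```python
# Toy illustration of batched (parallel) simulation: stepping many
# environment instances with one call, the way XLA backends do on GPU/TPU.
# This is a plain-Python sketch, not Brax's API.

def step_single(state, action):
    """Advance one toy 1-D 'physics' state: integrate position and velocity."""
    pos, vel = state
    dt = 0.01
    return (pos + vel * dt, vel + action * dt)

def step_batch(states, actions):
    """Step a whole batch of environments at once (vmap-style)."""
    return [step_single(s, a) for s, a in zip(states, actions)]

states = [(0.0, 1.0)] * 4          # four identical environments
actions = [0.5, -0.5, 0.0, 1.0]    # one action per environment
states = step_batch(states, actions)
```

In Brax the inner loop is a single compiled XLA program rather than a Python loop, which is what makes thousands of parallel rollouts practical during training.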

4. πŸŒͺ Environment 🦾

The grasping environment provided by Brax is a simple pick-and-place task, where a 4-fingered claw hand must pick up and move a ball to a target location. The environment is designed to simulate the physical interactions between the robotic hand and the ball, including the forces and torques required for the hand to grasp the ball and move it to the target location.

The hand picks up the ball and carries it to a series of red targets. Once the ball gets close to a red target, the target respawns at a different random location.

In the environment, the robotic hand is represented by a 4-fingered claw, which opens and closes to grasp the ball. The ball is placed at a random location at the beginning of each episode, and the target location is also chosen at random. The goal of the robotic hand is to move the ball to the target location as quickly and efficiently as possible. For more details, see the Observations, Actions, and Reward sections below.

4.1. πŸ”­ Observations πŸ”

The environment observes three main bodies: the Hand, the Object, and the Target. The agent uses these observations to learn how to control the robotic hand and move the object to the target location.

  1. The Hand observation includes information about the state of the robotic hand, such as the position and orientation of the fingers, the joint angles, and the forces and torques applied to the hand. This information is used by the agent to control the hand and pick up the object.

  2. The Object observation includes information about the state of the object, such as its position, velocity, and orientation. This information is used by the agent to track the object and move it to the target location.

  3. The Target observation includes information about the target location, such as its position and orientation. This information is used by the agent to navigate the hand and the object to the target location.

When the object reaches the target location, the agent is rewarded. The agent is also given a penalty if the object falls or if the hand collides with any obstacle. The agent's goal is to maximize the reward, which means reaching the target location as quickly and efficiently as possible.

Overall, the observations provided by the Grasp environment are designed to give the agent the information it needs to learn how to control the robotic hand and move the object to the target location. The combination of the Hand, Object, and Target observations allows the agent to learn from the environment and improve its performance over time.
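The three observation groups above are ultimately consumed by the agent as one flat feature vector. The sketch below shows that flattening step; the field names and sizes are illustrative placeholders, not Grasp's actual observation layout, which is defined inside Brax.

```python
# Hedged sketch: flattening the three observation groups (Hand, Object,
# Target) into a single feature vector for the agent. Field names and sizes
# are illustrative, not Brax's actual layout.

def flatten_observation(hand, obj, target):
    """Concatenate per-body observations into one flat list of floats."""
    return (
        hand["finger_positions"]
        + hand["joint_angles"]
        + obj["position"]
        + obj["velocity"]
        + target["position"]
    )

obs = flatten_observation(
    hand={"finger_positions": [0.1] * 4, "joint_angles": [0.0] * 4},
    obj={"position": [0.0, 0.0, 0.5], "velocity": [0.0, 0.0, 0.0]},
    target={"position": [1.0, 1.0, 0.0]},
)
len(obs)  # 4 + 4 + 3 + 3 + 3 = 17 values in this toy layout
```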

4.2. πŸ„β€β™‚οΈ Actions πŸ€Έβ€β™‚οΈ

The action space has 19 dimensions, corresponding to the hand's position and the joint angles; each action is a continuous value normalized to the range [-1, 1].
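Normalized actions must be mapped back to physical ranges before being applied. The sketch below shows one standard linear mapping from [-1, 1] to a joint's limits; the limits used are illustrative, not Grasp's actual joint limits.

```python
# Hedged sketch: mapping a policy's normalized action in [-1, 1] back to a
# joint's physical range. The limits below are illustrative placeholders.

def denormalize(action, low, high):
    """Map a value from [-1, 1] to [low, high] linearly, clipping first."""
    action = max(-1.0, min(1.0, action))       # clip out-of-range outputs
    return low + (action + 1.0) * 0.5 * (high - low)

denormalize(0.0, -90.0, 90.0)   # midpoint of the range -> 0.0
denormalize(1.0, -90.0, 90.0)   # upper bound -> 90.0
```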

4.3. πŸ† Reward πŸ₯‡

The reward function is calculated using the following equation:

$$\text{reward} = \text{moving to object} + \text{close to object} + \text{touching object} + 5 * \text{target hit} + \text{moving to target}$$

where:

- $\text{moving to object}$: small reward for moving towards the object.
- $\text{close to object}$: small reward for being close to the object.
- $\text{touching object}$: small reward for touching the object.
- $\text{target hit}$: high reward for hitting the target (the maximum single reward).
- $\text{moving to target}$: high reward for moving towards the target.

Each incremental step towards completing the task is rewarded, with $\text{target hit}$ yielding the largest reward.
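The equation above combines straightforwardly in code. The sketch below only shows how the terms are weighted and summed; in the real environment each term would be computed from simulator state (distances, contact flags), and the inputs here are placeholders.

```python
# Hedged sketch of the shaped reward equation above. In Brax each term is
# computed from simulator state; the arguments here are placeholder values.

def grasp_reward(moving_to_object, close_to_object, touching_object,
                 target_hit, moving_to_target):
    """Combine the shaped reward terms; 'target hit' is weighted by 5."""
    return (moving_to_object + close_to_object + touching_object
            + 5.0 * target_hit + moving_to_target)

# With every term fully active, the target-hit term dominates:
grasp_reward(1.0, 1.0, 1.0, 1.0, 1.0)  # -> 9.0
```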

5. πŸ”¬ Algorithms πŸ’»

We will use Brax's optimized training algorithms: Proximal Policy Optimization (PPO), Evolution Strategies (ES), Augmented Random Search (ARS), and Soft Actor-Critic (SAC).

5.1. πŸ’‘ Proximal policy optimization (PPO) πŸ‘¨πŸ»β€πŸ’»

Proximal Policy Optimization (PPO) is a model-free online policy gradient reinforcement learning algorithm, developed at OpenAI in 2017. PPO strikes a balance between ease of implementation, sample complexity, and ease of tuning, trying to compute an update at each step that minimizes the cost function while ensuring the deviation from the previous policy is relatively small.
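The "relatively small deviation" is enforced by PPO's clipped surrogate objective. The sketch below evaluates that objective for a single sample in plain Python; it illustrates the formula from the PPO paper, not Brax's optimized implementation.

```python
# Hedged sketch of PPO's clipped surrogate objective for one sample, where
# `ratio` is pi_new(a|s) / pi_old(a|s) and `advantage` is the estimated
# advantage. This illustrates the formula, not Brax's implementation.

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """min(r * A, clip(r, 1 - eps, 1 + eps) * A): clipping removes the
    incentive to move the new policy far from the old one."""
    clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped * advantage)

ppo_clip_objective(1.5, 1.0)   # positive advantage: gain capped at 1.2
ppo_clip_objective(0.5, -1.0)  # negative advantage: loss capped at -0.8
```

In practice this objective is averaged over a batch of samples and maximized by gradient ascent on the policy parameters.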
