Commit 25f1fb0 (1 parent: 38e302a)
Showing 3 changed files with 26 additions and 196 deletions.
@@ -1,220 +1,50 @@
 organization: OMRON SINIC X
 twitter: '@omron_sinicx'
-title: 'MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics'
-conference: IJCAI2020
+title: 'Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist'
+conference: ICRA2024
 resources:
-  paper: https://arxiv.org/abs/1909.13111
-  code: https://github.com/omron-sinicx/multipolar
-  video: https://www.youtube.com/embed/adUnIj83RtU
-  blog: https://medium.com/sinicx/multipolar-multi-source-policy-aggregation-for-transfer-reinforcement-learning-between-diverse-bc42a152b0f5
-  description: explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently.
-  image: https://omron-sinicx.github.io/multipolar/assets/teaser.png
-  url: https://omron-sinicx.github.io/multipolar
-  speakerdeck: b7a0614c24014dcbbb121fbb9ed234cd
+  paper: https://arxiv.org/abs/2402.18002
+  code: https://github.com/omron-sinicx/symmetry-aware-pomdp
+  video: https://www.youtube.com/embed/fbiX0bmb5j4
+  description: we propose to leverage the symmetry for sample efficiency by augmenting the training data and constructing auxiliary losses to force the agent to adhere to the symmetry.
+  image: https://omron-sinicx.github.io/symmetry-aware-pomdp/assets/teaser.png
+  url: https://omron-sinicx.github.io/symmetry-aware-pomdp
 authors:
-  - name: Mohammadamin Barekatain*
+  - name: Hai Nguyen
     affiliation: [1, 2]
-    url: http://barekatain.me/
+    url: https://hai-h-nguyen.github.io/
     position: intern
-  - name: Ryo Yonetani
+  - name: Tadashi Kozuno
     affiliation: [1]
-    position: principal investigator
-    url: https://yonetaniryo.github.io/
+    position: senior researcher
+    url: https://tadashik.github.io/
+  - name: Cristian C. Beltran-Hernandez
+    affiliation: [1]
+    position: senior researcher
+    url: https://cristianbehe.me/
   - name: Masashi Hamaya
     affiliation: [1]
-    position: senior researcher
+    position: principal investigator
     url: https://sites.google.com/view/masashihamaya/home
-  # - name: Mai Nishimura
-  #   affiliation: [1]
-  #   url: https://denkiwakame.github.io
-  # - name: Asako Kanezaki
-  #   affiliation: [2]
-  #   url: https://kanezaki.github.io/
-contact_ids: ['github', 'omron', 2] #=> github issues, [email protected], 2nd author
+contact_ids: ['github', 'omron', 4] #=> github issues, [email protected], 4th author
 affiliations:
   - OMRON SINIC X Corporation
-  - Technical University of Munich
+  - Northeastern University
 meta:
   - '* work done as an intern at OMRON SINIC X.'
 bibtex: >
-  # arXiv version
-  @article{barekatain2019multipolar,
-    title={MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics},
-    author={Barekatain, Mohammadamin and Yonetani, Ryo and Hamaya, Masashi},
-    journal={arXiv preprint arXiv:1909.13111},
-    year={2019}
-  }
-  # IJCAI version
-  @inproceedings{barekatain2020multipolar,
-    title={MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics},
-    author={Barekatain, Mohammadamin and Yonetani, Ryo and Hamaya, Masashi},
-    booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
-    year={2020}
+  @article{nguyen2024symmetry,
+    title={Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist},
+    author={Nguyen, Hai and Kozuno, Tadashi and Beltran-Hernandez, Cristian C and Hamaya, Masashi},
+    journal={arXiv preprint arXiv:2402.18002},
+    year={2024}
   }
 overview: |
-  Transfer reinforcement learning (RL) aims at improving learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks.
-  However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments.
-  In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently.
-  To address this problem, the proposed approach, **MULTI-source POLicy AggRegation (MULTIPOLAR)**, comprises two key techniques.
-  We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance.
-  Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly.
-  We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces.
-method:
-  - title: subsection 1
-    image: method.png
-    text: >
-      **test text with unicode characters:** α, β, φ, ψ
-  - title: subsection 2
-    image: null
-    text: >
-      **test text with TeX characters:** $\alpha$, $\beta$, $\phi$, $\psi \\$
-      see how it renders with $\KaTeX$.
-      $$ E = mc^2$$
-      $$ \int \oint \sum \prod $$
-      $$ \begin{CD} A @>a>> B \\ @VbVV @AAcA \\ C @= D \end{CD} $$
-  - title: null
-    image: method.png
-    text: >
-      This is a multi-line text example.
-      "> - Flow Style" converts newlines to spaces.
-      Using >, newline characters are converted to spaces.
-      Newline characters and indentation are handled appropriately, and the text is represented as a single line.
-      It's suitable when you want to collapse multi-line text into a single line, such as in configurations or descriptions where readability is key.
-  - text: |
-      This is a multi-line
-      text example.
-      "| - Block Style" preserves newlines and indentation.
-      Using |, you can represent multi-line text that includes newline characters.
-      Newline characters are preserved exactly as they are, along with the block's indentation.
-      It's suitable when maintaining newlines and indentation is important, such as preserving the structure of code or prose.
-results:
-  - text: |
-      ### Motion Planning (MP) Dataset
-      markdown version
-      |Method|Opt|Exp|Hmean|
-      |--|--|--|--|
-      |BF| 65.8 (63.8, 68.0)| 44.1 (42.8, 45.5) | 44.8 (43.4, 46.3)|
-      |WA*| 68.4 (66.5, 70.4)| 35.8 (34.5, 37.1) | 40.4 (39.0, 41.8)|
-      |**Neural A*** | **87.7 (86.6, 88.9)**| 40.1 (38.9, 41.3) | 52.0 (50.7, 53.3)|
-      <h3>Motion Planning (MP) Dataset</h3>
-      <p>HTML version</p>
-      <div class="uk-overflow-auto">
-        <table class="uk-table uk-table-small uk-text-small uk-table-divider">
-          <thead>
-            <tr>
-              <th>Method</th>
-              <th>Opt</th>
-              <th>Exp</th>
-              <th>Hmean</th>
-            </tr>
-          </thead>
-          <tbody>
-            <tr>
-              <td>
-                BF
-                <br />
-                WA*
-              </td>
-              <td>
-                65.8 (63.8, 68.0)
-                <br />
-                68.4 (66.5, 70.4)
-              </td>
-              <td>
-                44.1 (42.8, 45.5)
-                <br />
-                35.8 (34.5, 37.1)
-              </td>
-              <td>
-                44.8 (43.4, 46.3)
-                <br />
-                40.4 (39.0, 41.8)
-              </td>
-            </tr>
-            <tr>
-              <td>
-                SAIL
-                <br />
-                SAIL-SL
-                <br />
-                BB-A*
-              </td>
-              <td>
-                5.7 (4.6, 6.8)
-                <br />
-                3.1 (2.3, 3.8)
-                <br />
-                31.2 (28.8, 33.5)
-              </td>
-              <td>
-                58.0 (56.1, 60.0)
-                <br />
-                57.6 (55.7, 59.6)
-                <br />
-                52.0 (50.2, 53.9)
-              </td>
-              <td>
-                7.7 (6.4, 9.0)
-                <br />
-                4.4 (3.5, 5.3)
-                <br />
-                31.1 (29.2, 33.0)
-              </td>
-            </tr>
-            <tr>
-              <td>
-                Neural BF
-                <br />
-                <b>Neural A*</b>
-              </td>
-              <td>
-                75.5 (73.8, 77.1)
-                <br />
-                <b>87.7 (86.6, 88.9)</b>
-              </td>
-              <td>
-                45.9 (44.6, 47.2)
-                <br />
-                40.1 (38.9, 41.3)
-              </td>
-              <td>
-                52.0 (50.7, 53.4)
-                <br />
-                52.0 (50.7, 53.3)
-              </td>
-            </tr>
-          </tbody>
-        </table>
-      </div>
-      <h3>Selected Path Planning Results</h3>
-      <p>dummy text</p>
-      <img
-        src="assets/result1.png"
-        class="uk-align-center uk-responsive-width"
-        alt=""
-      />
-      <h3>Path Planning Results on SSD Dataset</h3>
-      <p>dummy text</p>
-      <img
-        src="assets/result2.png"
-        class="uk-align-center uk-responsive-width"
-        alt=""
-      />
-demo:
-  - mp4: result1.mp4
-    text: demo text1 demo text1 demo text1
-    scale: 100%
-  - mp4: result1.mp4
-    text: demo text2 demo text2 demo text2
-    scale: 100%
-  - mp4: result1.mp4
-    text: demo text3 demo text3 demo text3
-    scale: 80%
+  This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one. Previous studies often use a fully observable formulation, requiring external setups or estimators for the peg-to-hole pose. In contrast, we use a partially observable formulation and deep reinforcement learning from demonstrations to learn a memory-based agent that acts purely on haptic and proprioceptive signals. Moreover, previous works do not incorporate potential domain symmetry and thus must search for solutions in a bigger space. Instead, we propose to leverage the symmetry for sample efficiency by augmenting the training data and constructing auxiliary losses to force the agent to adhere to the symmetry. Results in simulation with five different symmetric peg shapes show that our proposed agent can be comparable to or even outperform a state-based agent. In particular, the sample efficiency also allows us to learn directly on the real robot within 3 hours.
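
The new overview names two concrete techniques: augmenting the training data with a domain symmetry and adding auxiliary losses that push the agent to respect that symmetry. Below is a minimal sketch of both ideas, assuming a reflection symmetry about a vertical plane, a plain feedforward policy, and illustrative force/torque observation and position-delta action layouts; the function names and tensor shapes are hypothetical and are not taken from the omron-sinicx/symmetry-aware-pomdp code.

```python
# Hypothetical sketch: symmetry-based data augmentation plus an auxiliary
# equivariance loss. Assumed layouts: obs = [fx, fy, fz, tx, ty, tz],
# act = [dx, dy, dz]; under reflection about the x-z plane, y components flip.
import torch
import torch.nn as nn

OBS_SIGNS = torch.tensor([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
ACT_SIGNS = torch.tensor([1.0, -1.0, 1.0])

def reflect_obs(obs: torch.Tensor) -> torch.Tensor:
    """Apply the reflection to a batch of observations."""
    return obs * OBS_SIGNS

def reflect_act(act: torch.Tensor) -> torch.Tensor:
    """Apply the reflection to a batch of actions."""
    return act * ACT_SIGNS

def augment_batch(obs, act, rew):
    """Data augmentation: append the mirrored copy of every transition.
    Rewards are invariant under the symmetry, so they are duplicated."""
    obs_aug = torch.cat([obs, reflect_obs(obs)], dim=0)
    act_aug = torch.cat([act, reflect_act(act)], dim=0)
    rew_aug = torch.cat([rew, rew], dim=0)
    return obs_aug, act_aug, rew_aug

def symmetry_aux_loss(policy: nn.Module, obs: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss: the action for a mirrored observation should be the
    mirror of the action for the original observation (equivariance)."""
    act = policy(obs)
    act_mirror = policy(reflect_obs(obs))
    return nn.functional.mse_loss(act_mirror, reflect_act(act))

if __name__ == "__main__":
    policy = nn.Sequential(nn.Linear(6, 32), nn.Tanh(), nn.Linear(32, 3))
    obs, act, rew = torch.randn(8, 6), torch.randn(8, 3), torch.randn(8, 1)
    obs_a, act_a, rew_a = augment_batch(obs, act, rew)
    loss = symmetry_aux_loss(policy, obs)  # add to the RL loss with a weight
    print(obs_a.shape, act_a.shape, rew_a.shape, float(loss))
```

In training, the auxiliary term would be added to the usual RL objective with a small weight, while the augmented batch doubles the effective sample count per environment step.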
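
For reference, the overview removed by this commit describes MULTIPOLAR as an adaptively weighted aggregation of frozen source-policy actions plus a learned residual network. A hypothetical sketch of that structure follows, under the same caveats: the class, shapes, and initialization are illustrative assumptions, not the omron-sinicx/multipolar implementation.

```python
# Hypothetical sketch: aggregate K frozen source policies with learnable
# per-source, per-dimension scales, then add a learned residual.
import torch
import torch.nn as nn

class Multipolar(nn.Module):
    def __init__(self, source_policies, obs_dim: int, act_dim: int):
        super().__init__()
        self.sources = nn.ModuleList(source_policies)  # frozen source policies
        k = len(source_policies)
        # learnable aggregation parameters, one scale per source and action dim
        self.theta = nn.Parameter(torch.full((k, act_dim), 1.0 / k))
        # residual head keeps the target policy expressive even when
        # every source policy performs poorly
        self.residual = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # source policies are never updated
            acts = torch.stack([p(obs) for p in self.sources], dim=1)  # (B, K, A)
        aggregated = (self.theta * acts).sum(dim=1)                    # (B, A)
        return aggregated + self.residual(obs)

if __name__ == "__main__":
    sources = [nn.Linear(4, 2) for _ in range(3)]  # stand-ins for source policies
    policy = Multipolar(sources, obs_dim=4, act_dim=2)
    print(policy(torch.randn(8, 4)).shape)  # torch.Size([8, 2])
```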