Commit

modify page

MasashiHamaya committed May 7, 2024
1 parent 38e302a commit 25f1fb0

Showing 3 changed files with 26 additions and 196 deletions.
Binary file modified src/images/teaser.png
Binary file added src/images/teaser1.png
222 changes: 26 additions & 196 deletions template.yaml
@@ -1,220 +1,50 @@
organization: OMRON SINIC X
twitter: '@omron_sinicx'
title: 'MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics'
conference: IJCAI2020
title: 'Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist'
conference: ICRA2024
resources:
paper: https://arxiv.org/abs/1909.13111
code: https://github.com/omron-sinicx/multipolar
video: https://www.youtube.com/embed/adUnIj83RtU
blog: https://medium.com/sinicx/multipolar-multi-source-policy-aggregation-for-transfer-reinforcement-learning-between-diverse-bc42a152b0f5
description: explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently.
image: https://omron-sinicx.github.io/multipolar/assets/teaser.png
url: https://omron-sinicx.github.io/multipolar
speakerdeck: b7a0614c24014dcbbb121fbb9ed234cd
paper: https://arxiv.org/abs/2402.18002
code: https://github.com/omron-sinicx/symmetry-aware-pomdp
video: https://www.youtube.com/embed/fbiX0bmb5j4
description: we propose to leverage the symmetry for sample efficiency by augmenting the training data and constructing auxiliary losses to force the agent to adhere to the symmetry.
image: https://omron-sinicx.github.io/symmetry-aware-pomdp/assets/teaser.png
url: https://omron-sinicx.github.io/symmetry-aware-pomdp
authors:
- name: Mohammadamin Barekatain*
- name: Hai Nguyen
affiliation: [1, 2]
url: http://barekatain.me/
url: https://hai-h-nguyen.github.io/
position: intern
- name: Ryo Yonetani
- name: Tadashi Kozuno
affiliation: [1]
position: principal investigator
url: https://yonetaniryo.github.io/
position: senior researcher
url: https://tadashik.github.io/
- name: Cristian C. Beltran-Hernandez
affiliation: [1]
position: senior researcher
url: https://cristianbehe.me/
- name: Masashi Hamaya
affiliation: [1]
position: senior researcher
position: principal investigator
url: https://sites.google.com/view/masashihamaya/home
# - name: Mai Nishimura
# affiliation: [1]
# url: https://denkiwakame.github.io
# - name: Asako Kanezaki
# affiliation: [2]
# url: https://kanezaki.github.io/
contact_ids: ['github', 'omron', 2] #=> github issues, [email protected], 2nd author
contact_ids: ['github', 'omron', 4] #=> github issues, [email protected], 4th author
affiliations:
- OMRON SINIC X Corporation
- Technical University of Munich
- Northeastern University
meta:
- '* work done as an intern at OMRON SINIC X.'
bibtex: >
# arXiv version
@article{barekatain2019multipolar,
title={MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics},
author={Barekatain, Mohammadamin and Yonetani, Ryo and Hamaya, Masashi},
journal={arXiv preprint arXiv:1909.13111},
year={2019}
}
# IJCAI version
@inproceedings{barekatain2020multipolar,
title={MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics},
author={Barekatain, Mohammadamin and Yonetani, Ryo and Hamaya, Masashi},
booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
year={2020}
@article{nguyen2024symmetry,
title={Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist},
author={Nguyen, Hai and Kozuno, Tadashi and Beltran-Hernandez, Cristian C and Hamaya, Masashi},
journal={arXiv preprint arXiv:2402.18002},
year={2024}
}
overview: |
Transfer reinforcement learning (RL) aims at improving the learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks.
However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments.
In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently.
To address this problem, the proposed approach, **MULTI-source POLicy AggRegation (MULTIPOLAR)**, comprises two key techniques.
We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance.
Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly.
We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces.
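
As a rough illustration of the two techniques described above, here is a minimal PyTorch-style sketch of aggregating frozen source policies with learnable weights plus a residual head. This is not the authors' implementation; the network sizes, the per-dimension weighting, and the single-state interface are assumptions made for the example (see the linked repository for the real code).

```python
import torch
import torch.nn as nn

class Multipolar(nn.Module):
    """Minimal sketch: aggregate frozen source policies, add a learned residual."""

    def __init__(self, source_policies, state_dim, action_dim, hidden=64):
        super().__init__()
        # Source policies trained under unknown, diverse dynamics; kept frozen.
        self.sources = nn.ModuleList(source_policies)
        self.sources.requires_grad_(False)
        # Learnable per-source, per-dimension aggregation weights.
        self.weights = nn.Parameter(torch.ones(len(source_policies), action_dim))
        # Auxiliary residual network keeps the target policy expressive
        # even when every source policy performs poorly.
        self.residual = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, action_dim)
        )

    def forward(self, state):
        # state: (state_dim,) -> stacked source actions: (num_sources, action_dim)
        actions = torch.stack([p(state) for p in self.sources])
        aggregated = (self.weights * actions).sum(dim=0)
        return aggregated + self.residual(state)
```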
method:
- title: subsection 1
image: method.png
text: >
**test text with unicode characters:** α, β, φ, ψ
- title: subsection 2
image: null
text: >
**test text with TeX characters:** $\alpha$, $\beta$, $\phi$, $\psi \\$
see how it renders with $\KaTeX$.
$$ E = mc^2$$
$$ \int \oint \sum \prod $$
$$ \begin{CD} A @>a>> B \\ @VbVV @AAcA \\ C @= D \end{CD} $$
- title: null
image: method.png
text: >
This is a multi-line text example.
"> - Flow Style" converts newlines to spaces.
Using >, newline characters are converted to spaces.
Newline characters and indentation are handled appropriately, and the text is represented as a single line.
It's suitable when you want to collapse multi-line text into a single line, such as in configurations or descriptions where readability is key.
- text: |
This is a multi-line
text example.
"| - Block Style" preserves newlines and indentation.
Using |, you can represent multi-line text that includes newline characters.
Newline characters are preserved exactly as they are, along with the block's indentation.
It's suitable when maintaining newlines and indentation is important, such as preserving the structure of code or prose.
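
The difference between the two scalar styles is easy to verify by loading both with a parser. A minimal check, assuming PyYAML is installed:

```python
import yaml  # PyYAML

doc = """
folded: >
  line one
  line two
literal: |
  line one
  line two
"""

data = yaml.safe_load(doc)
print(repr(data["folded"]))   # 'line one line two\n'  -- newline folded to a space
print(repr(data["literal"]))  # 'line one\nline two\n' -- newlines preserved
```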
results:
- text: |
### Motion Planning (MP) Dataset
markdown version
|Method|Opt|Exp|Hmean|
|--|--|--|--|
|BF| 65.8 (63.8, 68.0)| 44.1 (42.8, 45.5) | 44.8 (43.4, 46.3)|
|WA*| 68.4 (66.5, 70.4)| 35.8 (34.5, 37.1) | 40.4 (39.0, 41.8)|
|**Neural A*** | **87.7 (86.6, 88.9)**| 40.1 (38.9, 41.3) | 52.0 (50.7, 53.3)|
<h3>Motion Planning (MP) Dataset</h3>
<p>HTML version</p>
<div class="uk-overflow-auto">
<table class="uk-table uk-table-small uk-text-small uk-table-divider">
<thead>
<tr>
<th>Method</th>
<th>Opt</th>
<th>Exp</th>
<th>Hmean</th>
</tr>
</thead>
<tbody>
<tr>
<td>
BF
<br />
WA*
</td>
<td>
65.8 (63.8, 68.0)
<br />
68.4 (66.5, 70.4)
</td>
<td>
44.1 (42.8, 45.5)
<br />
35.8 (34.5, 37.1)
</td>
<td>
44.8 (43.4, 46.3)
<br />
40.4 (39.0, 41.8)
</td>
</tr>
<tr>
<td>
SAIL
<br />
SAIL-SL
<br />
BB-A*
</td>
<td>
5.7 (4.6, 6.8)
<br />
3.1 (2.3, 3.8)
<br />
31.2 (28.8, 33.5)
</td>
<td>
58.0 (56.1, 60.0)
<br />
57.6 (55.7, 59.6)
<br />
52.0 (50.2, 53.9)
</td>
<td>
7.7 (6.4, 9.0)
<br />
4.4 (3.5, 5.3)
<br />
31.1 (29.2, 33.0)
</td>
</tr>
<tr>
<td>
Neural BF
<br />
<b>Neural A*</b>
</td>
<td>
75.5 (73.8, 77.1)
<br />
<b>87.7 (86.6, 88.9)</b>
</td>
<td>
45.9 (44.6, 47.2)
<br />
40.1 (38.9, 41.3)
</td>
<td>
52.0 (50.7, 53.4)
<br />
52.0 (50.7, 53.3)
</td>
</tr>
</tbody>
</table>
</div>
<h3>Selected Path Planning Results</h3>
<p>dummy text</p>
<img
src="assets/result1.png"
class="uk-align-center uk-responsive-width"
alt=""
/>
<h3>Path Planning Results on SSD Dataset</h3>
<p>dummy text</p>
<img
src="assets/result2.png"
class="uk-align-center uk-responsive-width"
alt=""
/>
demo:
- mp4: result1.mp4
text: demo text1 demo text1 demo text1
scale: 100%
- mp4: result1.mp4
text: demo text2 demo text2 demo text2
scale: 100%
- mp4: result1.mp4
text: demo text3 demo text3 demo text3
scale: 80%
This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one. Previous studies often use a fully observable formulation, requiring external setups or estimators for the peg-to-hole pose. In contrast, we use a partially observable formulation and deep reinforcement learning from demonstrations to learn a memory-based agent that acts purely on haptic and proprioceptive signals. Moreover, previous works do not incorporate potential domain symmetry and thus must search for solutions in a larger space. Instead, we propose to leverage the symmetry for sample efficiency by augmenting the training data and constructing auxiliary losses that force the agent to adhere to the symmetry. Results in simulation with five different symmetric peg shapes show that our proposed agent can match or even outperform a state-based agent. In particular, the sample efficiency also allows us to learn directly on the real robot within 3 hours.
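
For intuition, the augmentation and auxiliary-loss idea can be sketched as follows. This is an illustrative PyTorch-style fragment, not the paper's code: the z-axis rotational symmetry, the 3-D observation/action shapes, and the `policy` interface are simplifying assumptions (the actual implementation is in the linked symmetry-aware-pomdp repository).

```python
import torch
import torch.nn.functional as F

def rotate_z(vec, theta):
    """Rotate the (x, y) components of 3-D vectors about the insertion (z) axis."""
    c, s = torch.cos(theta), torch.sin(theta)
    x, y, z = vec[..., 0], vec[..., 1], vec[..., 2]
    return torch.stack([c * x - s * y, s * x + c * y, z], dim=-1)

def augment(obs, act, theta):
    """Data augmentation: a z-rotated transition is equally valid,
    so rotate observations and actions by the same random angle."""
    return rotate_z(obs, theta), rotate_z(act, theta)

def symmetry_loss(policy, obs, theta):
    """Auxiliary loss forcing the agent to adhere to the symmetry:
    pi(g . o) should equal g . pi(o) for a rotation g."""
    return F.mse_loss(policy(rotate_z(obs, theta)),
                      rotate_z(policy(obs), theta))
```

In training, each minibatch could be augmented with a randomly drawn angle and the auxiliary term added to the RL objective with a small weight; how the paper actually parameterizes the symmetry group is documented in the repository.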
