Commit 25f1fb0 (1 parent: 38e302a)
Showing 3 changed files with 26 additions and 196 deletions.
@@ -1,220 +1,50 @@
 organization: OMRON SINIC X
 twitter: '@omron_sinicx'
-title: 'MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics'
-conference: IJCAI2020
+title: 'Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist'
+conference: ICRA2024
 resources:
-  paper: https://arxiv.org/abs/1909.13111
-  code: https://github.com/omron-sinicx/multipolar
-  video: https://www.youtube.com/embed/adUnIj83RtU
-  blog: https://medium.com/sinicx/multipolar-multi-source-policy-aggregation-for-transfer-reinforcement-learning-between-diverse-bc42a152b0f5
-  description: explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently.
-  image: https://omron-sinicx.github.io/multipolar/assets/teaser.png
-  url: https://omron-sinicx.github.io/multipolar
-  speakerdeck: b7a0614c24014dcbbb121fbb9ed234cd
+  paper: https://arxiv.org/abs/2402.18002
+  code: https://github.com/omron-sinicx/symmetry-aware-pomdp
+  video: https://www.youtube.com/embed/fbiX0bmb5j4
+  description: we propose to leverage the symmetry for sample efficiency by augmenting the training data and constructing auxiliary losses to force the agent to adhere to the symmetry.
+  image: https://omron-sinicx.github.io/symmetry-aware-pomdp/assets/teaser.png
+  url: https://omron-sinicx.github.io/symmetry-aware-pomdp
 authors:
-  - name: Mohammadamin Barekatain*
+  - name: Hai Nguyen
     affiliation: [1, 2]
-    url: http://barekatain.me/
+    url: https://hai-h-nguyen.github.io/
     position: intern
-  - name: Ryo Yonetani
+  - name: Tadashi Kozuno
     affiliation: [1]
-    position: principal investigator
-    url: https://yonetaniryo.github.io/
+    position: senior researcher
+    url: https://tadashik.github.io/
+  - name: Cristian C. Beltran-Hernandez
+    affiliation: [1]
+    position: senior researcher
+    url: https://cristianbehe.me/
   - name: Masashi Hamaya
     affiliation: [1]
-    position: senior researcher
+    position: principal investigator
     url: https://sites.google.com/view/masashihamaya/home
-  # - name: Mai Nishimura
-  #   affiliation: [1]
-  #   url: https://denkiwakame.github.io
-  # - name: Asako Kanezaki
-  #   affiliation: [2]
-  #   url: https://kanezaki.github.io/
-contact_ids: ['github', 'omron', 2] #=> github issues, [email protected], 2nd author
+contact_ids: ['github', 'omron', 4] #=> github issues, [email protected], 4th author
 affiliations:
   - OMRON SINIC X Corporation
-  - Technical University of Munich
+  - Northeastern University
 meta:
   - '* work done as an intern at OMRON SINIC X.'
 bibtex: >
-  # arXiv version
-  @article{barekatain2019multipolar,
-    title={MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics},
-    author={Barekatain, Mohammadamin and Yonetani, Ryo and Hamaya, Masashi},
-    journal={arXiv preprint arXiv:1909.13111},
-    year={2019}
-  }
-  # IJCAI version
-  @inproceedings{barekatain2020multipolar,
-    title={MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics},
-    author={Barekatain, Mohammadamin and Yonetani, Ryo and Hamaya, Masashi},
-    booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
-    year={2020}
+  @article{nguyen2024symmetry,
+    title={Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist},
+    author={Nguyen, Hai and Kozuno, Tadashi and Beltran-Hernandez, Cristian C and Hamaya, Masashi},
+    journal={arXiv preprint arXiv:2402.18002},
+    year={2024}
   }
 overview: |
-  Transfer reinforcement learning (RL) aims at improving learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks.
-  However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments.
-  In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently.
-  To address this problem, the proposed approach, **MULTI-source POLicy AggRegation (MULTIPOLAR)**, comprises two key techniques.
-  We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance.
-  Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly.
-  We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces.
-method:
-  - title: subsection 1
-    image: method.png
-    text: >
-      **test text with unicode characters:** α, β, φ, ψ
-  - title: subsection 2
-    image: null
-    text: >
-      **test text with TeX characters:** $\alpha$, $\beta$, $\phi$, $\psi \\$
-      see how it renders with $\KaTeX$.
-      $$ E = mc^2$$
-      $$ \int \oint \sum \prod $$
-      $$ \begin{CD} A @>a>> B \\ @VbVV @AAcA \\ C @= D \end{CD} $$
-  - title: null
-    image: method.png
-    text: >
-      This is a multi-line text example.
-      "> - Flow Style" converts newlines to spaces.
-      Using >, newline characters are converted to spaces.
-      Newline characters and indentation are handled appropriately, and the text is represented as a single line.
-      It's suitable when you want to collapse multi-line text into a single line, such as in configurations or descriptions where readability is key.
-  - text: |
-      This is a multi-line
-      text example.
-      "| - Block Style" preserves newlines and indentation.
-      Using |, you can represent multi-line text that includes newline characters.
-      Newline characters are preserved exactly as they are, along with the block's indentation.
-      It's suitable when maintaining newlines and indentation is important, such as preserving the structure of code or prose.
-results:
-  - text: |
-      ### Motion Planning (MP) Dataset
-      markdown version
-      |Method|Opt|Exp|Hmean|
-      |--|--|--|--|
-      |BF| 65.8 (63.8, 68.0)| 44.1 (42.8, 45.5) | 44.8 (43.4, 46.3)|
-      |WA*| 68.4 (66.5, 70.4)| 35.8 (34.5, 37.1) | 40.4 (39.0, 41.8)|
-      |**Neural A*** | **87.7 (86.6, 88.9)**| 40.1 (38.9, 41.3) | 52.0 (50.7, 53.3)|
-      <h3>Motion Planning (MP) Dataset</h3>
-      <p>HTML version</p>
-      <div class="uk-overflow-auto">
-        <table class="uk-table uk-table-small uk-text-small uk-table-divider">
-          <thead>
-            <tr>
-              <th>Method</th>
-              <th>Opt</th>
-              <th>Exp</th>
-              <th>Hmean</th>
-            </tr>
-          </thead>
-          <tbody>
-            <tr>
-              <td>
-                BF
-                <br />
-                WA*
-              </td>
-              <td>
-                65.8 (63.8, 68.0)
-                <br />
-                68.4 (66.5, 70.4)
-              </td>
-              <td>
-                44.1 (42.8, 45.5)
-                <br />
-                35.8 (34.5, 37.1)
-              </td>
-              <td>
-                44.8 (43.4, 46.3)
-                <br />
-                40.4 (39.0, 41.8)
-              </td>
-            </tr>
-            <tr>
-              <td>
-                SAIL
-                <br />
-                SAIL-SL
-                <br />
-                BB-A*
-              </td>
-              <td>
-                5.7 (4.6, 6.8)
-                <br />
-                3.1 (2.3, 3.8)
-                <br />
-                31.2 (28.8, 33.5)
-              </td>
-              <td>
-                58.0 (56.1, 60.0)
-                <br />
-                57.6 (55.7, 59.6)
-                <br />
-                52.0 (50.2, 53.9)
-              </td>
-              <td>
-                7.7 (6.4, 9.0)
-                <br />
-                4.4 (3.5, 5.3)
-                <br />
-                31.1 (29.2, 33.0)
-              </td>
-            </tr>
-            <tr>
-              <td>
-                Neural BF
-                <br />
-                <b>Neural A*</b>
-              </td>
-              <td>
-                75.5 (73.8, 77.1)
-                <br />
-                <b>87.7 (86.6, 88.9)</b>
-              </td>
-              <td>
-                45.9 (44.6, 47.2)
-                <br />
-                40.1 (38.9, 41.3)
-              </td>
-              <td>
-                52.0 (50.7, 53.4)
-                <br />
-                52.0 (50.7, 53.3)
-              </td>
-            </tr>
-          </tbody>
-        </table>
-      </div>
-      <h3>Selected Path Planning Results</h3>
-      <p>dummy text</p>
-      <img
-        src="assets/result1.png"
-        class="uk-align-center uk-responsive-width"
-        alt=""
-      />
-      <h3>Path Planning Results on SSD Dataset</h3>
-      <p>dummy text</p>
-      <img
-        src="assets/result2.png"
-        class="uk-align-center uk-responsive-width"
-        alt=""
-      />
-demo:
-  - mp4: result1.mp4
-    text: demo text1 demo text1 demo text1
-    scale: 100%
-  - mp4: result1.mp4
-    text: demo text2 demo text2 demo text2
-    scale: 100%
-  - mp4: result1.mp4
-    text: demo text3 demo text3 demo text3
-    scale: 80%
+  This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one. Previous studies often use a fully observable formulation, requiring external setups or estimators for the peg-to-hole pose. In contrast, we use a partially observable formulation and deep reinforcement learning from demonstrations to learn a memory-based agent that acts purely on haptic and proprioceptive signals. Moreover, previous works do not incorporate potential domain symmetry and thus must search for solutions in a bigger space. Instead, we propose to leverage the symmetry for sample efficiency by augmenting the training data and constructing auxiliary losses to force the agent to adhere to the symmetry. Results in simulation with five different symmetric peg shapes show that our proposed agent can be comparable to or even outperform a state-based agent. In particular, the sample efficiency also allows us to learn directly on the real robot within 3 hours.
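
The new overview names two concrete techniques: augmenting the training data with a domain symmetry and adding auxiliary losses that push the agent to respect that symmetry. Below is a minimal sketch of both ideas, assuming a reflection symmetry about a vertical plane, a plain feedforward policy, and illustrative force/torque observation and position-delta action layouts; the function names and tensor shapes are hypothetical and are not taken from the omron-sinicx/symmetry-aware-pomdp code.

```python
# Hypothetical sketch: symmetry-based data augmentation plus an auxiliary
# equivariance loss. Assumed layouts: obs = [fx, fy, fz, tx, ty, tz],
# act = [dx, dy, dz]; under reflection about the x-z plane, y components flip.
import torch
import torch.nn as nn

OBS_SIGNS = torch.tensor([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
ACT_SIGNS = torch.tensor([1.0, -1.0, 1.0])

def reflect_obs(obs: torch.Tensor) -> torch.Tensor:
    """Apply the reflection to a batch of observations."""
    return obs * OBS_SIGNS

def reflect_act(act: torch.Tensor) -> torch.Tensor:
    """Apply the reflection to a batch of actions."""
    return act * ACT_SIGNS

def augment_batch(obs, act, rew):
    """Data augmentation: append the mirrored copy of every transition.
    Rewards are invariant under the symmetry, so they are duplicated."""
    obs_aug = torch.cat([obs, reflect_obs(obs)], dim=0)
    act_aug = torch.cat([act, reflect_act(act)], dim=0)
    rew_aug = torch.cat([rew, rew], dim=0)
    return obs_aug, act_aug, rew_aug

def symmetry_aux_loss(policy: nn.Module, obs: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss: the action for a mirrored observation should be the
    mirror of the action for the original observation (equivariance)."""
    act = policy(obs)
    act_mirror = policy(reflect_obs(obs))
    return nn.functional.mse_loss(act_mirror, reflect_act(act))

if __name__ == "__main__":
    policy = nn.Sequential(nn.Linear(6, 32), nn.Tanh(), nn.Linear(32, 3))
    obs, act, rew = torch.randn(8, 6), torch.randn(8, 3), torch.randn(8, 1)
    obs_a, act_a, rew_a = augment_batch(obs, act, rew)
    loss = symmetry_aux_loss(policy, obs)  # add to the RL loss with a weight
    print(obs_a.shape, act_a.shape, rew_a.shape, float(loss))
```

In training, the auxiliary term would be added to the usual RL objective with a small weight, while the augmented batch doubles the effective sample count per environment step.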
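
For reference, the overview removed by this commit describes MULTIPOLAR as an adaptively weighted aggregation of frozen source-policy actions plus a learned residual network. A hypothetical sketch of that structure follows, under the same caveats: the class, shapes, and initialization are illustrative assumptions, not the omron-sinicx/multipolar implementation.

```python
# Hypothetical sketch: aggregate K frozen source policies with learnable
# per-source, per-dimension scales, then add a learned residual.
import torch
import torch.nn as nn

class Multipolar(nn.Module):
    def __init__(self, source_policies, obs_dim: int, act_dim: int):
        super().__init__()
        self.sources = nn.ModuleList(source_policies)  # frozen source policies
        k = len(source_policies)
        # learnable aggregation parameters, one scale per source and action dim
        self.theta = nn.Parameter(torch.full((k, act_dim), 1.0 / k))
        # residual head keeps the target policy expressive even when
        # every source policy performs poorly
        self.residual = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # source policies are never updated
            acts = torch.stack([p(obs) for p in self.sources], dim=1)  # (B, K, A)
        aggregated = (self.theta * acts).sum(dim=1)                    # (B, A)
        return aggregated + self.residual(obs)

if __name__ == "__main__":
    sources = [nn.Linear(4, 2) for _ in range(3)]  # stand-ins for source policies
    policy = Multipolar(sources, obs_dim=4, act_dim=2)
    print(policy(torch.randn(8, 4)).shape)  # torch.Size([8, 2])
```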