dyth/README.md

I am David Yu-Tung Hui / 許宇同, and I develop model-free reinforcement learning algorithms.

My research goal is to create algorithms that learn about our world through interactions. I hope that these algorithms will one day help us discover new scientific knowledge. To achieve my goal, I work on Reinforcement Learning (RL) algorithms that formalize the learning-through-interaction process as an optimization problem. Specifically, I improve optimization in RL using insights from linear algebra and probability theory.

I've written two works on this theme:

  1. Stabilizing Q-Learning for Continuous Control (MSc Thesis, 2022) showed that adding LayerNorm to critic networks prevented semi-gradient updates of the mean-squared temporal-difference error from diverging. Adding LayerNorm to DDPG solved high-dimensional continuous control tasks such as dog-run in DeepMind Control.

  2. Double Gumbel Q-Learning (Spotlight @NeurIPS 2023) showed that Maximum-Entropy RL algorithms have two heteroscedastic Gumbel noise sources. Accounting for these noise sources improved the aggregate performance of SAC by 2x at 1M training timesteps.
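As a rough illustration of the first item, the sketch below shows a small critic with LayerNorm applied to its hidden layer. It is not taken from the thesis or its code: the layer sizes, the initialization, the placement of LayerNorm before the ReLU, and all function names are assumptions made for illustration. It only demonstrates one property of such a network, namely that the output magnitude stays bounded by the final-layer weights however large the inputs are.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (at most) unit variance across features."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(rng, n_in, n_out):
    """Hypothetical initializer: uniform weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def critic(params, state, action):
    """Q(s, a) with one hidden layer; LayerNorm is applied before the ReLU (an assumption)."""
    (w1, b1), (w2, b2) = params
    hidden = linear(state + action, w1, b1)
    hidden = relu(layer_norm(hidden))
    return linear(hidden, w2, b2)[0]


rng = random.Random(0)
params = [init_linear(rng, 4, 8), init_linear(rng, 8, 1)]

# Same state-action direction at two very different input scales.
q_small = critic(params, [0.1, -0.2, 0.3], [0.5])
q_large = critic(params, [100.0, -200.0, 300.0], [500.0])

# By Cauchy-Schwarz, |Q| <= ||w2|| * sqrt(hidden_size) because the
# normalized hidden vector has squared norm at most hidden_size.
bound = math.sqrt(sum(w * w for w in params[1][0][0])) * math.sqrt(8)
print(q_small, q_large, bound)
```

In this sketch, both Q-values fall within `bound` even though the second input is a thousand times larger. Without the normalization step, the output would scale with the input.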

In 2023, I graduated with an MSc from Mila, University of Montreal. I'm looking for opportunities where I can continue my research.

For more information about me, see:

Google Scholar

Short CV

Pinned

  1. doublegum (Public)

    NeurIPS 2023 Spotlight

    Python, 9 stars, 3 forks

  2. mila-iqia/babyai (Public)

    BabyAI platform. A testbed for training agents to understand and execute language commands.

    Python, 681 stars, 144 forks