0.2.0
Highlights
- Using stable releases for TensorFlow (>=2.3.0), Reverb, and TensorFlow Probability.
- Added Critic Regularized Regression (code, paper)
- Added Discrete Batch-Constrained Deep Q-learning (code, paper)
- Added `EnvironmentLoop.run_episode()` for running a single episode.
- Update `EnvironmentLoop.run()` to take `num_steps`, allowing the control of step count rather than just episode count.
- Add more distribution types (e.g. GaussianMixture) which can be used by policies.
- Added an environment wrapper for action repeats.
- Improvements/tuning to datasets exposed by `make_dataset`.
- Add support for nested / multidimensional rewards and discounts.
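
The difference between bounding a run by episode count and by step count can be sketched as follows. This is a minimal, self-contained illustration and not Acme's `EnvironmentLoop`: the `CountdownEnv` and `SketchLoop` classes, the returned dictionary keys, and the choice to check the step budget only between episodes (so the last episode may overshoot it) are all assumptions made for the example.

```python
from typing import Optional


class CountdownEnv:
    """Toy environment (hypothetical): every episode lasts `length` steps."""

    def __init__(self, length: int = 5):
        self._length = length
        self._t = 0

    def reset(self) -> int:
        self._t = 0
        return self._t

    def step(self, action: int):
        self._t += 1
        done = self._t >= self._length
        return self._t, 1.0, done  # observation, reward, done


class SketchLoop:
    """Illustrative loop only; not the library's implementation."""

    def __init__(self, env: CountdownEnv):
        self._env = env

    def run_episode(self) -> dict:
        """Runs exactly one episode and reports its length and return."""
        self._env.reset()
        steps, episode_return, done = 0, 0.0, False
        while not done:
            _, reward, done = self._env.step(0)  # fixed dummy action
            steps += 1
            episode_return += reward
        return {"episode_length": steps, "episode_return": episode_return}

    def run(self,
            num_episodes: Optional[int] = None,
            num_steps: Optional[int] = None) -> dict:
        """Runs whole episodes until the episode or step budget is reached."""
        # Assumption of this sketch: exactly one budget must be given.
        if (num_episodes is None) == (num_steps is None):
            raise ValueError("Specify exactly one of num_episodes or num_steps.")
        episodes, steps = 0, 0
        while True:
            if num_episodes is not None and episodes >= num_episodes:
                break
            if num_steps is not None and steps >= num_steps:
                break
            result = self.run_episode()
            episodes += 1
            steps += result["episode_length"]
        return {"episodes": episodes, "steps": steps}
```

With five-step episodes, `SketchLoop(CountdownEnv()).run(num_steps=12)` runs three full episodes in this sketch, because the budget is only checked once an episode has finished.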
Minor changes and fixes
- `ConstantInfo` logger for logging constant information.
- Added a `should_update` parameter to the `EnvironmentLoop`.
- Various modifications and optimizations to the `make_reverb_dataset()` function.
- Improvements to typing and pytype usage.
- Other minor bug and documentation fixes.
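
One plausible use of a `should_update` flag is a loop that only collects experience or evaluates a policy, without triggering learning. The sketch below assumes the flag gates whether the loop calls the actor's update step after each environment step; the note above does not spell this out, and `CountingActor` and `run_fixed_episode` are hypothetical names used only for illustration.

```python
class CountingActor:
    """Toy actor (hypothetical) that counts how often update() is called."""

    def __init__(self):
        self.updates = 0

    def select_action(self, observation: int) -> int:
        return 0  # fixed dummy action

    def update(self) -> None:
        self.updates += 1


def run_fixed_episode(actor: CountingActor,
                      num_steps: int,
                      should_update: bool = True) -> None:
    """Steps a dummy episode, calling actor.update() only if should_update."""
    for observation in range(num_steps):
        actor.select_action(observation)
        if should_update:
            actor.update()


training_actor = CountingActor()
run_fixed_episode(training_actor, num_steps=4)

eval_actor = CountingActor()
run_fixed_episode(eval_actor, num_steps=4, should_update=False)
```

Here `training_actor.updates` ends at 4 while `eval_actor.updates` stays at 0, which is the behaviour an evaluation-only loop would want under the stated assumption.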