Translating `on/off policy` #164

cuongvng · 2022-07-01T05:07:20Z

cuongvng
Jul 1, 2022
Collaborator

@mlbvn/handson-ml everyone, please share your thoughts.


minhduc0711 · 2022-07-03T11:55:32Z

minhduc0711
Jul 3, 2022
Collaborator

The Q-Learning algorithm is called an off-policy algorithm because the policy being
trained is not necessarily the one being executed: in the previous code example, the
policy being executed (the exploration policy) is completely random, while the policy
being trained will always choose the actions with the highest Q-Values. Conversely,
the Policy Gradients algorithm is an on-policy algorithm: it explores the world using
the policy being trained.

Perhaps "(không) dựa trên chính sách" (literally "(not) based on the policy")?

3 replies

cuongvng Jul 3, 2022
Collaborator Author

"not necessarily" means it could still be "based on" ("dựa trên") 🤔 🤔

minhduc0711 Jul 5, 2022
Collaborator

Hmm, so off-policy would be a subset of on-policy? That doesn't sound quite right.
I checked the definition in the Sutton-Barto book, and it seems to draw a black-and-white distinction:

cuongvng Jul 7, 2022
Collaborator Author

Then let's settle on the translation above.
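
For reference, here is a minimal sketch of the distinction in the quoted passage. It is not taken from the book: the Q-table, reward, and discount factor are made-up values, and SARSA is swapped in as the on-policy example (instead of Policy Gradients) only because it can be compared with Q-Learning in a few lines. The two update targets differ in whose next action they assume.

```python
def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: assumes the greedy (trained) policy acts next,
    regardless of what the executed policy actually does."""
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action the executed policy actually chose."""
    return reward + gamma * q[next_state][next_action]


# Hypothetical Q-table with 2 states and 2 actions (illustrative values only).
q = [[0.0, 0.0],
     [1.0, 5.0]]
gamma = 0.5       # assumed discount factor
reward = 1.0      # assumed reward for the observed transition
next_state = 1
next_action = 0   # suppose the exploration policy happened to pick the worse action

off_policy_target = q_learning_target(q, reward, next_state, gamma)
on_policy_target = sarsa_target(q, reward, next_state, next_action, gamma)

print(off_policy_target)  # 1 + 0.5 * 5 = 3.5
print(on_policy_target)   # 1 + 0.5 * 1 = 1.5
```

When the executed policy happens to pick the greedy action, the two targets coincide, which is one way to read the "not necessarily" in the passage.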
