In the example for RNaD, the importance sampling correction passed to get_loss_nerd is 1. This is because the example is on-policy: the policy is updated synchronously between acting and learning, so the acting and learning policies are identical.
My question is: what needs to change for this example to work in an asynchronous, off-policy setting? Is it as simple as replacing the constant importance sampling correction with a policy ratio term? What would this look like exactly?
How could I construct the importance sampling correction for the off-policy case?