Hey there,
I ran some experiments with RL and wondered why the reward function does not depend directly on the lap time. Shaping the reward to encourage staying in the center and driving at high velocity makes a lot of sense. However, a racing policy that really optimizes the lap time sometimes has to break these rules on purpose (e.g. leave the center to take a curve optimally). To allow that, I think the reward calculation should include the lap time, i.e. give a bonus for low lap times.
I've added an idea for how to implement this in the attached commit. It gives an additional reward that is inversely related to the lap time (high lap time = low reward, low lap time = high reward). A scaling factor weights this additional reward against the other rewards. This factor is currently just eyeballed and probably needs tuning for optimal results, but even in its current form it yielded pretty good experimental results.
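To make the idea concrete, here is a minimal sketch of such a bonus. It is not the code from the attached commit: the function names, the `weight / lap_time` form, and the default weight of 100 are all assumptions chosen for illustration.

```python
def lap_time_bonus(lap_time_s: float, weight: float = 100.0) -> float:
    """Hypothetical bonus that shrinks as the lap time grows.

    Assumption: a simple weight / lap_time form; the commit may use
    a different inverse relationship.
    """
    if lap_time_s <= 0.0:
        raise ValueError("lap_time_s must be positive")
    return weight / lap_time_s


def step_reward(shaped_reward: float, lap_completed: bool,
                lap_time_s: float = 0.0, weight: float = 100.0) -> float:
    """Add the lap-time bonus only on the step that completes a lap.

    shaped_reward stands for the existing per-step reward
    (centering and velocity terms); it is passed through unchanged
    on all other steps.
    """
    if lap_completed:
        return shaped_reward + lap_time_bonus(lap_time_s, weight)
    return shaped_reward
```

Because the bonus is paid once per lap while the shaped reward is paid every step, `weight` has to be large enough to matter against the sum of the per-step rewards over a lap, which is presumably why it needs tuning.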
Link to a video of a trained PPO agent (the simulator is set to 4x speed and max. throttle=0.4; PPO is trained without action smoothing and without frame stacking, so it is just a really simple baseline): https://drive.google.com/file/d/1Ucsrfwqm02PzzJlb76ozMVTX_tqMiIjE/view?usp=sharing
Eager to hear what you think.
Best, Till