Hi godka,
Thanks for sharing your DQN-ABR project. I have some questions for you.
I'm a newbie in the RL and ABR fields. I trained and tested the DQN-based ABR algorithm using this project, evaluating the trained model with the pensieve/test code, but I found it performs worse than a simple buffer-based (BB) algorithm.
Here are some result figures.
The first is the episode-average reward curve: the episode-average reward ends up negative and never reaches 0.
The second is the total reward on the test dataset, covering both the DQN and BB algorithms; DQN is worse than BB.
The third shows which bitrate the DQN algorithm chooses on each test trace. I found that DQN does not learn to switch bitrates but always prefers one particular bitrate, which makes the QoE bad.
I also compared it with the A3C algorithm from Pensieve, and found that the A3C-based algorithm performs better than the DQN-based one: it learns to switch bitrates according to buffer occupancy and bandwidth, and its QoE metric also beats the BB algorithm. The picture below shows how A3C chooses bitrates.
So I wonder: why does DQN fail to learn to switch bitrates and instead stick to one particular bitrate? Is DQN unsuitable as an ABR algorithm?
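For reference, the buffer-based baseline I compare against works roughly like the sketch below: a BBA-style policy that maps buffer occupancy to a bitrate index. This is a minimal illustrative sketch, not the exact code from the repo; the reservoir/cushion thresholds are assumptions.

```python
# Minimal sketch of a buffer-based (BB) bitrate policy in the spirit of
# BBA-style baselines. Threshold constants are illustrative assumptions,
# not the exact values used by any particular repo.

BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]  # a typical 6-level ladder
RESERVOIR_S = 5.0   # below this buffer level, always pick the lowest bitrate
CUSHION_S = 10.0    # linear ramp region above the reservoir

def bb_select_bitrate(buffer_s: float) -> int:
    """Map current buffer occupancy (seconds) to a bitrate index."""
    if buffer_s < RESERVOIR_S:
        return 0
    if buffer_s >= RESERVOIR_S + CUSHION_S:
        return len(BITRATES_KBPS) - 1
    # Linearly interpolate between lowest and highest bitrate in the cushion
    frac = (buffer_s - RESERVOIR_S) / CUSHION_S
    return int(frac * (len(BITRATES_KBPS) - 1))
```

Despite its simplicity, a policy like this switches bitrates smoothly with buffer level, which is exactly the behavior my trained DQN fails to reproduce.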
First things first, welcome to the ABR field, and thanks for contributing!
Regardless of whether DQN is ideally suited to ABR, IMHO the short answer is: DQN can also handle the ABR task.
This repo was written about 2 years ago and is now deprecated.
The stable version of DQN for the ABR task is here: https://github.com/godka/Pensieve-PPO/tree/dqn/src. We have implemented several state-of-the-art RL algorithms, such as A2C and PPO, as well as off-policy RL algorithms (e.g., Double-DQN).
For more details about DQN, please refer to this page. Note that we employ Double-DQN rather than vanilla DQN.
In terms of results, we have to point out that DQN-ABR rivals A2C-ABR (i.e., Pensieve), while it underperforms PPO-ABR.
The training curve is shown below (red: dual-PPO, blue: DQN, orange: Double-DQN).
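The key difference is how the bootstrap target is computed. Vanilla DQN both selects and evaluates the next action with the target network, which overestimates Q-values; Double-DQN selects the action with the online network and evaluates it with the target network. A minimal sketch of the two targets (function names and shapes are illustrative, not taken from the repo):

```python
import numpy as np

def dqn_target(q_next_target, rewards, dones, gamma=0.99):
    """Vanilla DQN: the target network both selects and evaluates
    the next action (prone to overestimation)."""
    return rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)

def double_dqn_target(q_next_online, q_next_target, rewards, dones, gamma=0.99):
    """Double-DQN: the online network selects the next action,
    the target network evaluates it."""
    best_actions = q_next_online.argmax(axis=1)
    q_eval = q_next_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * q_eval
```

In ABR, overestimated Q-values can make a single bitrate look uniformly best, so decoupling selection from evaluation tends to stabilize training.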
Please feel free to let me know if you have any questions.