Hi, thanks for your wonderful sharing. From your code, in all of your learning-based algorithms the total reward is computed based on instance_done, which means the reward covers only that specific instance rather than all of the VNR requests.
Moreover, the reward in learn_singly(...) is always 0. I believe the long-term reward should be accumulated there, since that is the true long-term reward, yet it is always kept at 0.
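To make the distinction concrete, here is a minimal sketch of the two reward-accumulation schemes I am describing. Only instance_done refers to the actual code; the helper functions and reward values below are made up for illustration:

```python
def instance_level_returns(reward_streams):
    """One accumulated reward per VNR instance: the running total is
    reset whenever instance_done fires, so each instance is scored
    in isolation."""
    return [sum(stream) for stream in reward_streams]

def online_level_return(reward_streams):
    """A single long-term reward accumulated across all VNR requests."""
    return sum(sum(stream) for stream in reward_streams)

# Three arriving VNRs, each with its own per-step rewards (dummy values).
streams = [[0.0, 0.0, 1.0], [0.0, -1.0], [0.0, 0.0, 0.0, 1.0]]
print(instance_level_returns(streams))  # [1.0, -1.0, 1.0]  per instance
print(online_level_return(streams))     # 1.0               across all requests
```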
Please feel free to correct me if I am wrong, thanks.
Thank you for your feedback. In our library, we support two types of learning paradigms:
- Instance-level optimization, which focuses on finding the optimal solution for each individual instance as it arrives.
- Online-level optimization, which aims to learn a globally optimal policy that maximizes overall system performance metrics across all instances.
You can find the implementations of both paradigms, including the instance-level and online-level environments, in the rl_solver directory.
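To make the episode boundary concrete, here is a toy Gym-style sketch of how the two paradigms can differ. The class, its attributes, and the reward replay are illustrative assumptions, not the actual rl_solver environments:

```python
class ToyVNREnv:
    """Replays per-step rewards for a sequence of arriving VNR instances.

    This is a toy stand-in for illustration, not the library's API."""

    def __init__(self, reward_streams, online_level=False):
        self.streams = reward_streams      # one reward list per VNR instance
        self.online_level = online_level   # False -> instance-level episodes
        self.i = 0                         # index of the current instance
        self.t = 0                         # step within the current instance

    def step(self, action):
        reward = self.streams[self.i][self.t]
        self.t += 1
        instance_done = self.t == len(self.streams[self.i])
        if instance_done:
            self.i += 1
            self.t = 0
        all_done = self.i == len(self.streams)
        # Instance-level: each VNR is its own episode, so the training
        # return resets at instance_done. Online-level: the episode (and
        # therefore the return) spans the entire request sequence.
        done = (instance_done and all_done) if self.online_level else instance_done
        return None, reward, done, {}
```

In the instance-level setting, done fires at every instance_done, which is why the accumulated reward is per-instance; in the online-level setting it fires only after the final request, yielding the long-term return.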
As you correctly noted, most of the current implementations in our library follow the instance-level paradigm. This choice is based on our empirical analysis and insights: in network systems, the randomness of arriving service requests makes it difficult to learn a robust online-level policy that meets expectations; training tends to take more time and often does not deliver satisfactory performance. Conversely, the instance-level paradigm allows us to efficiently obtain a high-quality solving policy, leading to more reliable and efficient results in practice.