Adding a "distributed" option to the LogReport class. #676

Open
linshokaku opened this issue May 1, 2023 · 0 comments
Comments

@linshokaku (Member)
During distributed training, the LogReport class does not aggregate reports from all processes. This should be fixed because it can cause bugs in extensions or options that depend on logged values, for example EarlyStoppingTrigger or ReduceLROnPlateau.

To address this problem, we need to modify the LogReport class so that it gathers the summary objects from all processes at the trigger point and recomputes the averages. We should also add a "writer_rank" option to ensure that only one process writes the log file.
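A minimal sketch of the proposed aggregation step. The helper names below (`aggregate_summaries`, `flush_log`, the `(total, count)` summary layout) are hypothetical illustrations, not the pytorch-pfn-extras API; in a real distributed run the per-process summaries would be collected with something like `torch.distributed.all_gather_object` before merging.

```python
# Hypothetical sketch: merge per-process running-mean summaries and
# write the log from a single designated rank. Not the library API.

def aggregate_summaries(per_process_summaries):
    """Merge per-process summaries into global averages.

    Each summary maps an observation name to a (total, count) pair,
    the same information a running-mean summary keeps internally.
    """
    merged = {}
    for summary in per_process_summaries:
        for name, (total, count) in summary.items():
            prev_total, prev_count = merged.get(name, (0.0, 0))
            merged[name] = (prev_total + total, prev_count + count)
    # Recompute the mean over all processes, not just the local one.
    return {name: total / count for name, (total, count) in merged.items()}


def flush_log(rank, writer_rank, per_process_summaries, log):
    """Append aggregated stats; only writer_rank touches the log file."""
    stats = aggregate_summaries(per_process_summaries)
    if rank == writer_rank:
        log.append(stats)
    return stats
```

With this shape, every rank sees the same aggregated values (so triggers like EarlyStoppingTrigger stay consistent), while only `writer_rank` performs the file write.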
