
Add ifbo #115

Open · wants to merge 31 commits into master
Conversation

@Neeratyoy (Contributor) commented Jul 15, 2024

Adding ifBO and refactoring several freeze-thaw-based multi-fidelity algorithms.
Working example here.

Existing issues or concerns:

  • Adding parallelization support
    • Some issues still need resolving
  • The tblogger from the example does not seem to work
    • Only config_0_0 has the tbevent files
    • (could not reproduce) On restart, there was a file-not-found error for the tbevent file
  • Resolving Python dependencies: removing Python 3.8 and 3.9 support (#140)

@Neeratyoy (Contributor, Author)

@eddiebergman @TarekAbouChakra is tblogging currently supported?
If you install this branch, run this example, and view TensorBoard, you will see it is no longer working.

@eddiebergman (Contributor)

eddiebergman commented Aug 30, 2024

Just FYI, writing [PR](#140) didn't render as intended; you can just write #140 :)

@eddiebergman (Contributor)

As for the PR about Python dependencies: let's get ifbo in first, and we can do the Python dependency change after.

@eddiebergman (Contributor)

Regarding the tblogger, I can't immediately see what the issue is. It seems that the tblogger at each iteration gets the trial that is currently being evaluated from the runtime.py. The runtime.py asks the optimizer for the next config to evaluate (which returns the config dict as well as the ID), sets that as the currently running trial, and then sends it to run_pipeline(). Inside run_pipeline() is where tblogger.log() is called, at which point it asks the runtime.py, "hey, what config is currently being evaluated?". This should be exactly the same mechanism used for determining the config directory.
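The flow described above can be sketched roughly like this (a hypothetical simplification for illustration; the Runtime and Optimizer classes and their method names are stand-ins, not the actual neps API):

```python
class Optimizer:
    """Stand-in optimizer that hands out sequential config IDs."""

    def __init__(self):
        self.counter = 0

    def next_config(self):
        cid = f"config_{self.counter}_0"
        self.counter += 1
        return cid, {"lr": 0.1}


class Runtime:
    """Stand-in for runtime.py: tracks which trial is currently running."""

    def __init__(self, optimizer):
        self.optimizer = optimizer
        self.current_trial = None

    def step(self, run_pipeline):
        config_id, config = self.optimizer.next_config()
        self.current_trial = (config_id, config)  # set before evaluation
        return run_pipeline(config)               # tblogger.log() runs inside


runtime = Runtime(Optimizer())


def run_pipeline(config):
    # Stand-in for tblogger.log(): ask the runtime for the current trial.
    cid, _ = runtime.current_trial
    return cid


assert runtime.step(run_pipeline) == "config_0_0"
assert runtime.step(run_pipeline) == "config_1_0"
```

Under this model the logger always sees the same config ID as the directory logic, which is why the bug is not obvious from the flow alone.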

@eddiebergman (Contributor)

Think I found the issue.

This line:

    if not tblogger._is_initialized():
        tblogger._initialize_writers()

Basically, it only initializes once, and initialization is in charge of deciding where to write to:

    def _initialize_writers() -> None:
        # This code runs only once per config, to assign that config a config_writer.
        if (
            tblogger.config_previous_directory is None
            and tblogger.config_working_directory
        ):
            # If no fidelities are there yet, define the writer via the config_id
            tblogger.config_id = str(tblogger.config_working_directory).rsplit(
                "/", maxsplit=1
            )[-1]
            tblogger.config_writer = SummaryWriter_(
                tblogger.config_working_directory / "tbevents"
            )
            return
        # Searching for the initial directory where tensorboard events are stored.
        if tblogger.config_working_directory:
            init_dir = get_initial_directory(
                pipeline_directory=tblogger.config_working_directory
            )
            tblogger.config_id = str(init_dir).rsplit("/", maxsplit=1)[-1]
            if (init_dir / "tbevents").exists():
                tblogger.config_writer = SummaryWriter_(init_dir / "tbevents")
                return

            raise FileNotFoundError(
                "'tbevents' was not found in the initial directory of the configuration."
            )
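The failure mode of a one-shot initializer like this can be reproduced in a minimal sketch (hypothetical and heavily simplified from the actual tblogger; class and attribute names are illustrative): because the writer directory is chosen exactly once per process, every later config keeps writing into the first config's directory, matching the observation that only config_0_0 has tbevent files.

```python
class TBLogger:
    """Minimal sketch of the one-shot initialization pitfall:
    the writer directory is decided once and never updated."""

    config_writer_dir = None  # class-level state shared across configs

    @classmethod
    def _is_initialized(cls):
        return cls.config_writer_dir is not None

    @classmethod
    def log(cls, config_dir):
        if not cls._is_initialized():
            # Runs only for the very first config this process sees.
            cls.config_writer_dir = config_dir
        return cls.config_writer_dir


# The first config binds the writer directory...
assert TBLogger.log("config_0_0") == "config_0_0"
# ...and every later config still writes to config_0_0's directory.
assert TBLogger.log("config_1_0") == "config_0_0"
```

A fix along these lines would re-run the directory decision whenever the current config changes, rather than guarding it behind a once-per-process flag.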

@Neeratyoy (Contributor, Author)

@eddiebergman how should we go about this PR?

Parallelization works, and I am happy with it for now based on local testing.

The tblogger is the biggest bummer here, along with the failing test cases.

@@ -27,7 +27,7 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
         with:
-          python-version: 3.8
+          python-version: 3.10

Collaborator:


Suggested change:

-          python-version: 3.10
+          python-version: '3.10'
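A likely motivation for this suggestion (my assumption; the reviewer did not state a reason): YAML parses the unquoted scalar 3.10 as the float 3.1, so setup-python would be asked for Python 3.1 rather than 3.10. Quoting keeps it a string:

```yaml
with:
  # Unquoted, YAML reads this as the float 3.1:
  # python-version: 3.10
  # Quoted, it stays the string "3.10":
  python-version: '3.10'
```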
