Batch sampling improvement #1154

Draft
wants to merge 7 commits into base: development

dengdifan
Contributor

Closing #1152
This is a first step towards improving batch sampling.

TODO:

  • checking if everything works well under batch setting
  • adding unit tests

benjamc mentioned this pull request on Oct 24, 2024
benjamc added this to the v2.3 milestone on Nov 27, 2024
dengdifan requested a review from benjamc on December 2, 2024 14:50
Collaborator

benjamc left a comment

  • Update CHANGELOG.md
  • Possibly update example
  • Update the docs, or mention this new feature somewhere

Y_estimated = self.estimate_running_config_costs(
    X_running, Y, self._batch_sampling_estimation_strategy
)
if Y_estimated is not None:
Collaborator

in what cases could this be None?

Collaborator

ah when there are no running configs

    a np array with size (n_evaluated_configs, n_obj) that records the costs of all the previous evaluated
    configurations
estimation_strategy: str
    how do we estimate the target y_running values
Collaborator

  • copy docstring from above to add more info about the estimation strategy here

Y_evaluated: np.ndarray,
estimation_strategy: str = 'CL_max'):
"""
This function is implemented to estimate the still pending/ running configurations
Collaborator

  • add newline

    Y_estimated = np.nanmin(Y_evaluated, axis=0, keepdims=True)
    return np.repeat(Y_estimated, n_running_points, 0)
elif estimation_strategy == 'CL_mean':
    # constant liar min, we take the mean values of all the evaluated Y and apply them to the running X
Collaborator

should be constant liar mean instead of min
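
For readers following the thread, here is a minimal sketch of the constant liar idea discussed above, not the code in this PR. The function name `constant_liar_costs` and the strategy name `'CL_min'` are assumptions for illustration (the snippets above only show `'CL_max'` and `'CL_mean'` by name), as is returning None when nothing is running. It also assumes costs are minimized, so the minimum is the optimistic placeholder.

```python
from typing import Optional

import numpy as np


def constant_liar_costs(
    Y_evaluated: np.ndarray,
    n_running_points: int,
    estimation_strategy: str = "CL_max",
) -> Optional[np.ndarray]:
    """Return placeholder costs of shape (n_running_points, n_obj), or None."""
    if n_running_points == 0:
        # Assumption: with no running configurations there is nothing to estimate.
        return None
    reducers = {
        "CL_min": np.nanmin,    # optimistic lie: best cost seen so far (name assumed)
        "CL_mean": np.nanmean,  # neutral lie: mean cost seen so far
        "CL_max": np.nanmax,    # pessimistic lie: worst cost seen so far
    }
    if estimation_strategy not in reducers:
        raise ValueError(f"Unknown estimation strategy: {estimation_strategy!r}")
    # Reduce over the evaluated configurations, one value per objective, ignoring NaNs.
    Y_estimated = reducers[estimation_strategy](Y_evaluated, axis=0, keepdims=True)
    # Every running configuration receives the same constant "lie".
    return np.repeat(Y_estimated, n_running_points, axis=0)


# Three evaluated configurations, two objectives, one missing value.
Y = np.array([[1.0, 10.0], [3.0, np.nan], [2.0, 30.0]])
print(constant_liar_costs(Y, 2, "CL_mean"))
```

Keeping the reducers in one table puts each strategy name next to its comment, which makes a mismatch like the one flagged above easier to spot.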

# gaussian process
assert isinstance(self._model, GaussianProcess), 'Sample based estimate strategy only allows ' \
                                                 'GP as surrogate model!'
return self._model.sample_functions(X_test=X_running, n_funcs=1)
Collaborator

Why can't we sample from the random forest?

trial: self.runhistory[trial]
for trial in self.runhistory
if self.runhistory[trial].status == StatusType.RUNNING
# and runhistory.data[run].time >= self._algorithm_walltime_limit # type: ignore
Collaborator

why is this commented out / why would we need this?
If it should stay there commented, please explain why
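
One way to address this, sketched under assumptions: instead of leaving the condition commented out, the walltime check could become an explicit optional argument. `StatusType` and `TrialValue` below are simplified stand-ins, not the project's real classes, and `get_running_trials` and `walltime_limit` are hypothetical names. What the commented-out condition was meant to select is not stated in the thread, so the `time >= walltime_limit` semantics are a guess taken from the commented line.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class StatusType(Enum):
    """Simplified stand-in for the project's status enum."""
    RUNNING = 0
    SUCCESS = 1


@dataclass
class TrialValue:
    """Simplified stand-in for a runhistory entry."""
    status: StatusType
    time: float


def get_running_trials(
    runhistory: Dict[str, TrialValue],
    walltime_limit: Optional[float] = None,
) -> Dict[str, TrialValue]:
    """Select running trials, optionally only those at or past a walltime limit."""
    return {
        trial: value
        for trial, value in runhistory.items()
        if value.status == StatusType.RUNNING
        # The extra filter applies only when a limit is passed explicitly.
        and (walltime_limit is None or value.time >= walltime_limit)
    }


runhistory = {
    "a": TrialValue(StatusType.RUNNING, 5.0),
    "b": TrialValue(StatusType.SUCCESS, 1.0),
    "c": TrialValue(StatusType.RUNNING, 50.0),
}
print(sorted(get_running_trials(runhistory)))
print(sorted(get_running_trials(runhistory, walltime_limit=10.0)))
```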

trial: self.runhistory[trial]
for trial in self.runhistory
if self.runhistory[trial].status == StatusType.RUNNING
# and runhistory.data[run].time >= self._algorithm_walltime_limit # type: ignore
Collaborator

same here

@@ -211,6 +234,13 @@ def _get_timeout_trials(

        return trials

    def _convert_config_ids_to_array(self,
                                     config_ids: Iterable[int]) -> np.ndarray:
        """extract the configurations from rh and transform them into np array"""
Collaborator

  • write proper docstring with Parameters and return values
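
As an illustration of the requested docstring layout (NumPy style, with Parameters and Returns), here is a standalone sketch. It is not the method from the diff: the free function `convert_config_ids_to_array` and its `id_to_vector` lookup are hypothetical and replace the runhistory access, which is not shown in the excerpt.

```python
from typing import Iterable, Mapping, Sequence

import numpy as np


def convert_config_ids_to_array(
    config_ids: Iterable[int],
    id_to_vector: Mapping[int, Sequence[float]],
) -> np.ndarray:
    """Stack the vector representations of the given configurations.

    Parameters
    ----------
    config_ids : Iterable[int]
        Ids of the configurations to extract.
    id_to_vector : Mapping[int, Sequence[float]]
        Hypothetical lookup from configuration id to its numeric vector.
        In the PR this information would come from the runhistory.

    Returns
    -------
    np.ndarray
        Array of shape (n_configs, n_hyperparameters); rows follow the
        order of ``config_ids``.
    """
    # Look up each id in order so that row i corresponds to the i-th id.
    rows = [id_to_vector[config_id] for config_id in config_ids]
    return np.array(rows, dtype=float)


lookup = {1: [0.1, 0.2], 2: [0.3, 0.4]}
print(convert_config_ids_to_array([2, 1], lookup))
```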

benjamc linked an issue on Dec 4, 2024 that may be closed by this pull request
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

Improve batch sampling
2 participants