Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ForbiddenCallableRelation #280

Draft
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

nchristensen
Copy link

@nchristensen nchristensen commented Dec 5, 2022

Allows for more complicated forbidden relationships using Callables. Related issues #254 #272 #277.

Also adds support for sigma=0 in NormalFloatHyperparameter and NormalIntegerHyperparameter and non-integer mu (mean) values in NormalIntegerHyperparameter as well as fixes some json serialization bugs.

@eddiebergman
Copy link
Contributor

Hi @nchristensen,

While this is quite cool, and definitely something that's desirable, it makes somethings impossible which is something we definitely want to keep, namely serializibility. We can't save configspaces with callables to JSON.

Another example, close to ConfigSpace is ParameterSpace which does allow for the callables but lacks serializibility for general functions. They do serialize lambda functions but I can't speak to it's robustness.

In light of this, I don't want to outright reject the PR, it's still nice to have but I imagine it will require some explicit testing with respect to serialization and we need explicit documentation that use of this feature would prevent non-binary serialization to disk. (You could still pickle them of course but pickles are a bit volatile with long term storage).

I'll review this later today and drop some pointers as to where this could be done!

@codecov
Copy link

codecov bot commented Dec 15, 2022

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (88051eb) 67.97% compared to head (f4e8758) 68.05%.
Report is 27 commits behind head on main.

❗ Current head f4e8758 differs from pull request most recent head 815b18b. Consider uploading reports for the commit 815b18b to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #280      +/-   ##
==========================================
+ Coverage   67.97%   68.05%   +0.08%     
==========================================
  Files          25       25              
  Lines        1786     1800      +14     
==========================================
+ Hits         1214     1225      +11     
- Misses        572      575       +3     
Files Coverage Δ
ConfigSpace/__init__.py 100.00% <100.00%> (ø)
ConfigSpace/__version__.py 100.00% <100.00%> (ø)
ConfigSpace/functional.py 100.00% <100.00%> (ø)
ConfigSpace/read_and_write/json.py 79.92% <71.42%> (-0.08%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@eddiebergman eddiebergman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So functionally, it all seems there but as mentioned in the PR comment, we definitely need tests to check its behaviour when conisdering serialization.

My guess is serialization will fail and give some JSON based error. If this is the case. To counteract this, I think we should raise an explicit error stating that using a ForbiddenCallable means you can not serialize the space to JSON. This should also be stated in the documentation directly.

Thanks for the PR though, sorry for my critical comments. We do appreciate it, really and I'm sorry we don't dedicate more time to directly improving this library and user contributions are a nice surprise!

ConfigSpace/forbidden.pyx Outdated Show resolved Hide resolved
Comment on lines 597 to 603
left : :ref:`Hyperparameters`
first argument of callable

right : :ref:`Hyperparameters`
second argument of callable

f : A callable that relates the two hyperparameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx will complain about the documentation having a leading space on this line unfortunately, it's very particular :/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My knowledge of Sphinx is limited, but I revised it modeled on the ForbiddenEqualsRelation. Hopefully that resolves this.

def __eq__(self, other: Any) -> bool:
if not isinstance(other, self.__class__):
return False
return super().__eq__(other) and self.f == other.f
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice use of the super class to keep consistency in equality checking.

def __repr__(self):
from inspect import getsource
f_source = getsource(self.f)
return f"Forbidden: {f_source} | Arguments: {self.left.name}, {self.right.name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This getsource call could be quite large and rather verbose. For example:

def f():
    print("hello")
    x = 1 + 2
    y = "hi" + "mars"
    return b"no"
    
from inspect import getsource
getsource(f)
# 'def f():\n    print("hello")\n    x = 1 + 2\n    y = "hi" + "mars"\n    return b"no"\n'

I think we could just get around this by using the functions, f.__qualname__ instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow up you could accomodate lambdas as:

qualname = f.__qualname__
f_repr = getsource(f) if qualname == "<lambda>" else qualname

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I was noticing formatting issues using getsource as well since it keeps all of the leading white space.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obtaining only the source code of the lambda (and discarding any other code on the same line) is apparently non-trivial. See for example https://stackoverflow.com/questions/59498679/how-can-i-get-exactly-the-code-of-a-lambda-function-in-python. I have revised it to use the qualname though.

f_source = getsource(self.f)
return f"Forbidden: {f_source} | Arguments: {self.left.name}, {self.right.name}"

cdef int _is_forbidden(self, left, right) except -1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this except -1 do? I wish I knew more Cython but unfortunatly not. I looked at the class above and it didn't seem to have this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modeled this off of the ForbiddenLessThanRelation directly below which uses this. I'm not very familiar with Cython either, but it seems any cdef function that might return a Python exception needs to be declared with an except value. https://docs.cython.org/en/latest/src/userguide/language_basics.html#error-return-values

Comment on lines +1053 to +1073
sigma = self.sigma
if sigma == 0:
return self.mu
elif self.lower == None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice change in of itself. Depending on how this PR goes, I still think we'd like to pull this in

Comment on lines 1717 to 1718
if self.sigma == 0:
assert isinstance(self.mu, int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this assertion? I feel like they should both just be int. The init signature seems off here to suggest otherwise.

Copy link
Author

@nchristensen nchristensen Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (or some form of rounding) is needed if a non-integer mean is allowed. But I'm also fine with restoring the existing behavior if allowing a non-integer mean is undesirable.

@@ -1663,8 +1668,7 @@ cdef class NormalIntegerHyperparameter(IntegerHyperparameter):
cdef public nfhp
cdef normalization_constant


def __init__(self, name: str, mu: int, sigma: Union[int, float],
def __init__(self, name: str, mu: Union[int, float], sigma: Union[int, float],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems you explicitly type allow float? How come?

Copy link
Author

@nchristensen nchristensen Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that even if we're drawing integers from a normal distribution, there is no requirement that the mean of the distribution be an integer. For instance, a user might want values <= 4 and values >= 5 to be chosen with equal probability and so place the mean at 4.5.

@nchristensen
Copy link
Author

Yeah, serializability could be difficult with this. I think it would be useful to optionally allow pickling of the Callable during serialization, but to disallow this by default.

@mfeurer
Copy link
Contributor

mfeurer commented Jan 5, 2023

Hey @nchristensen thank you very much for the PR. I'll discuss with @eddiebergman whether we will extend the ConfigSpace package to allow callables (no timeline, though). However, your changes to the hyperparameter file appear to be useful by themselves (as mentioned by @eddiebergman). Would you like to create a single PR so we can merge these anyway?

@nchristensen
Copy link
Author

nchristensen commented Jan 5, 2023

Sure, it will probably be sometime in February before I can get back to this though. Turned out there wasn't anything to disentangle so I went ahead and did it.

nchristensen and others added 21 commits January 9, 2023 13:28
… and round the result to the nearest integer
…ization bug fixes, another fix for float mu in NormalInteger space
* test: Add reproducing test

* fix: Make sampling neighbors form uniform Int stable

* fix: Memory leak with UniformIntegerHyperparameter

When querying a large range for a UniformIntegerHyperparameter with a
small std.deviation and log scale, this could cause an infinite loop as
the reachable neighbors would be quickly exhausted, yet rejection
sampling will continue sampling until some arbitrary termination
criterion. Why this was causing a memory leak, I'm not entirely sure.

The solution now is that is we have seen a sampled value before, we
simply take the one "next to it".

* fix: Memory issues with Normal and Beta dists

Replaced usages of arange with a chunked version to prevent memory
blowup. However this is still incredibly slow and needs a more refined
solution as a huge amount of values are required to be computed for what
can possibly be analytically derived.

* chore: Update flake8

* fix: flake8 version compatible with Python 3.7

* fix: Name generators properly

* fix: Test numbers

* doc: typo fixes

* perf: Generate all possible neighbors at once

* test: Add test for center_range and arange_chunked

* perf: Call transform on np vector from rvs

* perf: Use numpy `.astype(int)` instead of `int`

* doc: Document how to get flamegraphs for optimizing

* fix: Allow for negatives in arange_chunked again

* fix: Change build back to raw Extensions

* build: Properly set compiler_directives

* ci: Update makefile with helpful commands

* ci: Fix docs to install build

* perf: cython optimizations

* perf: Fix possible memory leak with UniformIntegerHyperparam

* fix: Duplicates as `list` instead of set

* fix: Convert to `long long` vector

* perf: Revert clip to truncnorm

This truncnorm has some slight overhead due to however
scipy generates its truncnorm distribution, however this
overhead is considered worth it for the sake of readability
and understanding

* test: Test values not match implementation

* Intermediate commit

* INtermediate commit 2

* Update neighborhood generation for UniformIntegerHyperparameter

* Update tests

* Make the benchmark sampling script more robust

* Revert small change in util function

* Improve readability

Co-authored-by: Matthias Feurer <[email protected]>
eddiebergman and others added 6 commits January 31, 2023 12:45
The builds wheels for Python 3.11, as well as disable wheel builds for win32 and i686 architectures due to scipy not distributing wheels for these in their latest versions. 

* feat: python 3.11 wheels

* ci: trigger workflow

* ci: update cibuildwheel for python 3.11

* ci: update other cibuildwheels

* ci: disable >=3.8 win32 wheels

* ci: Remove debug trigger
)

* Enable replacing InCondition and ForbiddenRelation constraints

* Allow nan values in CategoricalHP _pdf function
@nchristensen nchristensen marked this pull request as draft September 12, 2023 02:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants