np.array on Configuration returns names of HPs instead of values #363

Open
benjamc opened this issue Jun 3, 2024 · 2 comments

@benjamc

benjamc commented Jun 3, 2024

When calling np.array on a Configuration, it returns the names of the HPs instead of their values. Is that intended?
ConfigSpace 0.7.1

MWE:

import numpy as np
from ConfigSpace import ConfigurationSpace, Float

cs = ConfigurationSpace()
for i in range(3):
    cs.add_hyperparameter(Float(f"x_{i}", (0, 1)))

config = cs.sample_configuration()
print("output:", np.array(config))
print("expected:", config.get_array())

Output:

output: ['x_0' 'x_1' 'x_2']
expected: [0.09972031 0.91072326 0.16544557]

@eddiebergman
Contributor

eddiebergman commented Jun 3, 2024

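Short answer: yes, it is intended to give you the keys. Use np.array(list(config.values())) if you want the unnormalized values in a NumPy array (wrap the values view in list() first; NumPy does not treat a bare values view as a sequence).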

Sorry for typos, phone typing...


np.array relies on the ad-hoc __array__ protocol, which things like pandas DataFrames, torch tensors and others implement so that NumPy can convert them to arrays efficiently.
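
To illustrate the protocol, here is a minimal sketch with two hypothetical classes (Readings and PlainReadings are made up for this example). Only the one that defines __array__ is converted into a numeric array; the other is neither an array-like nor a sequence, so NumPy stores the whole object as a single element of a 0-d object array.

```python
import numpy as np


class Readings:
    """Hypothetical container that is not a sequence but implements __array__."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook and uses the returned ndarray directly.
        # copy is accepted for NumPy 2 compatibility; a new array is always built.
        return np.array(self._values, dtype=dtype)


class PlainReadings:
    """Same data, but no __array__, no __len__ and no __getitem__."""

    def __init__(self, values):
        self._values = list(values)


with_protocol = np.array(Readings([20.5, 21.0]))
without_protocol = np.array(PlainReadings([20.5, 21.0]))

print(with_protocol.tolist())  # [20.5, 21.0]
print(without_protocol.shape)  # () -> the object is wrapped, not converted
```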

However, things like a Python list don't implement this protocol. I'm sure calling np.array([1, 2, 3]) does something smart to pull out the values 1, 2, 3, but users can also implement their own list-likes (Sequence/MutableSequence), for which the only thing you can do is iterate over them. A simplified sketch of the dispatch (not NumPy's actual code) might look something like this:

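from collections.abc import Iterable, Sequence
from typing import Any

import numpy as np

def array(x: Any) -> np.ndarray:
    if hasattr(x, "__array__"):
        # Follow the __array__ protocol: the object hands over its own ndarray
        return np.asarray(x.__array__())
    elif isinstance(x, (list, tuple)):
        # Built-in containers: NumPy handles these with low-level CPython calls
        return np.asarray(x)
    elif isinstance(x, Sequence):
        # User-implemented list-like: len() and indexing are all we can rely on
        x_data = [x[i] for i in range(len(x))]
        return array(x_data)
    elif isinstance(x, Iterable):
        # Anything else iterable, e.g. a Mapping: iterate it (yields the keys)
        x_data = [e for e in x]
        return array(x_data)
    else:
        # Scalars and other objects become a 0-d array
        return np.asarray(x)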

Now the main point: Configuration is a Mapping (dict-like), so in this setup it would match the Iterable branch. np.array can't do anything smart with a Mapping, so it defaults to iterating over it with __iter__. The behaviour therefore matches that of calling list() on a dict, which iterates through the keys.
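
A minimal sketch of this behaviour, using a hypothetical ToyConfig Mapping as a stand-in for Configuration (it is not ConfigSpace's class). np.array collects the keys exactly like list() does. The last line also shows why the values need wrapping in list(): a values view has no indexing, so NumPy wraps it as one object instead of converting it.

```python
from collections.abc import Mapping

import numpy as np


class ToyConfig(Mapping):
    """Hypothetical stand-in for Configuration: a read-only dict-like object."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        # Iterating a Mapping yields its keys, just like a dict
        return iter(self._values)

    def __len__(self):
        return len(self._values)


cfg = ToyConfig({"x_0": 0.1, "x_1": 0.9})

keys_array = np.array(cfg)                   # array of the keys
as_list = list(cfg)                          # ['x_0', 'x_1']
values_array = np.array(list(cfg.values()))  # array of the values
values_view = np.array(cfg.values())         # 0-d object array wrapping the view

print(keys_array.tolist(), as_list)
print(values_array.tolist(), values_view.shape)
```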

I would argue the main use case of a Configuration is that it behaves more like a dict than a vector, so making it act like a Sequence doesn't make sense. Further, the unnormalized values can be strings, floats, ints and soon arbitrary values, so putting them into a NumPy array doesn't make much sense either. One could argue for putting the normalized values in there instead, but that is really far from the common use case of a Configuration.
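
As a rough illustration of the heterogeneous-values point (the values below are invented, not taken from a real configuration space): NumPy must pick a single dtype for the whole array, so mixing a float, a string and an int either turns everything into strings or falls back to dtype=object, which gives up on vectorized numeric operations.

```python
import numpy as np

# Hypothetical unnormalized values of a mixed configuration:
# a learning rate (float), an optimizer name (str) and a layer count (int)
values = [0.01, "adam", 3]

as_strings = np.array(values)                # every element becomes a string
as_objects = np.array(values, dtype=object)  # keeps the original Python objects

print(as_strings.dtype, as_strings.tolist())
print(as_objects.dtype, as_objects.tolist())
```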


I had some time and wrote this on my phone, but could you check something for me?

pd.Series acts (kind of) like a dict, i.e. it holds heterogeneous key-value pairs, but it also implements the __array__ protocol. What happens when you do np.array(pd.Series({"a":1, "b":2}))?

If it gives you an array of [1, 2], I could be persuaded to look into the array protocol so that what you posted works. Otherwise, if it gives ["a", "b"] or an error, I would keep the behaviour as it would normally be for a Mapping, even though a vectorized format is available.
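
For comparison, this is roughly what opting into the array protocol on a Mapping could look like. ValuedConfig is a hypothetical class, not ConfigSpace's API: iteration and list() still yield the keys, while np.array picks up __array__ and returns the values, the same split the pd.Series check above probes.

```python
from collections.abc import Mapping

import numpy as np


class ValuedConfig(Mapping):
    """Hypothetical dict-like config that also implements __array__."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)  # iteration still yields the keys

    def __len__(self):
        return len(self._values)

    def __array__(self, dtype=None, copy=None):
        # NumPy looks for __array__ before treating the object as a sequence,
        # so np.array(config) now returns the values instead of the keys.
        # copy is accepted for NumPy 2 compatibility; a new array is always built.
        return np.array(list(self._values.values()), dtype=dtype)


cfg = ValuedConfig({"x_0": 0.1, "x_1": 0.9})
print(list(cfg))               # ['x_0', 'x_1']
print(np.array(cfg).tolist())  # [0.1, 0.9]
```

The trade-off is that np.array(cfg) and list(cfg) now disagree, which is exactly the inconsistency weighed in this thread.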

@benjamc
Author

benjamc commented Jun 3, 2024

Thank you for the explanation, makes sense! Feel free to close the issue.

Running np.array(pd.Series({"a":1, "b":2})) yields array([1, 2]).
