Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: Remove Cython - Reduce tech debt #346

Open
wants to merge 98 commits into
base: main
Choose a base branch
from

Conversation

eddiebergman
Copy link
Contributor

@eddiebergman eddiebergman commented Dec 18, 2023

This PR attempts to first convert everything to Python, so a type checker can be run across everything as well as use linters properly. This also includes updating everything to pytest.

Note: This PR is unreviewable. The strategy has been to minimally change tests while refactoring to ensure tested behaviour still passes. Some notable exceptions are where functionality was removed or sampling strategy was changed and the expected values differ. Unfortunatly the formatter has rendered them also unreviewable.

Update: All SMAC unittests passed without code changes so it seems mostly backwards compatible.

Some notable changes:

  • No more unbounded Normals
  • No more quantized hyperparameters
  • Sampling is 3x faster, Neighbors 3.5x faster and validation 2x faster (see scripts/benchmark_sampling.py)
  • More code re-use between classes
  • Pure Python, jumping to definition and docstrings should work in editors
  • Defined data-structures to capture heirarchy in conditional spaces, moving the logic out of ConfigurationSpace.
  • Better defined public API for ConfigurationSpace.
  • Better support for custom hyperparameters, i.e. backed by Scipy Distributions (see BetaFloat/Int). Should require a lot less code to do so.
  • Better defined public API for hyperparameters
  • A whole load of deprecations with recommended alternatives
  • Categoricals/Ordinals support arbitrary objects, e.g. no more need to specify Category([True, False, "None"]) (Now handled in follow up feat(Hyperparameters): Allow arbitrary objects in category ordinal #359)

TODO:


@eddiebergman eddiebergman changed the title Refactor: Update types for Cython Refactor: Update types for Cython [In Progress] Dec 18, 2023
Copy link

codecov bot commented Dec 19, 2023

Codecov Report

Attention: Patch coverage is 92.06843% with 51 lines in your changes missing coverage. Please review.

Project coverage is 78.91%. Comparing base (93abc5b) to head (2a29120).
Report is 2 commits behind head on main.

Current head 2a29120 differs from pull request most recent head 09beefd

Please upload reports for the commit 09beefd to get more accurate results.

Files Patch % Lines
ConfigSpace/forbidden.py 82.35% 17 Missing and 1 partial ⚠️
ConfigSpace/conditions.py 93.04% 7 Missing and 1 partial ⚠️
ConfigSpace/configuration_space.py 64.28% 3 Missing and 2 partials ⚠️
ConfigSpace/hyperparameters/normal_float.py 88.09% 4 Missing and 1 partial ⚠️
ConfigSpace/hyperparameters/normal_integer.py 85.71% 4 Missing and 1 partial ⚠️
ConfigSpace/hyperparameters/categorical.py 90.90% 3 Missing ⚠️
ConfigSpace/hyperparameters/ordinal.py 91.42% 3 Missing ⚠️
ConfigSpace/hyperparameters/uniform_integer.py 96.29% 2 Missing ⚠️
ConfigSpace/hyperparameters/numerical.py 97.05% 1 Missing ⚠️
ConfigSpace/hyperparameters/uniform_float.py 96.96% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #346      +/-   ##
==========================================
+ Coverage   73.54%   78.91%   +5.37%     
==========================================
  Files          29       45      +16     
  Lines        2831     4767    +1936     
  Branches      629     1005     +376     
==========================================
+ Hits         2082     3762    +1680     
- Misses        632      806     +174     
- Partials      117      199      +82     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eddiebergman eddiebergman changed the title Refactor: Update types for Cython [In Progress] Refactor: Remove Cython - Reduce tech debt Mar 28, 2024
@sonovice
Copy link

For everyone else being stuck due to type errors etc:
You can pull in the current state of this PR to your poetry env with

poetry add git+https://github.com/automl/ConfigSpace.git@refs/pull/346/merge

Might help in your case.

@@ -0,0 +1,337 @@
from __future__ import annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the classes in this file require a docstring. Without a docstring it is completely unclear what they do, and why this abstraction is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for the abstraction of things in this file is that a lot of methods for the hyperparameters were essentially doing the same things, i.e. sampling from some vectorized distribution and then transforming to and from.

The original thought was that this would be one of the cythonized files, i.e. allow the hyperparameter files to be seen in editors and see their definitions and workings, however this also ended up with lifting all behavior to the base Hyperparameter class. This helps for a few reasons, namely:

  1. A single source for behaviour of hyperparameters, reducing duplication/inconsistencies.
  2. Implementing hyperparameters is mostly just passing the different components described above to the base Hyperparameter class at this point. You can see this from the individual hyperparameter classes that exist.
  3. Any optimizations done for the underlying shared components can apply to all hyperparmaters, for example neighbor generation or sampling.

It's not perfect of course but really it's trying to make hyperparameters rely more on compositional structure rather than hierarchical behaviors.

src/ConfigSpace/hyperparameters/_distributions.py Outdated Show resolved Hide resolved
src/ConfigSpace/hyperparameters/ordinal.py Show resolved Hide resolved
src/ConfigSpace/_condition_tree.py Show resolved Hide resolved
test/test_util.py Outdated Show resolved Hide resolved
test/test_util.py Outdated Show resolved Hide resolved
@eddiebergman
Copy link
Contributor Author

@mfeurer I converted all the documentation pages to use mkdocs-material format, i.e. markdown documentation.
The primary reason is I know how to build docs/version and deploy them. It's also a much less finicky syntax, i.e. I never get a sphinx link correct.

I've documented most of the major modules at this point but have yet to set up the actual doc generation process. What I think would be the most useful for you to review at this point would be the pages under docs/reference.

I mostly left the documentation the same but I've added a clearer introduction to hyperparameters

Due to a no-longer-updated take on PCS format, I've now focused serialization on json() and yaml() which made this page change: https://github.com/automl/ConfigSpace/blob/typing-cython-3/docs/reference/serialization.md?plain=1
The encoder/decoder part is to satisfy this issue #357. It also enables us to allow people to create their own import and export formats as required for them.

The rest of the docs are mostly the same.


I still have left to do API documentation pass over ConfigurationSpace and Configuration as well as the utils.py file.


After that I need to setup mkdocs and fix any potential typoed links and such. The look and layout will be mostly similar to amltk docs.

Sorry for migration away from sphinx, but I really just don't like writing in it and it's hard to explain the nuances to anyone new.

@sarah-segel
Copy link

FYI: I tested running DeepCave with ConfigSpace 0.8.0 and ran into the following problems when executing :

  1. I run into an error when calling
    from ConfigSpace.c_util import change_hp_value, check_forbidden
  2. It seems like the default values in my configspace.json cause a problem when calling
    from ConfigSpace.read_and_write import json as cs_json
    cs_json.read(self.configspace_fn.read_text())
    where I get:
    __init__() got an unexpected keyword argument 'default'
  3. When calling hp._transform_scalar(value) I get
    AttributeError: 'UniformFloatHyperparameter' object has no attribute '_transform_scalar'

@eddiebergman
Copy link
Contributor Author

eddiebergman commented Jun 27, 2024

@mfeurer, I did a major push on getting the documentation updated to mkdocs. You can view it locally by cloning this branch, pip install -e "[dev]" and running mkdocs serve. Likely another push should get this to the finish line.

@sarah-segel thanks for the report back, I will look into them with the next big push. Do you have any more information on the first one other than "ran into an error"? Second one is likely a bug on my behalf, will investigate. Last one is good to know about. It was a private method so I considered safe enough to remove. I will add it back in with the deprecation warning.

@eddiebergman
Copy link
Contributor Author

@sarah-segel should have been fixed now, can you try again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment