Skip to content

Simplify config validation #232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
4 tasks
jlamypoirier opened this issue Apr 15, 2025 · 1 comment
Open
4 tasks

Simplify config validation #232

jlamypoirier opened this issue Apr 15, 2025 · 1 comment
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jlamypoirier
Copy link
Collaborator

🎯 Goal (What & Why)

See discussion in #211. The config validation scheme currently makes little distinction between validation, mutation, and derivation, which can make things difficult to follow. We need solutions to improve this.

Some possible avenues:

  • Replace derived fields with cached properties whenever possible so they are separated from the rest.
  • Add methods like _set_implicit_defaults, _set_children_fields, _calculate_derived_fields, _validate_fields, etc. in side validate. This woudn't prevent bad usage but would at least be explicit about what is expected.
  • Document best practices.

🚀 Execution Plan

(This section may start as an incomplete draft but must be defined before implementation begins.)

Step 1: What is the smallest working version?

(Describe the simplest way to implement this feature with minimal effort.)

Step 2: What additional optimizations are possible (but optional)?

(List potential refinements that can be added in later PRs if needed.)

📌 Acceptance Criteria (Must-Haves for Completion)

  • The feature must be functional and tested.
  • The implementation must be documented in practical terms.
  • The PR must include a performance/impact summary.
  • No refactors unless directly necessary for feature completion.

🛠️ Project Management

  • Assign the project to the Fast-LLM project.
  • Set the Estimate field (in days) in the GitHub project.
  • Use the Size field to categorize the PR size (Small/Medium/Large).
  • Assign an owner when opening the issue.
@jlamypoirier jlamypoirier added enhancement New feature or request help wanted Extra attention is needed labels Apr 15, 2025
@jlamypoirier jlamypoirier reopened this Apr 24, 2025
@jlamypoirier
Copy link
Collaborator Author

jlamypoirier commented Apr 24, 2025

Following the discussion in #211 and a few other things (personal experience, working on config doc, etc), I believe the best course of action would be to separate validation in two:

  • resolve: Things with mutations: set up implicit defaults, standardize data types, etc.
  • validate (or rename to avoid confusion): Checks only, config is frozen.

This helps separate the concerns, but highlights a secondary issue (which is already there): When resolution of a field depends on another one, we need to ensure we're using the final value of that field. Failure to do so can result in unexpected crashes, cryptic error messages, or downstream bugs. Some possible solutions:

  • Leave the issue to the user, ex. access values after the super().resolve() call, and/or check again in validate. This works but leaves the developer responsible for it, so adds cognitive workload and risk of bugs.
  • Add an option to resolve and/or validate a single field (and freeze it afterwards), to ensure we have the final value. This is slightly better but still have the same downsides.
  • Same above, but use a custom __getattr__ to automatically resolve/validate a field on access (similar to what we do with __setattr__. This is somewhat magical but simpler for the developer. I tend to prefer this option.

@tscholak @bigximik do you have an opinion on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

1 participant