feat: enhance parser domain-agnostic support #117
+418
−167
This PR introduces improvements and refactoring across multiple modules. The key changes include making the URL parser domain-agnostic, refactoring HTTP response handling, renaming functions for clarity, converting core functions to asynchronous operations, and standardizing terminology and documentation.
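The PR makes core functions async to support domain-agnostic parsing. One plausible reason is that guessing the host for a bare `user/repo` path requires probing each known domain over the network. Below is a minimal sketch of such a probing loop. It assumes the caller supplies an async existence check. `guess_git_host`, `KNOWN_GIT_HOSTS`, and the `repo_exists` parameter are hypothetical names, not the PR's API. The real `try_domains_for_user_and_repo` may differ in signature, host list, and error handling.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable

# Hypothetical host list; the PR's actual list of known domains may differ.
KNOWN_GIT_HOSTS: tuple[str, ...] = ("github.com", "gitlab.com", "bitbucket.org")

# An async predicate reporting whether a repository URL exists,
# e.g. backed by an HTTP request in production or a stub in tests.
RepoExists = Callable[[str], Awaitable[bool]]


async def guess_git_host(
    user: str,
    repo: str,
    repo_exists: RepoExists,
    domains: Iterable[str] = KNOWN_GIT_HOSTS,
) -> str:
    """Return the first domain in ``domains`` that hosts ``user/repo``.

    Domains are probed in order, so earlier entries win when several
    hosts have a repository at the same path.
    """
    for domain in domains:
        candidate = f"https://{domain}/{user}/{repo}"
        if await repo_exists(candidate):
            return domain
    raise ValueError(f"Could not find {user}/{repo} on any known Git host")
```

For example, suppose a stub `repo_exists` returns `True` only for URLs starting with `https://gitlab.com/`. Then `asyncio.run(guess_git_host("someuser", "somerepo", stub))` returns `"gitlab.com"` after first probing github.com. Injecting the check keeps the loop testable without network access.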
Highlights:
Domain-Agnostic Parsing:

- Updated `query_parser.py` to support multiple Git hosts by maintaining a list of known domains.
- Added `try_domains_for_user_and_repo` to iteratively guess the correct domain for a given user/repo.
- Added helper functions (`_get_user_and_repo_from_path`, `_validate_host`, `_validate_scheme`) to make parsing more robust.
- Updated `_parse_repo_source` to use the new domain-agnostic logic.
- Updated `test_query_parser.py` and added a new test file, `test_git_host_agnostic.py`, to verify these changes.

Enhanced Repository Existence Check:
- Added `_get_status_code` in `repository_clone.py` to extract HTTP response status codes cleanly.
- Refactored `_check_repo_exists` to use `_get_status_code` and refined its logic:
  - Returns `True` for status codes 200 and 301.
  - Returns `False` for status codes 302 and 404.
- Updated `test_repository_clone.py` to cover redirect scenarios and ensure correctness.

Function Renaming and Documentation:
- Renamed `_parse_url` to `_parse_repo_source` in `query_parser.py` for clarity.

Asynchronous Conversions:
- Converted core functions (`parse_query` in `query_processor.py`, `main` in `cli.py`, and `ingest` in `repository_ingest.py`) to async to support domain-agnostic parsing.
- Updated `test_query_parser.py` to support async execution.

Terminology and Documentation Standardization:
- Standardized terminology in `README.md`, fixed trailing slashes in links, and made punctuation consistent.
- Renamed templates (`github.jinja` → `git.jinja`, `github_form.jinja` → `git_form.jinja`) and variables (`github_url` → `repo_url`) accordingly.

Test Organization:
- Moved `test_query_parser.py` to `tests/query_parser/` for better organization.

Together, these changes let the parser work with multiple Git hosts and improve code clarity and consistency.