Skip to content

Conversation

MarceloRobert
Copy link
Collaborator

@MarceloRobert MarceloRobert commented Oct 15, 2025

Changes

Refactors ingester code:

  • Moves constants to a specific file (also makes constants able to be changed from env vars)
  • Moves some functions to more specific files
  • Adds more typing indicators
  • Lowers the complexity of the most complex functions

How to test

The behavior should be the same as before. So:

  • Start the ingester
    • Create a temporary folder for the submissions as backend/kernelCI_app/management/commands/tmp_submissions (you can change the path, just update the command below)
    • Make sure you have your trees-name.yml file somewhere. It usually is on /backend/volume_data as a result of the treeproof command. If you are feeling lazy and you don't want to run the command, there's an example file here (Google Drive link)
    • Make sure you have a couple of example submission files for testing. You can grab some in this zip file here (Google Drive link)
    • Export the environment variables in your terminal, pointing one of the databases to the local one. Make sure the local one is set as the default one with USE_DASHBOARD_DB=True
    • Now you'll be able to run the command with poetry run python3 ./manage.py monitor_submissions --spool-dir kernelCI_app/management/commands/tmp_submissions --trees-file volume_data/trees-name.yaml (being in the /backend directory, update the paths if necessary)
  • Insert submissions in the folder and check that they are being inserted in the database
    You can also test the ingester on docker, make sure the environment variables are on .env.backend and use docker compose run --rm backend before running the command shown above.

Closes #1571

@MarceloRobert MarceloRobert self-assigned this Oct 15, 2025


def build_instances_from_submission(data: dict[str, Any]) -> dict[TableNames, list]:
def build_instances_from_submission(data: dict[str, Any]) -> ProcessedSubmission:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to avoid using Any? Maybe with generic

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this data is the submission json, so the Any means any value that an item can have: str, int, list, dict, none... IMO in this case it is ok to leave it as Any (and other similar cases around these functions too). This use will at least define that they dict keys are str, otherwise we would be using just dict which just means dict[Any, Any]

STORAGE_BASE_URL = os.environ.get(
"STORAGE_BASE_URL", "https://files-staging.kernelci.org"
)
CONVERT_LOG_EXCERPT = False # If True, convert log_excerpt to output_files url
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not an env as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything was defined statically in the file, I just added verbose as an env var. I'll make the new constants file and I can make them all env vars 👍

)
CONVERT_LOG_EXCERPT = False # If True, convert log_excerpt to output_files url

TREES_FILE = "/app/trees.yaml"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think a new file:

constants/ingester.py?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

Comment on lines 40 to 41
if VERBOSE:
logger.info("Uploading logexcerpt for %s to %s", id, upload_url)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk this VERBOSE variable could be more descriptive

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the description for the VERBOSE variable or the string in this specific logger.info?

str: On success upload: the reference url. On failed upload: the original logexcerpt
"""

upload_url = f"{STORAGE_BASE_URL}/upload"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outside of function

@MarceloRobert MarceloRobert changed the title WIP: Refactor: ingester changes Refactor: ingester changes Oct 16, 2025
@MarceloRobert MarceloRobert marked this pull request as ready for review October 16, 2025 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor dashboard ingester

2 participants