677 refactor expunge v2 #841

Merged
merged 7 commits into from
Mar 20, 2025

@jsjiang (Contributor) commented Mar 19, 2025

@sfisher Hi Scott,
This is the cron version of the expunge job, which deletes test identifiers that are two weeks old. The core record-deletion workflow, which uses the EZID queues, is unchanged. New features:

  • changed from daemon mode to a cron job
  • limited the select query to a specific time range
    • default date/time range: a one-day window two weeks before today
    • custom date/time range: provided through command-line parameters
  • the default batch size for each query is 1000; you can set a different batch size with the --batchsize parameter
  • a new function, get_id_range_by_time(), gets the identifiers' ID range from the created-time range. Each batch select query then uses an ID range.
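A minimal sketch of the ID-range batching described above, assuming a simplified identifier table with an integer primary key id and a createTime column holding Unix timestamps. The table layout, the SCHEMA string and the iter_batches() helper are hypothetical, and only the name get_id_range_by_time() comes from this PR (its body here is a guess). The sketch looks up the smallest and largest IDs created in the window once, then walks that span in batchsize-wide primary-key windows, so each batch is an indexed range scan instead of an OFFSET scan. The filter that limits deletion to test identifiers is omitted.

```python
import sqlite3

# Hypothetical schema for illustration; EZID's real table and column names may differ.
SCHEMA = (
    "CREATE TABLE identifier ("
    "id INTEGER PRIMARY KEY, identifier TEXT, createTime INTEGER)"
)


def get_id_range_by_time(conn, created_from, created_to):
    """Return (min_id, max_id) for rows with created_from <= createTime < created_to.

    Both values are None when no row falls in the window.
    """
    row = conn.execute(
        "SELECT MIN(id), MAX(id) FROM identifier "
        "WHERE createTime >= ? AND createTime < ?",
        (created_from, created_to),
    ).fetchone()
    return row[0], row[1]


def iter_batches(conn, created_from, created_to, batch_size=1000):
    """Yield non-empty lists of (id, identifier) rows, one per primary-key window."""
    min_id, max_id = get_id_range_by_time(conn, created_from, created_to)
    if min_id is None:
        return  # nothing was created in this window
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        # Range scan on the primary key; the time filter is repeated because an ID
        # inside [min_id, max_id] is not guaranteed to have been created in the window.
        rows = conn.execute(
            "SELECT id, identifier FROM identifier "
            "WHERE id BETWEEN ? AND ? AND createTime >= ? AND createTime < ? "
            "ORDER BY id",
            (start, end, created_from, created_to),
        ).fetchall()
        if rows:
            yield rows
        start = end + 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    base = 1_700_000_000  # arbitrary Unix timestamp
    # 2500 fake test identifiers, one created per minute.
    conn.executemany(
        "INSERT INTO identifier (id, identifier, createTime) VALUES (?, ?, ?)",
        [(i, f"ark:/99999/fk4test{i}", base + i * 60) for i in range(1, 2501)],
    )
    for batch in iter_batches(conn, base, base + 86400):
        print(len(batch), batch[0][0], batch[-1][0])
```

Keeping the createTime condition in every batch query is a deliberate choice in this sketch: if IDs are not strictly ordered by creation time, a row whose ID lies inside the span but whose timestamp lies outside the window is still excluded.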

Tests:

Here are the command parameters:
python manage.py proc-expunge_v2 --help

  --batchsize BATCHSIZE
                        Rows in each batch select.
  --created_range_from CREATED_RANGE_FROM
                        Created date range from - local date/time in ISO 8601 format without timezone YYYYMMDD,
                        YYYYMMDDTHHMMSS, YYYY-MM-DD, YYYY-MM-DDTHH:MM:SS. Examples: 20241001, 20241001T131001,
                        2024-10-01, 2024-10-01T13:10:01 or 2024-10-01
  --created_range_to CREATED_RANGE_TO
                        Created date range to - local date/time in ISO 8601 format without timezone YYYYMMDD,
                        YYYYMMDDTHHMMSS, YYYY-MM-DD, YYYY-MM-DDTHH:MM:SS. Examples: 20241001, 20241001T131001,
                        2024-10-01, 2024-10-01T13:10:01 or 2024-10-01
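The option handling in the help text above could be sketched as follows. It uses plain argparse so it runs outside Django; in the management command, the same add_argument() calls would go in the command's add_arguments() method. It assumes that the default window is the calendar day starting at local midnight 14 days before today and that a custom range needs both bounds. The helpers parse_local_datetime(), default_window(), build_parser() and resolve_window() are hypothetical names, not taken from the PR.

```python
import argparse
from datetime import datetime, timedelta

# The four formats listed in the --help text (local time, no timezone).
DATE_FORMATS = ("%Y%m%d", "%Y%m%dT%H%M%S", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def parse_local_datetime(value):
    """Parse a naive local date/time in any accepted format; a bare date means midnight."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise argparse.ArgumentTypeError(f"unrecognized date/time: {value!r}")


def default_window(now=None):
    """Assumed default: the one-day window starting at midnight 14 days before `now`."""
    if now is None:
        now = datetime.now()
    start = (now - timedelta(days=14)).replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)


def build_parser():
    parser = argparse.ArgumentParser(prog="proc-expunge_v2")
    parser.add_argument("--batchsize", type=int, default=1000,
                        help="Rows in each batch select.")
    parser.add_argument("--created_range_from", type=parse_local_datetime,
                        help="Created date range from (local, no timezone).")
    parser.add_argument("--created_range_to", type=parse_local_datetime,
                        help="Created date range to (local, no timezone).")
    return parser


def resolve_window(args, now=None):
    """Return (from, to): both options if given, else the default window."""
    start, end = args.created_range_from, args.created_range_to
    if start is None and end is None:
        return default_window(now)
    if start is None or end is None:
        # Assumption: a partial range is rejected rather than guessed.
        raise SystemExit("both --created_range_from and --created_range_to are required")
    if start >= end:
        raise SystemExit("--created_range_from must be earlier than --created_range_to")
    return start, end


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--created_range_from", "20241001", "--created_range_to", "2024-10-02"]
    )
    print(args.batchsize, *resolve_window(args))
```

Passing parse_local_datetime as the argparse type means a malformed date is rejected with a usage error before any query runs.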

Please review and let me know if you have questions.

Thank you

Jing

@jsjiang requested a review from sfisher March 19, 2025 18:00

@sfisher (Contributor) left a comment

This looks good: it's much more flexible, and it now runs at a consistent time.

I like that you're getting the pages by primary key, which makes fetching the records much faster. I also like the shorter time range.

This seems good to me.

@jsjiang merged commit 364dfb5 into develop Mar 20, 2025
2 checks passed