@redblackcoder
Contributor

@redblackcoder redblackcoder commented Oct 23, 2025

Uses all args to accurately estimate the work required in the restore indices job.

This fixes incorrect progress reporting in the restore indices job, which believed there were far more records to restore than the arguments actually select. The following sample run shows the issue: the job counts 761 rows, sends MAEs for 5 of them, and then reports the remaining 756 as failures.

2025-10-23 18:38:51,894 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Found 761 latest aspects in aspects table in 0.00 minutes.
2025-10-23 18:38:51,894 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Getting next batch of urns + aspects, starting with urn:li:tag:Legacy - tagKey
2025-10-23 18:38:51,897 [pool-24-thread-1] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Args are RestoreIndicesArgs(start=0, batchSize=50, limit=50, numThreads=1, batchDelayMs=250, gePitEpochMs=0, lePitEpochMs=0, createDefaultAspects=false, aspectName=null, aspectNames=[], urn=null, urnLike=null, urnBasedPagination=true, lastUrn=urn:li:tag:Legacy, lastAspect=tagKey)
2025-10-23 18:38:51,897 [pool-24-thread-1] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Reading rows 0 through 50 (0 == infinite) in batches of 50 from the aspects table started.
2025-10-23 18:38:52,036 [pool-24-thread-1] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Batch completed.
2025-10-23 18:38:52,294 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - metrics so far RestoreIndicesResult(ignored=0, rowsMigrated=5, timeSqlQueryMs=12, timeGetRowMs=0, timeUrnMs=0, timeEntityRegistryCheckMs=0, aspectCheckMs=0, createRecordMs=0, sendMessageMs=109, defaultAspectsCreated=0, lastUrn=, lastAspect=)
2025-10-23 18:38:52,295 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Successfully sent MAEs for 5/761 rows (0.66% of total). 0 rows ignored (0.00% of total)
2025-10-23 18:38:52,295 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - 0.01 mins taken. 1.01 est. mins to completion. Total mins est. = 1.01.
2025-10-23 18:38:52,295 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Rows processed this loop 5
2025-10-23 18:38:52,295 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Getting next batch of urns + aspects, starting with urn:li:tag:NeedsDocumentation - tagProperties
2025-10-23 18:38:52,295 [pool-24-thread-1] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Args are RestoreIndicesArgs(start=50, batchSize=50, limit=50, numThreads=1, batchDelayMs=250, gePitEpochMs=0, lePitEpochMs=0, createDefaultAspects=false, aspectName=null, aspectNames=[], urn=null, urnLike=null, urnBasedPagination=true, lastUrn=urn:li:tag:NeedsDocumentation, lastAspect=tagProperties)
2025-10-23 18:38:52,295 [pool-24-thread-1] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Reading rows 50 through 100 (0 == infinite) in batches of 50 from the aspects table started.
2025-10-23 18:38:52,298 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - End of data.
2025-10-23 18:38:52,298 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:15 - Failed to send MAEs for 756 rows (99.34% of total).
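
The mismatch in the run above (5 rows sent against an estimate of 761) can be illustrated with a minimal sketch. This is not the code changed in this PR: `RestoreCountSketch`, `Row`, `Args`, and the sample rows are hypothetical, and the sketch assumes that pagination resumes at `(lastUrn, lastAspect)` inclusive and that an empty `aspectNames` list means "all aspects". It only shows why a count that ignores the job arguments overstates the work.

```java
import java.util.List;

// Hypothetical in-memory model. None of these types come from the DataHub codebase.
final class RestoreCountSketch {

    // One row of the aspects table, keyed by (urn, aspect).
    record Row(String urn, String aspect) {}

    // Illustrative subset of the job arguments; field names mirror the logged args.
    record Args(String lastUrn, String lastAspect, List<String> aspectNames) {}

    // Estimate that ignores the arguments: every row counts.
    static long countAll(List<Row> rows) {
        return rows.size();
    }

    // Estimate that applies the same filters the job itself would use.
    static long countWithArgs(List<Row> rows, Args args) {
        return rows.stream()
                .filter(r -> args.aspectNames().isEmpty() || args.aspectNames().contains(r.aspect()))
                .filter(r -> atOrAfter(r, args))
                .count();
    }

    // Assumption: pagination resumes at (lastUrn, lastAspect) inclusive, in lexicographic order.
    static boolean atOrAfter(Row r, Args args) {
        if (args.lastUrn() == null) {
            return true;
        }
        int byUrn = r.urn().compareTo(args.lastUrn());
        if (byUrn != 0) {
            return byUrn > 0;
        }
        return args.lastAspect() == null || r.aspect().compareTo(args.lastAspect()) >= 0;
    }

    // Made-up sample data, sorted by (urn, aspect).
    static List<Row> sampleRows() {
        return List.of(
                new Row("urn:li:tag:Deprecated", "tagKey"),
                new Row("urn:li:tag:Legacy", "tagKey"),
                new Row("urn:li:tag:Legacy", "tagProperties"),
                new Row("urn:li:tag:NeedsDocumentation", "tagKey"));
    }

    public static void main(String[] unused) {
        List<Row> rows = sampleRows();
        Args args = new Args("urn:li:tag:Legacy", "tagKey", List.of());
        System.out.println("ignoring args: " + countAll(rows));
        System.out.println("using args:    " + countWithArgs(rows, args));
    }
}
```

When the estimate and the job share the same filters, the "x/y rows" progress line and the final failure count are computed against the rows the job can actually reach.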

@github-actions github-actions bot added labels Oct 23, 2025: product (PR or Issue related to the DataHub UI/UX), devops (PR or Issue related to DataHub backend & deployment), community-contribution (PR or Issue raised by member(s) of DataHub Community)
@datahub-cyborg datahub-cyborg bot added the needs-review label (for PRs that need review from a maintainer) Oct 23, 2025

codecov bot commented Oct 23, 2025

Bundle Report

Changes will decrease total bundle size by 1.67kB (-0.01%) ⬇️. This is within the configured threshold ✅

Detailed changes

Bundle name | Size | Change
datahub-react-web-esm | 28.6MB | -1.67kB (-0.01%) ⬇️

Assets changed for bundle datahub-react-web-esm:

Asset Name | Size Change | Total Size | Change (%)
assets/index-*.js | -1.67kB | 18.94MB | -0.01%

codecov bot commented Oct 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Collaborator

@david-leifker david-leifker left a comment

Very nice! Thank you!

@david-leifker david-leifker enabled auto-merge (squash) October 24, 2025 14:01
@datahub-cyborg datahub-cyborg bot added the merge-pending-ci label (a PR that has passed review and should be merged once CI is green) and removed the needs-review label (for PRs that need review from a maintainer) Oct 24, 2025
@anshbansal anshbansal changed the title from "fix(upgrade reindex): Ensure count uses all the restore job args to accurately estimate the…" to "fix(upgrade reindex): Fix count in restore job" Oct 25, 2025