
harvest.log summary does not agree with OpenSearch counts #147

Open
plawton-umd opened this issue Jan 22, 2024 · 2 comments

plawton-umd commented Jan 22, 2024

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I compared the information in harvest.log with the OpenSearch (OS) query results, I noticed differences.

🕵️ Expected behavior

I expected the "count" after the load to equal the "count" before the load plus the harvest.log's number of "Loaded Files".
Instead, the harvest.log summary reports 150 fewer loaded files than the change in the OS "count" ( curl -u $REGUSER $OPENSEARCH_URL'/registry/_count?pretty=true' ) indicates.
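
The expectation above is simple arithmetic, so it can be checked mechanically. The following is only a sketch: the function names are hypothetical, the before/after counts and the "Loaded Files" value are made-up numbers chosen to reproduce a gap of 150, and it assumes the `_count` response is a JSON object with a top-level `count` field.

```python
import json


def parse_count(response_body: str) -> int:
    """Extract "count" from the JSON body returned by the _count query."""
    return int(json.loads(response_body)["count"])


def unexplained_documents(count_before: int, count_after: int, loaded_files: int) -> int:
    """Growth in the registry count that "Loaded Files" does not account for.

    0 means the log and OpenSearch agree; a positive value means OpenSearch
    gained more documents than harvest.log reported as loaded.
    """
    return (count_after - count_before) - loaded_files


# Made-up numbers (not taken from the actual run) that reproduce a gap of 150.
before = parse_count('{"count": 1000}')
after = parse_count('{"count": 1650}')
gap = unexplained_documents(before, after, loaded_files=500)
print(gap)  # 150
```

A positive result matches the situation described here: the registry grew by more than the log's "Loaded Files" figure.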

📜 To Reproduce

  1. Possibly: run a harvest in which some products are skipped?

🖥 Environment Info

  • Version of this software: 3.8.2
  • Operating System: Linux

🩺 Test Data / Additional context

See above

🦄 Related requirements

Tightly coupled with

⚙️ Engineering Details

N/A

plawton-umd added the bug and needs:triage labels Jan 22, 2024
plawton-umd changed the title from "harvest.log summary does not agree with OPenSearch counts" to "harvest.log summary does not agree with OpenSearch counts" Jan 22, 2024
jordanpadams removed their assignment Feb 5, 2024
jordanpadams added the B15.0 label and removed the icebox label Apr 10, 2024
alexdunnjpl (Contributor) commented

This is a shot in the dark, but I don't want to overlook it in case it is relevant. If the harvest is hitting any errors due to timeouts, those products can be listed as failures (because the client never received confirmation that the insertions succeeded) and yet still be ingested (because the server did receive and process the insertions, but was overloaded at the time and took too long to respond).
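
To make the timeout scenario concrete, here is a minimal in-memory simulation. It is not harvest's code: `FakeRegistry`, `load`, and the `verify_on_timeout` flag are all hypothetical names invented for illustration. It only shows how a client that counts every timeout as a failure can under-report what the server actually stored.

```python
class FakeRegistry:
    """In-memory stand-in for the registry index (hypothetical)."""

    def __init__(self, slow_ids):
        self.docs = {}
        self.slow_ids = set(slow_ids)

    def insert(self, doc_id, doc):
        # The server stores the document in every case...
        self.docs[doc_id] = doc
        # ...but for "slow" ids the client gives up before the reply arrives.
        if doc_id in self.slow_ids:
            raise TimeoutError(f"no acknowledgement for {doc_id}")

    def exists(self, doc_id):
        return doc_id in self.docs


def load(registry, products, verify_on_timeout):
    """Return (loaded, failed) as the client would log them."""
    loaded = failed = 0
    for doc_id, doc in products.items():
        try:
            registry.insert(doc_id, doc)
            loaded += 1
        except TimeoutError:
            # Optionally ask the server whether the document landed anyway.
            if verify_on_timeout and registry.exists(doc_id):
                loaded += 1
            else:
                failed += 1
    return loaded, failed


products = {"p1": {}, "p2": {}, "p3": {}}

naive_registry = FakeRegistry(slow_ids={"p2"})
naive_summary = load(naive_registry, products, verify_on_timeout=False)

checked_registry = FakeRegistry(slow_ids={"p2"})
checked_summary = load(checked_registry, products, verify_on_timeout=True)

print(naive_summary, len(naive_registry.docs))      # (2, 1) 3
print(checked_summary, len(checked_registry.docs))  # (3, 0) 3
```

In the naive run the client logs 2 loaded and 1 failed while the registry holds 3 documents, which is the same kind of mismatch as the one reported in this issue.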

@plawton-umd if you have any firm sense of whether this is plausible, let me know

plawton-umd (Author) commented

@alexdunnjpl No idea. Sometimes in the logs it looks like it

  • tried,
  • failed,
  • tried again later,
  • succeeded,
  • did not change the 'failed' to 'failed, but retry succeeded', reduce the number of failed products, or update the overall success count
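
The sequence above suggests summary counters that are incremented per attempt rather than per product. As a sketch of the alternative, assuming per-product outcomes can be recovered from the log in order (the `summarize` function and the event tuples are hypothetical, not harvest's log format), the summary can be derived from each product's final state:

```python
from collections import Counter


def summarize(events):
    """Count products by final outcome.

    events: iterable of (product_id, outcome) pairs in log order,
    where outcome is "failed" or "loaded".
    """
    final = {}
    for product_id, outcome in events:
        # A later success supersedes an earlier failure;
        # a recorded success is never downgraded.
        if final.get(product_id) != "loaded":
            final[product_id] = outcome
    return Counter(final.values())


# Product "b" fails once and then succeeds on retry; "c" never succeeds.
events = [
    ("a", "loaded"),
    ("b", "failed"),
    ("b", "loaded"),
    ("c", "failed"),
]
summary = summarize(events)
print(summary["loaded"], summary["failed"])  # 2 1
```

Counting attempts instead would report 2 failures here, even though only one product ultimately failed to load.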
