last_changed and last_checked value is not always same despite the website being changed in the latest run #2877

Closed
sanjeevbhusal opened this issue Jan 2, 2025 · 8 comments · Fixed by #2883
Assignees
Labels
bug Something isn't working

Comments

@sanjeevbhusal

Describe the bug
The last_changed and last_checked values are not the same in the API response even though the contents have changed. The /api/v1/watch endpoint returns the list of watches. Each watch contains a few properties, including last_changed and last_checked.

In some cases, the values of last_changed and last_checked differ by a few seconds even when a change was detected. This does not happen for every watch, though. For example, in the JSON blob below, last_changed is 2 seconds earlier than last_checked.

	"3868983e-eb7f-48dc-aa1e-61b95a4638bd": {
		"last_changed": 1735749736,
		"last_checked": 1735749738,
		"last_error": false,
		"title": null,
		"url": "https://thehimalayantimes.com/",
		"viewed": false
	},
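As a rough illustration, the gap can be computed directly from a response shaped like the blob above. This is only a sketch: the helper name `find_mismatched_watches` is hypothetical and not part of the project, and the only assumption is that the response maps each watch UUID to the fields shown.

```python
import json

# Sample response body, copied from the blob above (one watch only).
RESPONSE_BODY = """
{
    "3868983e-eb7f-48dc-aa1e-61b95a4638bd": {
        "last_changed": 1735749736,
        "last_checked": 1735749738,
        "last_error": false,
        "title": null,
        "url": "https://thehimalayantimes.com/",
        "viewed": false
    }
}
"""


def find_mismatched_watches(watches):
    """Return {uuid: last_checked - last_changed} for watches whose timestamps differ.

    Hypothetical helper, for illustration only.
    """
    gaps = {}
    for uuid, watch in watches.items():
        gap = watch["last_checked"] - watch["last_changed"]
        if gap != 0:
            gaps[uuid] = gap
    return gaps


watches = json.loads(RESPONSE_BODY)
print(find_mismatched_watches(watches))
# {'3868983e-eb7f-48dc-aa1e-61b95a4638bd': 2}
```

Note that a watch that simply did not change in its latest run would also show a gap, so this only demonstrates the arithmetic, not a reliable change check.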

This happens because it can take a few seconds to process a watch that has changed. Specifically, saving the screenshot, saving the history text, and similar steps can take a few seconds to complete. The value of last_changed is recorded before any of that processing, whereas the last_checked timestamp is calculated when the function finishes running.
I have verified this by looking at the source code.
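The ordering described above can be modelled with a small, simplified sketch. It is not the project's code: `run_check`, the dict-based watch, and the fake clock `now` are all assumptions made so the example is deterministic.

```python
import itertools

# Fake clock that advances one second per call, standing in for time.time()
# so the example is deterministic. Purely illustrative.
_ticks = itertools.count(start=1735749736)


def now():
    return next(_ticks)


def run_check(watch):
    """Simplified, hypothetical model of one check that detected a change."""
    # Timestamp taken as soon as the change is detected.
    watch["last_changed"] = now()

    # Stand-in for saving the screenshot, history text, etc.
    # Here it simply consumes one tick of the fake clock.
    now()

    # Timestamp taken only when the function is about to return.
    watch["last_checked"] = now()
    return watch


watch = run_check({})
print(watch)
# {'last_changed': 1735749736, 'last_checked': 1735749738}
```

Because the two values come from separate clock reads with work in between, they drift apart by however long that work takes.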

Version
Exact version in the top right area: 0...
Any

How did you install?
Used as SaaS

To Reproduce
Register a watch for a website that is large in size. A larger page increases the processing time, which is what causes this bug.

Expected behavior
If the last_checked value is x and the website has changed, last_changed should also be x.

Screenshots
image
The timestamp calculated in line 517 should be used as 'last_checked' in line 574

@sanjeevbhusal
Author

Meanwhile, I would appreciate it if you could guide me on how I can check via the API whether the latest run of a watch detected any changes. The fewer API calls I need to make, the better.

@dgtlmoon
Owner

dgtlmoon commented Jan 2, 2025

Meanwhile, I would appreciate it if you could guide me on how I can check via the API whether the latest run of a watch detected any changes. The fewer API calls I need to make, the better.

Please do not mix several topics into this issue. Try to stick to just one thing at a time, or it makes a mess for everyone.

@dgtlmoon
Owner

dgtlmoon commented Jan 2, 2025

		"last_changed": 1735749736,
		"last_checked": 1735749738,

The system sets last_checked after it has downloaded all the content and processed the "change detection" (this can take 1-2 seconds after the browser is finished). The actual change was really (and correctly) detected at 1735749736.

It is a bit confusing, I agree. So, hmm, last_changed should be set to last_checked.

@sanjeevbhusal
Author

sanjeevbhusal commented Jan 3, 2025

@dgtlmoon I assume you are occupied with other tasks. Do you mind if I submit a PR for this?

@dgtlmoon
Owner

dgtlmoon commented Jan 3, 2025

Sure, you can try. The PR must include a test (or modify the existing API test). Thanks.

@dgtlmoon
Owner

dgtlmoon commented Jan 6, 2025

The bug is also here:

	if update_handler.fetcher.content:
		watch.save_last_fetched_html(contents=update_handler.fetcher.content, timestamp=timestamp)

This does not correctly handle "zero content" responses.

So what happened was that you received a zero-content reply for some reason. In the API, when you look at the last_changed value, it is actually checking the saved snapshots history.txt and finding the last date it was written.

So last_checked got updated, but last_changed did not, because it was a zero-content reply.
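A simplified model of that behaviour, for illustration only: the dict-based watch and the functions `process_result` and `last_changed` are hypothetical and are not the project's actual code. The sketch assumes history is written only when the fetched content is non-empty, while last_checked is always updated.

```python
def process_result(watch, content, timestamp):
    """Hypothetical model: history is only written when content is non-empty."""
    if content:
        # Same kind of guard as the `if update_handler.fetcher.content:` check.
        watch["history"].append(timestamp)
    # last_checked is updated regardless of whether anything was saved.
    watch["last_checked"] = timestamp


def last_changed(watch):
    """Derive last_changed from the most recent history entry (0 if none)."""
    return watch["history"][-1] if watch["history"] else 0


watch = {"history": [], "last_checked": 0}
process_result(watch, "<html>page</html>", 1735749736)
process_result(watch, "", 1735749738)  # zero-content reply

print(last_changed(watch), watch["last_checked"])
# 1735749736 1735749738
```

In this model the second, empty reply moves last_checked forward while last_changed stays at the last time history was written.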

@sanjeevbhusal
Author

I am not exactly sure what you mean. Do you mean the bug has a different cause, or that there are other causes in addition to the one I mentioned?

I am aware that last_changed is fetched from the last written date in history.txt. The issue I mentioned is that this last written date differs from the last_checked date even when a change was detected in the last run.

I am not sure what "zero content" responses are.

@dgtlmoon dgtlmoon added bug Something isn't working and removed triage labels Jan 6, 2025
@dgtlmoon
Owner

dgtlmoon commented Jan 6, 2025

@sanjeevbhusal could you try the latest :dev container? Please test with that; it should be resolved.


2 participants