This repository has been archived by the owner on Jul 5, 2024. It is now read-only.

Unable to compare existing files downloaded via other methods. As result, onedriveClient is downloading/uploading everything again #22

Open
modelmat opened this issue Sep 30, 2018 · 14 comments
@modelmat
Collaborator

If I use a directory that had previously been synced, this program uploads each file again as file (desktop), which wastes my bandwidth.

Can the file hashes be compared first?

@derrix060
Owner

What do you mean when you say "a directory is used"? Do the files' timestamps change? Is there any file whose hash has changed?

@modelmat
Collaborator Author

modelmat commented Oct 1, 2018

I.e., if I have previously copied my OneDrive directory (using another sync tool, for example), then syncing will re-upload every single file with the (desktop) suffix added to the end. I suppose this would be fixed by #17, though.

What I mean is: don't re-upload a file with the (desktop) suffix if the file in the cloud has the same hash. If the local copy is newer, overwrite the cloud copy; if the cloud copy is newer, download it.
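A minimal sketch of that rule, for illustration only. `FileInfo`, `Action` and `decide` are hypothetical names, not part of onedriveClient, and the tie-break for equal timestamps is an arbitrary assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SKIP = "skip"          # same content, nothing to transfer
    UPLOAD = "upload"      # local copy is newer
    DOWNLOAD = "download"  # cloud copy is newer


@dataclass
class FileInfo:
    # Hypothetical container for what we know about one copy of a file.
    sha1: str     # hex digest of the content
    mtime: float  # modification time, seconds since the epoch


def decide(local: FileInfo, remote: FileInfo) -> Action:
    """Pick a sync action from the hash first, then the timestamps."""
    # Hex digests may differ in letter case, so compare case-insensitively.
    if local.sha1.lower() == remote.sha1.lower():
        return Action.SKIP
    if local.mtime > remote.mtime:
        return Action.UPLOAD
    # Assumption: when the timestamps are equal, prefer the cloud copy.
    return Action.DOWNLOAD
```

For example, `decide(FileInfo("ab", 9.0), FileInfo("cd", 5.0))` picks `Action.UPLOAD` because the hashes differ and the local copy is newer.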

@derrix060
Owner

The way it currently decides whether a file is different is to first look at the path (and the parent repository) plus the filename. If that matches the cloud, it then checks the timestamp and hash.

I'm changing this behaviour a little in #17...

Let me see if I understood what you are saying:

  • You set onedrive to sync with an empty directory
  • You downloaded the files from OneDrive using another tool or manually
  • You copied these files to the empty directory that you set to sync

Am I right?

@modelmat
Collaborator Author

modelmat commented Oct 1, 2018

Yes.

@derrix060
Owner

What I find weird in this case is that when you set it to sync, it should download everything again...

I remember having the same issue when I first started looking at this project. What I did was give up and let onedrive download everything...

Can you make sure that the files are in the same structure, and that the framework is uploading the file itself, not only the timestamp?

@derrix060 derrix060 changed the title from "Lots of files uploaded with filename ({hostname})" to "Unable to compare existing files downloaded via other methods. As result, onedriveClient is downloading/uploading everything again" Oct 1, 2018
@modelmat
Collaborator Author

modelmat commented Oct 1, 2018

I actually tried it again and it seemed not to be, but I have just decided to redownload everything from scratch (deleted with rm :P) so I can't test til it syncs again.

@derrix060
Owner

derrix060 commented Oct 2, 2018

While investigating #21, I found why the framework was uploading duplicates.

There are a couple of issues. I will try to explain the steps used to check whether the item is the same:

Check if the item exists locally:

  • id should match
  • c_tag or (size and timestamp) should match

Check if the item has changed:

  • size should match
  • timestamp or hash should match.
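The two checks above could be sketched roughly as follows. This is only an illustration: `Item`, `exists_locally` and `is_unchanged` are hypothetical names, and the real item records in the framework may carry different fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    # Hypothetical record; field names are assumptions for this sketch.
    id: str
    c_tag: str
    size: int
    mtime: int                  # modification time in whole seconds
    sha1: Optional[str] = None  # may be unknown until computed


def exists_locally(remote: Item, record: Optional[Item]) -> bool:
    """First check: id matches, and c_tag or (size and timestamp) match."""
    if record is None or record.id != remote.id:
        return False
    same_size_and_time = (
        record.size == remote.size and record.mtime == remote.mtime
    )
    return record.c_tag == remote.c_tag or same_size_and_time


def is_unchanged(local: Item, remote: Item) -> bool:
    """Second check: size matches, and timestamp or hash match."""
    if local.size != remote.size:
        return False
    if local.mtime == remote.mtime:
        return True
    # Fall back to the hash only when both sides actually have one.
    return local.sha1 is not None and local.sha1 == remote.sha1
```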

Issues:

One possible approach is to download the file, calculate the hash and see if it matches, or (as it does now) upload the file with a different name. I will think more about how to tell whether the file is the same, and figure out the best way.

@modelmat
Collaborator Author

modelmat commented Oct 2, 2018

I assume you meant #22 not 21.

Especially for this issue, where all the files would be downloaded or uploaded as dupes: as long as the timestamps are pretty close, the files can be assumed to be the same. If there is a substantial difference, maybe the file should be uploaded (though this should definitely be offered to the user as an option).

@derrix060
Owner

No I mean 21 haha. I was debugging that error and found this...

Usually, the download speed is higher than upload, so I will download the file (hope that the file is not large...) and compare the hash. If the hashes are different, I will keep both locally and on the cloud, letting the user decide which one is up-to-date.
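A rough sketch of the download-and-compare step. It assumes the downloaded copy has already been saved to a temporary path, and it uses SHA-1 purely as an example digest; which hash the service reports is not stated in this thread. `sha1_of` and `same_content` are hypothetical helper names.

```python
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(local_path, downloaded_path):
    """True when both files have the same digest, i.e. no duplicate is needed."""
    return sha1_of(local_path) == sha1_of(downloaded_path)
```

If `same_content` returns False, both copies would be kept so the user can decide which one is up-to-date.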

@derrix060 derrix060 added the bug label Oct 3, 2018
@derrix060 derrix060 self-assigned this Oct 3, 2018
@abraunegg

@derrix060
You will also run into this issue: OneDrive/onedrive-api-docs#935

The timestamp can be slightly different (by some seconds).

Baseline all timestamps (local and OneDrive) to drop fractional seconds - HH:MM:SS is what should be compared, otherwise timestamps will always be an issue.
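As an illustration of that baselining, here is a small sketch that drops fractional seconds before comparing. It assumes both timestamps are timezone-aware `datetime` objects; `to_whole_seconds` and `same_second` are hypothetical names.

```python
from datetime import datetime, timezone


def to_whole_seconds(ts: datetime) -> datetime:
    """Normalize to UTC and drop the fractional part of the second."""
    # Assumption: ts is timezone-aware. A naive datetime would be
    # interpreted as local time by astimezone().
    return ts.astimezone(timezone.utc).replace(microsecond=0)


def same_second(a: datetime, b: datetime) -> bool:
    """Compare two timestamps at HH:MM:SS resolution."""
    return to_whole_seconds(a) == to_whole_seconds(b)
```

Note that truncation only removes sub-second noise. Two timestamps that fall on either side of a second boundary still compare as different.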

@derrix060
Owner

@abraunegg thanks for the information! I'm planning a very bad workaround: download the file, calculate the checksum manually, and see if it matches...

BTW, you have a nice project, congrats!!

@modelmat
Collaborator Author

modelmat commented Dec 5, 2018

Maybe it should only download if the timestamps are within 5 minutes or so of each other? This allows for timestamps to be slightly off on OneDrive's end, and even on the client's end due to clock drift.

@derrix060
Owner

Why 5 minutes? It is possible to change a file on the remote and, within 5 minutes, change it locally as well... It would cover some cases, but not all...

@modelmat
Collaborator Author

modelmat commented Dec 5, 2018

I was thinking that 5 minutes would be a reasonable time. Even 10 seconds or so would probably be enough. What I am trying to say is that there is no point downloading if the timestamps are, say, 2 years apart.
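Something like the following sketch is what I have in mind. The 10-second default is just the figure floated above, not a tested value, and `worth_hash_check` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def worth_hash_check(
    local_mtime: datetime,
    remote_mtime: datetime,
    tolerance: timedelta = timedelta(seconds=10),
) -> bool:
    """True when the timestamps are close enough that the files may be the same copy.

    Only in that case is it worth downloading the file to compare hashes.
    Both arguments are assumed to be timezone-aware.
    """
    return abs(local_mtime - remote_mtime) <= tolerance
```

As noted earlier in the thread, a window like this does not cover a file edited on both sides within the tolerance, so it only decides whether to do the hash comparison, not whether the files are equal.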


3 participants