[FEATURE] Save (lots of) storage space by using hard links for duplicate photos and videos when running full backup #176

Open
d-EScape opened this issue Nov 25, 2023 · 4 comments
Labels: new feature (New feature or request)


@d-EScape

d-EScape commented Nov 25, 2023

I just discovered iCloud-drive-docker and my first impressions are great. Thank you for your work and for sharing this project.

Use case
As a user with lots of photos and videos organized in many albums on iCloud, I want to make a full (structured) backup so that I can always restore my files in case of an iCloud disaster.

What is the problem?
iCloud-drive-docker seems to do exactly what I want with the configuration "all_albums: true", BUT... as described in the config file, it stores a separate copy of the same file for every album that file appears in.
There will always be at least one duplicate because of the "All Photos" album. Videos are even worse: these large files are duplicated in the album folder(s) AND the Videos folder AND the All Photos folder. GoPro files even have their own GoPro album by default, which adds yet another duplicate.
The storage space used adds up quickly.

Describe the solution you'd like
First sync the "All Photos" album, and when syncing the other albums, create a hard link to the already existing file in "All Photos" instead of downloading and storing the entire file again.
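A minimal sketch of that order of operations, not the project's implementation: albums are simplified to a dict of filenames, files are matched by filename only (the same assumption the proof of concept further down this thread makes), and `sync_albums`, `REFERENCE_ALBUM` and the `fetch` callback are hypothetical names standing in for the real download code.

```python
import os
from typing import Callable, Dict, List

# Hypothetical name for the album that every other album links into.
REFERENCE_ALBUM = "All Photos"


def sync_albums(
    albums: Dict[str, List[str]],
    root: str,
    fetch: Callable[[str, str], None],
) -> int:
    """Sync each album into root/<album>/, downloading every filename at most once.

    `albums` maps an album name to the filenames it contains and `fetch(name, path)`
    is a stand-in for the real iCloud download. Returns the number of downloads.
    """
    downloads = 0
    # Reference album first, so later albums always find their files already on disk.
    order = [REFERENCE_ALBUM] + [a for a in albums if a != REFERENCE_ALBUM]
    for album in order:
        album_dir = os.path.join(root, album)
        os.makedirs(album_dir, exist_ok=True)
        for name in albums.get(album, []):
            dest = os.path.join(album_dir, name)
            if os.path.exists(dest):
                continue  # already synced on a previous run
            ref = os.path.join(root, REFERENCE_ALBUM, name)
            if album != REFERENCE_ALBUM and os.path.isfile(ref):
                os.link(ref, dest)  # new directory entry, no new data
            else:
                fetch(name, dest)
                downloads += 1
    return downloads


if __name__ == "__main__":
    import tempfile

    def fake_fetch(name: str, path: str) -> None:
        with open(path, "wb") as f:
            f.write(name.encode())

    albums = {
        "All Photos": ["IMG_0001.JPG", "GOPR0002.MP4"],
        "Videos": ["GOPR0002.MP4"],
        "GoPro": ["GOPR0002.MP4"],
    }
    with tempfile.TemporaryDirectory() as root:
        print(sync_albums(albums, root, fake_fetch))  # 2 downloads for 4 album entries
```

With this example library, four album entries cost only two downloads, and a second run performs none because every destination already exists.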

Considerations:
If "All Photos" is synced first, it should not be necessary to check every other album on the (target) filesystem for existing duplicates. Syncing "All Photos" first therefore becomes a requirement to keep this deduplication as simple as possible.
Hard links behave just like the original file, and the physical data remains intact until the last link to it is removed, so this should be a safe approach. Even if someone manually removes files from "All Photos" or other folders, the files will still be accessible through the remaining hard links.
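The hard-link behaviour described in these considerations can be checked directly with Python's `os.link`. This is a self-contained demo on a temporary directory, assuming a POSIX filesystem such as the ext4 or XFS volume behind a typical Docker bind mount; the function name and file names are illustrative only.

```python
import os
import tempfile


def hard_link_demo():
    """Return (link count, same file?, data after deleting original, link count after)."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "All Photos", "IMG_0001.JPG")
        album_entry = os.path.join(d, "Holiday", "IMG_0001.JPG")
        os.makedirs(os.path.dirname(original))
        os.makedirs(os.path.dirname(album_entry))
        with open(original, "wb") as f:
            f.write(b"jpeg bytes")

        # A hard link is a second directory entry for the same inode (same data blocks).
        os.link(original, album_entry)
        links_before = os.stat(original).st_nlink
        same = os.path.samefile(original, album_entry)

        # Removing one name only decrements the link count; the data stays on disk.
        os.remove(original)
        with open(album_entry, "rb") as f:
            data = f.read()
        links_after = os.stat(album_entry).st_nlink
    return links_before, same, data, links_after


if __name__ == "__main__":
    print(hard_link_demo())  # (2, True, b'jpeg bytes', 1)
```

One caveat: hard links cannot span filesystems (`os.link` fails with `EXDEV`), so "All Photos" and the other album folders must live on the same volume. And because all links share one inode, modifying the file through any album path changes it in every album.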

So why do I look at iCloud-drive-docker for a backup use case?
I can't do a client-side backup using the Apple software, because I have set all my Apple Photos clients to "Optimize Storage", so the original versions are not always available on every client. There are more photos in iCloud than would fit on the local SSD.

@d-EScape d-EScape changed the title Save (lots of space) by using hard links for duplicate photos when running full backup Save (lots of) storage space by using hard links for duplicate photos and videos when running full backup Nov 25, 2023
@mandarons mandarons changed the title Save (lots of) storage space by using hard links for duplicate photos and videos when running full backup [FEATURE] Save (lots of) storage space by using hard links for duplicate photos and videos when running full backup Nov 26, 2023
@mandarons mandarons added the new feature New feature or request label Nov 26, 2023
@mandarons
Owner

This is a good one. Thanks for submitting. 👍🏼

@d-EScape
Author

I created a little proof of concept by editing the download_photo function (see below).
Without this modification, my (test) iCloud Photos backup took 79 GB. It now takes 42 GB with exactly the same photo library!

import os
import shutil
import time

# LOGGER, photo_exists() and icloudpy's exceptions module come from the surrounding sync module.


def download_photo(photo, file_size, destination_path):
    """Download photo from server, or hard-link it to an existing copy in "All Photos"."""
    # Proof of concept: the "All Photos" folder is hardcoded.
    ALL_PHOTOS_PATH = "/app/icloud/photos/All Photos"
    if not (photo and file_size and destination_path):
        return False
    # Look for a file with the same name in "All Photos" (os.path.join avoids a double slash).
    existing_path = os.path.join(ALL_PHOTOS_PATH, os.path.basename(destination_path))
    LOGGER.info(f"Check if exists {existing_path}")
    if photo_exists(photo, file_size, existing_path):
        # Same name and size already on disk: add a hard link instead of downloading again.
        LOGGER.info(f"Existing photo. Try and link {destination_path} to {existing_path}")
        try:
            os.link(existing_path, destination_path)
        except OSError as e:
            LOGGER.error(f"Failed to link {destination_path} to {existing_path}: {str(e)}")
            return False
    else:
        LOGGER.info(f"Downloading {destination_path} ...")
        try:
            download = photo.download(file_size)
            with open(destination_path, "wb") as file_out:
                shutil.copyfileobj(download.raw, file_out)
            # Set the file's modification time to the date the photo was added to iCloud.
            local_modified_time = time.mktime(photo.added_date.timetuple())
            os.utime(destination_path, (local_modified_time, local_modified_time))
        except (exceptions.ICloudPyAPIResponseException, FileNotFoundError, Exception) as e:
            LOGGER.error(f"Failed to download {destination_path}: {str(e)}")
            return False
    return True
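A script that simply sums file sizes would still report the pre-deduplication total after this change, because every hard link reports the full size of the file. Measuring the real saving means counting each (device, inode) pair once, which is what `du` does by default. A small sketch of that check follows; `apparent_and_unique_bytes` is a hypothetical helper name, and it reports apparent sizes (`st_size`) rather than allocated blocks.

```python
import os
import stat
from typing import Set, Tuple


def apparent_and_unique_bytes(root: str) -> Tuple[int, int]:
    """Return (sum of all file sizes, total counting each hard-linked file once)."""
    apparent = 0
    unique = 0
    seen: Set[Tuple[int, int]] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            apparent += st.st_size
            # Hard links share (device, inode), so only the first one adds to the total.
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique


if __name__ == "__main__":
    total, real = apparent_and_unique_bytes("/app/icloud/photos")
    print(f"sum of file sizes: {total / 1e9:.1f} GB, unique data: {real / 1e9:.1f} GB")
```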

@mandarons
Owner

Please feel free to submit a PR, if possible.

@d-EScape
Author

It’s just a proof of concept and far from ready for a PR. The "All Photos" path is hardcoded, and I’m getting the photo filename by splitting the destination path. I haven’t really figured out how you generate and share these kinds of variables. I was hoping you could use this in a future release.

@mandarons mandarons added this to the 1.17.0 milestone Feb 19, 2024
@mandarons mandarons modified the milestones: 1.18.1, 1.19.0 May 14, 2024
Projects: Status: 📋 TODO
Development: No branches or pull requests
2 participants