Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(server): generate all thumbnails for an asset in one job #13012

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mertalev
Copy link
Contributor

Description

This PR changes image thumbnail generation to first decode and resize the input image to a preview-sized raw buffer, then reuse that buffer as the input for the preview, thumbnail and thumbhash. This has several advantages:

  • As reading from disk and decoding the image are the slowest steps of thumbnail generation, this makes generating thumbnails significantly faster
  • It is more cache-efficient due to handling everything to do with a file at once instead of spreading this out into jobs that run far apart from one another
  • It is easier to understand thumbnail generation progress when there is a 1:1 relationship between jobs and assets. It's currently a common source of confusion that the number of jobs keeps growing upward, and it's difficult to understand how much progress has been made
  • Other steps during thumbnail generation only need to run once, such as extracting embedded images or running FFprobe

A concern I had was around RAM usage. Remarkably, this approach uses less RAM than the current one despite the use of an in-memory buffer. Peak memory usage dropped from 2.47GiB to 1.22GiB during testing, and the average is also lower compared to preview generation.

Video thumbnails are not optimized in this PR to keep it smaller and more focused. A later PR will use a similar approach through FFmpeg. In the meantime, I refactored the code such that it will be easy to make this change. I intentionally made it generate video thumbnails sequentially instead of using Promise.all to avoid excessive resource usage.

Testing

I ran thumbnail generation on all assets and confirmed that jobs were processed successfully. Setting different image settings and enabling the "extract embedded preview" setting all functioned as expected. I additionally ran an image-only comparison on all assets:

thumbnail-generation-memory-usage

In order from left to right:

  • Thumbnail generation start on main
  • Thumbnail generation previews completed on main (starting thumbnails)
  • Thumbnail generation finished on main
  • Thumbnail generation started in PR
  • Thumbnail generation finished in PR

On main, generating the previews took 43 minutes and completing all jobs took 76 minutes , while this PR completed all thumbnail generation tasks in 44 minutes. For this particular library with default settings, this is a ~73% speedup for images. The gap will widen with more RAW or high megapixel images and when the "extract embedded preview" setting is enabled.

cleanup

add success logs, rename method

do thumbhash too

fixes

fix tests

handle `notify`

wip refactor

refactor
@mertalev mertalev force-pushed the perf/server-preview-thumbnail-together branch from a233104 to ad1b572 Compare September 28, 2024 07:28
Copy link
Contributor

@jrasm91 jrasm91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

if (options.crop) {
pipeline.extract(options.crop);
}
async decodeImage(input: string, options: DecodeToBufferOptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary async, right?

Comment on lines +486 to 497
'should generate a %s %s for an image when specified',
async ([format, type]) => {
systemMock.get.mockResolvedValue({ image: { [type]: { format: format as ImageFormat } } });
assetMock.getById.mockResolvedValue(assetStub.image);
const thumbhashBuffer = Buffer.from('a thumbhash', 'utf8');
mediaMock.generateThumbhash.mockResolvedValue(thumbhashBuffer);

it('should skip invisible assets', async () => {
assetMock.getByIds.mockResolvedValue([assetStub.livePhotoMotionAsset]);
const previewFormat = type === AssetFileType.PREVIEW ? format : ImageFormat.JPEG;
const thumbnailFormat = type === AssetFileType.THUMBNAIL ? format : ImageFormat.WEBP;
const previewPath = `upload/thumbs/user-id/as/se/asset-id-preview.${previewFormat}`;
const thumbnailPath = `upload/thumbs/user-id/as/se/asset-id-thumbnail.${thumbnailFormat}`;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has little value imo. Looking at it, it is not clear what it is doing. It takes reading and understanding the test code and checking the different branching going on here. Tests should not have branching and should be very simple. I would prefer 4 individual, trivial tests that are easy to read.

await this.assetRepository.upsertFile({ assetId: asset.id, type: AssetFileType.THUMBNAIL, path: thumbnailPath });
await this.assetRepository.update({ id: asset.id, updatedAt: new Date() });
await this.assetRepository.upsertJobStatus({ assetId: asset.id, thumbnailAt: new Date() });
await this.assetRepository.upsertJobStatus({ assetId: asset.id, previewAt: new Date(), thumbnailAt: new Date() });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future should we combine these into one column?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants