
[enhancement] zero_cache should write a full block when writing into all-zero block, thus avoiding read-modify-write cycle #201

Open
xmrk-btc opened this issue Jan 15, 2023 · 3 comments


@xmrk-btc

When zero_cache performs a write_block_part into an all-zero block, it passes the request down to the block cache, which may not have the full block and therefore has to read it from remote storage. That read is unnecessary: zero_cache already knows the block is all zeroes, so it can reconstruct the whole block itself and avoid both the read and its latency.
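
As a rough sketch of what I mean (stand-in names and types here, not s3backer's actual internals), zero_cache could handle the partial write itself whenever it knows the block is all zeroes:

    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-in types: the real s3backer layered store API differs */
    struct store {
        int (*write_block)(struct store *s, uint64_t num, const void *buf);
        int (*write_block_part)(struct store *s, uint64_t num,
            unsigned off, unsigned len, const void *src);
    };

    struct zero_cache_private {
        size_t block_size;
        struct store *inner;                        /* next layer down */
    };

    /* Hypothetical helper: true if the zero cache knows this block is all zeroes */
    extern int block_is_known_zero(struct zero_cache_private *priv, uint64_t num);

    static int
    zero_cache_write_block_part(struct zero_cache_private *priv,
        uint64_t block_num, unsigned off, unsigned len, const void *src)
    {
        if (block_is_known_zero(priv, block_num)) {
            void *buf;
            int r;

            /* Zeroed buffer + the written part = the complete block */
            if ((buf = calloc(1, priv->block_size)) == NULL)
                return errno;
            memcpy((char *)buf + off, src, len);

            /* One full-block write to the next layer; no read required */
            r = (*priv->inner->write_block)(priv->inner, block_num, buf);
            free(buf);
            return r;
        }

        /* Not known-zero: fall through to the normal partial-write path */
        return (*priv->inner->write_block_part)(priv->inner, block_num, off, len, src);
    }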

A little test: zero the first 200 blocks (s3backer's block size is 64k):
$ dd if=/dev/zero of=mount/file bs=64k count=200

Write into the middle of the 2nd block and watch the log:

$ dd if=/dev/urandom of=mount/file bs=1k count=1 seek=100
$ journalctl -f -t s3backer
Jan 15 16:13:12 ubuntu s3backer[3679793]: GET https://objects-us-east-1.dream.io/test3/00000001
Jan 15 16:13:12 ubuntu s3backer[3679793]: rec'd 404 response: GET https://objects-us-east-1.dream.io/test3/00000001
Jan 15 16:13:12 ubuntu s3backer[3679793]: PUT https://objects-us-east-1.dream.io/test3/00000001
Jan 15 16:13:13 ubuntu s3backer[3679793]: success: PUT https://objects-us-east-1.dream.io/test3/00000001

And I want to get rid of that GET request. (Sometimes the GET does not appear, probably because I ran various tests on the same bucket, so there may be stale data that got DELETEd and is still cached in block_cache.)

The real-world motivation here is latency: when resilvering ZFS after implementing sub-block hole punching, the resilver was still slow (around 500 kB/s) without much disk activity, and I saw the http_zero_blocks_read stat incrementing by 1 or 2 per second. So I suspect the latency of the read-modify-write cycle was killing performance.

@archiecobbs
Owner

Good idea! Should be fixed in e0d21a9.

Regarding locking, this is not a problem. The race condition would be between two filesystem threads accessing the same s3backer block at the same time, and that is now absorbed at the top layer; see d949846.
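
For illustration, here's a rough sketch of that sort of top-layer, per-block serialization (hypothetical code, not what d949846 actually does): a thread "checks out" a block number before operating on it, and any other thread touching the same block waits until it's checked back in.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_INFLIGHT 128            /* arbitrary cap for this sketch */

    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
    static uint64_t inflight[MAX_INFLIGHT];
    static int num_inflight;

    static bool
    block_is_inflight(uint64_t block_num)
    {
        for (int i = 0; i < num_inflight; i++) {
            if (inflight[i] == block_num)
                return true;
        }
        return false;
    }

    /* Acquire exclusive access to one block; waits if another thread has it */
    void
    block_checkout(uint64_t block_num)
    {
        pthread_mutex_lock(&mutex);
        while (block_is_inflight(block_num) || num_inflight == MAX_INFLIGHT)
            pthread_cond_wait(&cond, &mutex);
        inflight[num_inflight++] = block_num;
        pthread_mutex_unlock(&mutex);
    }

    /* Release the block and wake any waiters */
    void
    block_checkin(uint64_t block_num)
    {
        pthread_mutex_lock(&mutex);
        for (int i = 0; i < num_inflight; i++) {
            if (inflight[i] == block_num) {
                inflight[i] = inflight[--num_inflight];
                break;
            }
        }
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&mutex);
    }

With something like this at the top layer, the zero cache's check-then-write sequence on a block can't interleave with another thread's write to the same block.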

@archiecobbs
Owner

archiecobbs commented Jan 16, 2023

On second thought, I don't think this idea is actually safe.

Instead, the safer (and simpler) fix is just to eliminate the function zero_cache_write_block_part(). Then the function block_part_write_block_part() will perform a read-modify-write, but the read will be intercepted by the zero cache and therefore won't generate any network traffic.
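
To make that concrete, here's a rough sketch (stand-in names and types, not the actual s3backer code) of the resulting write path; the key point is that the read half of the read-modify-write is satisfied by the zero cache in memory:

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 65536            /* 64k, as in the test above */

    /* Stand-in store type; the real s3backer API differs */
    struct store {
        int (*read_block)(struct store *s, uint64_t num, void *buf);
        int (*write_block)(struct store *s, uint64_t num, const void *buf);
    };

    static int
    block_part_write_block_part(struct store *s3b, uint64_t block_num,
        unsigned off, unsigned len, const void *src)
    {
        char buf[BLOCK_SIZE];
        int r;

        /* Read: for a known-zero block this is intercepted by the zero
         * cache, which just zero-fills the buffer -- no GET is issued */
        if ((r = (*s3b->read_block)(s3b, block_num, buf)) != 0)
            return r;

        /* Modify the requested byte range */
        memcpy(buf + off, src, len);

        /* Write the complete block back down */
        return (*s3b->write_block)(s3b, block_num, buf);
    }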

Updated in bb65ec2.

@archiecobbs
Owner

Thinking about this even more, I'm still a little unsure. My second patch, while safe from race conditions, might also bring back the write amplification problem.

This needs more thought, e.g., consolidating the block cache and the zero cache into a single entity.
