Speedup send raw #100

Closed

tobiasgrosser wants to merge 3 commits

Conversation

tobiasgrosser
Contributor

These three patches allow me to use btrbk to back up to an Amazon Cloud Drive with reasonable speed using an acd_cli FUSE mount. Before these changes the upload did not complete within 5 minutes (for a 300 MByte archive); now it finishes in 9 seconds.

We already perform compression before gpg, so compressing again in gpg
is just a waste of time. Interestingly, gpg does not seem to try to
recompress gzipped input streams, as for the default gzip compression
this patch does not change performance. However, it is necessary for
the upcoming lz4 compression to show its real benefit.
Here are some statistics for my local backup:

			Size 	Time
	uncompressed	347M	 4.4s
	gzip		105M	40.0s
	lz4		153M	 5.2s

Moving from gzip to lz4 increases the backup performance from 9 MB/s to
66 MB/s (measured on the uncompressed stream). As I am backing up over
Gbit Ethernet, this makes a big difference.
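The double-compression claim is easy to check locally. A minimal sketch, using gzip as a stand-in for gpg's internal zlib compression (the /tmp paths are illustrative):

```shell
# Compressing an already-compressed stream gains nothing and only burns
# CPU time -- which is why gpg's internal compression gets disabled.
head -c 1048576 /dev/urandom > /tmp/sample.bin   # incompressible test input
gzip -c /tmp/sample.bin > /tmp/sample.gz         # first compression pass
gzip -c /tmp/sample.gz  > /tmp/sample.gz.gz      # second pass: pure overhead
stat -c '%n: %s bytes' /tmp/sample.gz /tmp/sample.gz.gz
```

The second file is never smaller than the first: gzip can only add header and block overhead to input that is already incompressible.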
By default dd uses blocks of 512 bytes, which does not work well on
network-mounted FUSE file systems (e.g. an acd_cli mount). Before this change
copying 150 MB took more than 5 minutes (I aborted); after this change it
takes only 9 seconds.
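A minimal sketch of the block-size effect, using local files for illustration. On a FUSE network mount each dd output block becomes at least one write request, so 512-byte blocks mean thousands of round trips:

```shell
head -c 1048576 /dev/urandom > /tmp/stream.bin
dd if=/tmp/stream.bin of=/tmp/out_512.bin 2>/dev/null       # default bs=512: 2048 writes
dd if=/tmp/stream.bin of=/tmp/out_4m.bin bs=4M 2>/dev/null  # bs=4M: a single write
cmp /tmp/out_512.bin /tmp/out_4m.bin && echo "identical output"
```

The output bytes are identical either way; only the number of write() calls, and therefore the number of FUSE round trips, changes.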
@digint
Owner

digint commented Aug 7, 2016

Merged "disable compression in gpg" in 84a5c56.

As I want to merge the stream_compression branch in the next major release, the other patches become obsolete:

  1. the files are written using redirections > instead of dd
  2. more compression algorithms are supported (see the %compression hash, which includes lzop; untested, but it should give better performance than lz4)
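A hypothetical shell sketch of what such a dispatch table could look like. The names are illustrative only; btrbk's actual %compression hash is Perl and may differ:

```shell
# Map an algorithm name to the filter command used in the pipeline.
compress_cmd() {
  case "$1" in
    gzip) echo "gzip -c" ;;
    lzop) echo "lzop -c" ;;
    lz4)  echo "lz4 -c"  ;;
    *)    echo "cat"     ;;  # unknown or none: pass the stream through
  esac
}
# e.g.: btrfs send ... | $(compress_cmd lz4) | gpg ... > target.part
```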

@TobiG could you please give it a try and tell me if it also takes only 9 seconds on an acd_cli mount?

Note that the stream_compression branch should work, but I definitely have to find some time to perform more tests before merging it.

@digint
Owner

digint commented Aug 7, 2016

which supports lzop, which should give better performance than lz4

I guess I was wrong here, lz4 seems to have some advantages. No multi-threading though. I'll add this to the %compression list.

@digint digint mentioned this pull request Aug 17, 2016
@tobiasgrosser
Contributor Author

Just to give an update: I am currently traveling, so I don't have a fast enough internet connection to perform the tests. I should get to it sometime next week.

@digint : I just tried to access the stream_compression branch, but it seems to not be available any more? Did you remove it for some reason?

Also, did you plan to add the lz4 support to the stream_compression branch or should I use my own patch?

@digint
Owner

digint commented Aug 20, 2016

Oh sorry for that, I deleted the stream_compression branch after merging it to devel branch. Should have mentioned it here.

lz4 compression is already included in devel branch.

@tobiasgrosser
Contributor Author

OK, I just managed to test the stream_compression branch. Unfortunately I had to create a new test subvolume, so the new performance results are not comparable.

The new test backup takes 731 MB uncompressed and 333 MB lz4 compressed.

With the patches in this PR a backup takes 0:15 min (compressed), 0:27 (uncompressed)

The latest devel branch takes 1:03 min (compressed), 2:26 (uncompressed).

It seems piping directly to a file still adds a lot more load on acd_cli compared to using dd with a large (4 MB) block size.

@digint
Owner

digint commented Aug 22, 2016

The new test backup takes 731 MB uncompressed and 333 MB lz4 compressed.
With the patches in this PR a backup takes 0:15 min (compressed), 0:27 (uncompressed)

As expected, your bottleneck is the network connection.

The latest devel branch takes 1:03 min (compressed), 2:26 (uncompressed).
It seems piping directly to a file adds still a lot more load to acd_cli compared to using dd with a large (4MB) blocksize.

Seems like acd_cli (or underlying "fusepy") has real trouble handling "sane" block sizes (compared to what I would call 4MB "insane"). As far as I know, the default write buffer size is 64K, the pipe buffer size is 4K (ulimit -p), and "normal" file system block size is 4K (stat -f /mnt/path). Just out of curiosity, what file system block size do you get on acd_cli fuse mounts?
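The numbers mentioned here can be checked directly; a quick sketch (the mount point to inspect is up to you, `/` is used as a stand-in):

```shell
ulimit -p                        # pipe buffer in 512-byte units: 8 -> 4096 bytes
stat -f -c 'block size: %s' /    # file system block size of the mount
```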

I'm still convinced that using shell redirections is the correct way to create output files, simply because it's the "standard" way, which causes least trouble in "normal" circumstances. Please refer to issue #105 for further discussions about this (e.g. adding a feature raw_target_file_write_block_size which switches back to dd bs=xxx would be no problem if it really helps).

I'm closing this pull request now, please refer to #99 for further comments.

@digint digint closed this Aug 22, 2016
@tobiasgrosser
Contributor Author

On Mon, Aug 22, 2016, at 03:14 PM, Axel Burri wrote:

The new test backup takes 731 MB uncompressed and 333 MB lz4 compressed.
With the patches in this PR a backup takes 0:15 min (compressed), 0:27 (uncompressed)

As expected, your bottleneck is the network connection.

It is? I have a 900 Mbit connection and can reach this speed at some points.

333 MB * 8 Mbit/MByte = 2664 Mbit

2664 Mbit / 15 seconds ≈ 178 Mbit/s
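For the record, the division can be reproduced with a one-liner:

```shell
# 333 MB compressed, transferred in 15 seconds
awk 'BEGIN { printf "%.1f Mbit/s\n", 333 * 8 / 15 }'   # prints 177.6 Mbit/s
```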

This is certainly below the 900 Mbit I have, and Amazon is pretty good at
reaching high network speeds. Here is an earlier experiment I ran on the
same internet connection with a larger file:

time 'btrfs' 'send' '-v' '/mnt/btrfs/btrbk_snapshots/test.20160822T1550'
| 'gpg' '--batch' '--no-tty' '--trust-model' 'always' '--compress-algo'
'none' '--default-recipient' 'XXXXXX' '--encrypt' | 'dd' 'bs=512K'
'of=/media/acd/btrbk/test/test.20160822T1550--be8d1a43-d805-0448-808e-b576e9439c54.btrfs.gpg.part'
At subvol /mnt/btrfs/btrbk_snapshots/test.20160822T1550
BTRFS_IOC_SEND returned 0
joining genl thread
0+19609 records in
0+19609 records out
765762358 bytes (766 MB, 730 MiB) copied, 8.48394 s, 90.3 MB/s

real 0m8.850s
user 0m7.880s
sys 0m2.152s

This is 700 Mbit/second with encryption, but no compression.

My numbers are not yet very consistent, but my feeling is that this can
still become a lot better.

The latest devel branch takes 1:03 min (compressed), 2:26 (uncompressed).
It seems piping directly to a file adds still a lot more load to acd_cli compared to using dd with a large (4MB) blocksize.

Seems like acd_cli (or underlying "fusepy") has real trouble handling
"sane" block sizes (compared to what I would call 4MB "insane"). As far
as I know, the default write buffer size is 64K, the pipe buffer size is
4K (ulimit -p), and "normal" file system block size is 4K (stat -f /mnt/path). Just out of curiosity, what file system block size do you
get on acd_cli fuse mounts?

$ ulimit -p
8

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30521
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 30521
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

$ stat -f /media/acd
File: "/media/acd"
ID: 0 Namelen: 256 Type: fuseblk
Block size: 524288 Fundamental block size: 524288
Blocks: Total: 209715200 Free: 209674111 Available: 209674111
Inodes: Total: 0 Free: 0

I'm still convinced that using shell redirections is the correct way to
create output files, simply because it's the "standard" way, which causes
least trouble in "normal" circumstances. Please refer to issue #105 for
further discussions about this (e.g. adding a feature
raw_target_file_write_block_size which switches back to dd bs=xxx
would be no problem if it really helps).

OK. I will move further discussions to #105. Thank you very much for
your feedback.

Best,
Tobias
