Speedup send raw #100

Closed

tobiasgrosser wants to merge 3 commits

Conversation

tobiasgrosser
Contributor

These three patches allow me to use btrbk to back up to an Amazon Cloud Drive with reasonable speed using an acd_cli FUSE mount. Before these changes the upload did not complete within 5 minutes (for a 300 MByte archive); now it finishes in 9 seconds.

We already perform compression before gpg, so compressing again in gpg
is just a waste of time. Interestingly, gpg does not seem to try to
recompress gzipped input streams, as for the default gzip compression
this patch does not change performance. However, it is necessary for
the upcoming lz4 compression to show its real benefit.
Here are some statistics for my local backup:

			Size 	Time
	uncompressed	347M	 4.4s
	gzip		105M	40.0s
	lz4		153M	 5.2s

Moving from gzip to lz4 increases the backup performance from 9 MB/s to
66 MB/s (measured on the uncompressed stream). As I am backing up over
Gbit Ethernet, this makes a big difference.
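The double-compression claim is easy to check locally. A minimal sketch, using gzip as a stand-in for gpg's internal zlib compression (the /tmp paths are illustrative):

```shell
# Compressing an already-compressed stream gains nothing and only burns
# CPU time -- which is why gpg's internal compression gets disabled.
head -c 1048576 /dev/urandom > /tmp/sample.bin   # incompressible test input
gzip -c /tmp/sample.bin > /tmp/sample.gz         # first compression pass
gzip -c /tmp/sample.gz  > /tmp/sample.gz.gz      # second pass: pure overhead
stat -c '%n: %s bytes' /tmp/sample.gz /tmp/sample.gz.gz
```

The second file is never smaller than the first: gzip can only add header and block overhead to input that is already incompressible.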
By default dd uses blocks of 512 bytes, which does not work well on
network-mounted FUSE file systems (e.g. an acd_cli mount). Before this change
copying 150 MB took more than 5 minutes (I aborted); after this change it
takes only 9 seconds.
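A minimal sketch of the block-size effect, using local files for illustration. On a FUSE network mount each dd output block becomes at least one write request, so 512-byte blocks mean thousands of round trips:

```shell
head -c 1048576 /dev/urandom > /tmp/stream.bin
dd if=/tmp/stream.bin of=/tmp/out_512.bin 2>/dev/null       # default bs=512: 2048 writes
dd if=/tmp/stream.bin of=/tmp/out_4m.bin bs=4M 2>/dev/null  # bs=4M: a single write
cmp /tmp/out_512.bin /tmp/out_4m.bin && echo "identical output"
```

The output bytes are identical either way; only the number of write() calls, and therefore the number of FUSE round trips, changes.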
@digint
Owner

digint commented Aug 7, 2016

Merged "disable compression in gpg" in 84a5c56.

As I want to merge the stream_compression branch in the next major release, the other patches become obsolete:

  1. the files are written using redirections > instead of dd
  2. more compression algorithms are supported (see the %compression hash, which includes lzop; untested, but it should give better performance than lz4)
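A hypothetical shell sketch of what such a dispatch table could look like. The names are illustrative only; btrbk's actual %compression hash is Perl and may differ:

```shell
# Map an algorithm name to the filter command used in the pipeline.
compress_cmd() {
  case "$1" in
    gzip) echo "gzip -c" ;;
    lzop) echo "lzop -c" ;;
    lz4)  echo "lz4 -c"  ;;
    *)    echo "cat"     ;;  # unknown or none: pass the stream through
  esac
}
# e.g.: btrfs send ... | $(compress_cmd lz4) | gpg ... > target.part
```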

@TobiG could you please give it a try and tell me if it also takes only 9 seconds on an acd_cli mount?

Note that the stream_compression branch should work, but I definitely have to find some time to perform more tests before merging it.

@digint
Owner

digint commented Aug 7, 2016

which supports lzop, which should give better performance than lz4

I guess I was wrong here, lz4 seems to have some advantages. No multi-threading though. I'll add this to the %compression list.

@digint digint mentioned this pull request Aug 17, 2016
@tobiasgrosser
Contributor Author

Just to give an update: I am currently traveling, so I don't have a fast enough internet connection to perform the tests. I should get to it sometime next week.

@digint : I just tried to access the stream_compression branch, but it seems to not be available any more? Did you remove it for some reason?

Also, did you plan to add the lz4 support to the stream_compression branch or should I use my own patch?

@digint
Owner

digint commented Aug 20, 2016

Oh sorry for that, I deleted the stream_compression branch after merging it to devel branch. Should have mentioned it here.

lz4 compression is already included in devel branch.

@tobiasgrosser
Contributor Author

OK, I just managed to test the stream_compression branch. Unfortunately I had to create a new test subvolume, so the new performance results are not comparable.

The new test backup takes 731 MB uncompressed and 333 MB lz4 compressed.

With the patches in this PR a backup takes 0:15 min (compressed), 0:27 (uncompressed)

The latest devel branch takes 1:03 min (compressed), 2:26 (uncompressed).

It seems piping directly to a file still adds a lot more load on acd_cli compared to using dd with a large (4 MB) block size.

@digint
Owner

digint commented Aug 22, 2016

The new test backup takes 731 MB uncompressed and 333 MB lz4 compressed.
With the patches in this PR a backup takes 0:15 min (compressed), 0:27 (uncompressed)

As expected, your bottleneck is the network connection.

The latest devel branch takes 1:03 min (compressed), 2:26 (uncompressed).
It seems piping directly to a file adds still a lot more load to acd_cli compared to using dd with a large (4MB) blocksize.

Seems like acd_cli (or underlying "fusepy") has real trouble handling "sane" block sizes (compared to what I would call 4MB "insane"). As far as I know, the default write buffer size is 64K, the pipe buffer size is 4K (ulimit -p), and "normal" file system block size is 4K (stat -f /mnt/path). Just out of curiosity, what file system block size do you get on acd_cli fuse mounts?
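The numbers mentioned here can be checked directly; a quick sketch (the mount point to inspect is up to you, `/` is used as a stand-in):

```shell
ulimit -p                        # pipe buffer in 512-byte units: 8 -> 4096 bytes
stat -f -c 'block size: %s' /    # file system block size of the mount
```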

I'm still convinced that using shell redirections is the correct way to create output files, simply because it's the "standard" way, which causes least trouble in "normal" circumstances. Please refer to issue #105 for further discussions about this (e.g. adding a feature raw_target_file_write_block_size which switches back to dd bs=xxx would be no problem if it really helps).

I'm closing this pull request now, please refer to #99 for further comments.

@digint digint closed this Aug 22, 2016
@tobiasgrosser
Contributor Author

On Mon, Aug 22, 2016, at 03:14 PM, Axel Burri wrote:

The new test backup takes 731 MB uncompressed and 333 MB lz4 compressed.
With the patches in this PR a backup takes 0:15 min (compressed), 0:27 (uncompressed)

As expected, your bottleneck is the network connection.

It is? I have a 900 Mbit connection and can reach this speed at some points.

333 MB * 8 Mbit/MByte = 2664 Mbit

2664 Mbit / 15 seconds ≈ 178 Mbit/s
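For the record, the division can be reproduced with a one-liner:

```shell
# 333 MB compressed, transferred in 15 seconds
awk 'BEGIN { printf "%.1f Mbit/s\n", 333 * 8 / 15 }'   # prints 177.6 Mbit/s
```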

This is certainly below the 900 Mbit I have, and Amazon is pretty good at
reaching high network speeds. Here is an earlier experiment I ran on the
same internet connection with a larger file:

time 'btrfs' 'send' '-v' '/mnt/btrfs/btrbk_snapshots/test.20160822T1550'
| 'gpg' '--batch' '--no-tty' '--trust-model' 'always' '--compress-algo'
'none' '--default-recipient' 'XXXXXX' '--encrypt' | 'dd' 'bs=512K'
'of=/media/acd/btrbk/test/test.20160822T1550--be8d1a43-d805-0448-808e-b576e9439c54.btrfs.gpg.part'
At subvol /mnt/btrfs/btrbk_snapshots/test.20160822T1550
BTRFS_IOC_SEND returned 0
joining genl thread
0+19609 records in
0+19609 records out
765762358 bytes (766 MB, 730 MiB) copied, 8.48394 s, 90.3 MB/s

real 0m8.850s
user 0m7.880s
sys 0m2.152s

This is 700 Mbit/second with encryption, but no compression.

My numbers are not yet very consistent, but my feeling is that this can
still become a lot better.

The latest devel branch takes 1:03 min (compressed), 2:26 (uncompressed).
It seems piping directly to a file adds still a lot more load to acd_cli compared to using dd with a large (4MB) blocksize.

Seems like acd_cli (or underlying "fusepy") has real trouble handling
"sane" block sizes (compared to what I would call 4MB "insane"). As far
as I know, the default write buffer size is 64K, the pipe buffer size is
4K (ulimit -p), and "normal" file system block size is 4K (stat -f /mnt/path). Just out of curiosity, what file system block size do you
get on acd_cli fuse mounts?

$ ulimit -p
8

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30521
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 30521
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

$ stat -f /media/acd
File: "/media/acd"
ID: 0 Namelen: 256 Type: fuseblk
Block size: 524288 Fundamental block size: 524288
Blocks: Total: 209715200 Free: 209674111 Available: 209674111
Inodes: Total: 0 Free: 0

I'm still convinced that using shell redirections is the correct way to
create output files, simply because it's the "standard" way, which causes
least trouble in "normal" circumstances. Please refer to issue #105 for
further discussions about this (e.g. adding a feature
raw_target_file_write_block_size which switches back to dd bs=xxx
would be no problem if it really helps).

OK. I will move further discussions to #105. Thank you very much for
your feedback.

Best,
Tobias
