Streaming base64 #4944
-
Hello, I have a use case where I need to load a file from the filesystem, encode it into base64, and send it in a REST API call over HTTP. I figured that the naive way of doing things, that is, reading the whole file into an in-memory buffer at once and then encoding it, would be a performance and especially a memory bottleneck (the files are typically at least 5 MB, and I am writing a server application where this behaviour is triggered by user requests). So I went out and experimented a bit in a separate project. I fiddled around with the following:

```rust
use bytes::{Buf, Bytes, BytesMut};
use tokio_util::codec::Decoder;

// `Error` is a user-defined error enum with (at least) an
// `InvalidLineSize(usize)` variant.
pub struct Base64LineDecoder {
    num_bytes_in: usize,
    line_size: usize,
    buf: BytesMut,
}

impl Base64LineDecoder {
    pub fn new(line_size: usize) -> Result<Self, Error> {
        if line_size % 4 != 0 {
            return Err(Error::InvalidLineSize(line_size));
        }
        // Base64 maps 3 input bytes to 4 output bytes.
        let num_bytes_in = (line_size / 4) * 3;
        Ok(Self {
            num_bytes_in,
            line_size,
            buf: BytesMut::with_capacity(line_size + 1),
        })
    }
}
```
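The 4-to-3 ratio behind the constructor's arithmetic can be sanity-checked in isolation. A small std-only sketch (the helper name is mine, not part of the original code):

```rust
// Standalone sketch (assumption: illustrative helper, not part of the
// original decoder). Every 4 base64 output characters encode exactly
// 3 input bytes, so a line of `line_size` characters consumes
// `(line_size / 4) * 3` raw bytes.
fn input_bytes_per_line(line_size: usize) -> Option<usize> {
    if line_size % 4 != 0 {
        return None; // corresponds to Error::InvalidLineSize(line_size)
    }
    Some((line_size / 4) * 3)
}

fn main() {
    // A 128-character line holds 96 raw bytes of payload.
    assert_eq!(input_bytes_per_line(128), Some(96));
    // A 4096-character line holds 3072 raw bytes.
    assert_eq!(input_bytes_per_line(4096), Some(3072));
    // Line sizes that are not a multiple of 4 are rejected.
    assert_eq!(input_bytes_per_line(130), None);
    println!("ok");
}
```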
```rust
impl Decoder for Base64LineDecoder {
    type Item = Bytes;
    type Error = Error;

    fn decode(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        if src.len() < self.num_bytes_in {
            src.reserve(self.num_bytes_in - src.len());
            return Ok(None);
        }
        // TODO: unsafe set_len?
        self.buf.resize(self.line_size + 1, 0);
        base64::encode_config_slice(
            &src[0..self.num_bytes_in],
            base64::STANDARD,
            &mut self.buf[..self.line_size],
        );
        *self.buf.last_mut().unwrap() = b'\n';
        src.advance(self.num_bytes_in);
        Ok(Some(self.buf.split().freeze()))
    }

    fn decode_eof(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        if src.is_empty() {
            return Ok(None);
        }
        // Base64 output length: 4 characters per 3 input bytes, rounded up.
        let last_line_size = ((src.remaining() + 2) / 3) * 4;
        self.buf.resize(last_line_size, 0);
        base64::encode_config_slice(&src[..], base64::STANDARD, &mut self.buf[..]);
        src.advance(src.remaining());
        Ok(Some(self.buf.split().freeze()))
    }
}
```

Now I benchmarked and tested this by reading a file from disk, encoding it, and writing the result into a different file:

```rust
#[tokio::main]
```
```rust
async fn main() {
    let filename = std::env::args().nth(1).unwrap();
    let line_size: usize = std::env::args().nth(2).unwrap().parse().unwrap();
    let num_bytes: usize = (line_size / 4) * 3;

    let input_file = File::open(filename).await.unwrap();
    let mut output_file = File::from(tempfile::tempfile().unwrap());

    let now = Instant::now();
    let decoder = Base64LineDecoder::new(line_size).unwrap();
    let mut wrapped_reader = StreamReader::new(
        FramedRead::new(input_file, decoder)
            .map(|s| s.map_err(|s| std::io::Error::new(std::io::ErrorKind::Other, s))),
    );
    io::copy(&mut wrapped_reader, &mut output_file)
        .await
        .unwrap();
    println!("Decoder took {}s", now.elapsed().as_secs_f64());
}
```

I also implemented the naive way by loading the file into a buffer, encoding the whole buffer, and writing it to a file. For a 100 MB input file with a line length of 128 bytes, I benchmarked the performance (just one run, no statistical benchmarks).
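For reference, the naive path I compared against looks roughly like this (a sketch reconstructed from the description above, not the exact code; to keep it self-contained, `encode_b64` is a minimal std-only stand-in for the `base64` crate's encode function):

```rust
// Minimal base64 encoder, used here only as a stand-in for the `base64`
// crate so the sketch compiles without dependencies.
const TABLE: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

fn encode_b64(data: &[u8]) -> String {
    // Output is 4 characters per 3 input bytes, rounded up.
    let mut out = String::with_capacity(((data.len() + 2) / 3) * 4);
    for chunk in data.chunks(3) {
        // Pack up to 3 input bytes into one 24-bit group.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | b[2] as u32;
        // Emit 4 output characters, padding with '=' for short tail chunks.
        out.push(TABLE[(n >> 18) as usize & 63] as char);
        out.push(TABLE[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { TABLE[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[n as usize & 63] as char } else { '=' });
    }
    out
}

fn main() {
    // Naive path: read the whole input at once, encode it in a single
    // call, and write the result out in one shot. (The file name here is
    // a placeholder; the fallback just keeps the sketch runnable.)
    let input = std::fs::read("input.bin").unwrap_or_else(|_| b"hello world".to_vec());
    let encoded = encode_b64(&input);
    std::fs::write("output.b64", encoded.as_bytes()).ok();
    println!("{}", encode_b64(b"hello")); // prints "aGVsbG8="
}
```

The whole file lives in memory twice (raw and encoded), which is exactly the cost the streaming decoder is meant to avoid.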
Now, my intuition regarding memory consumption was correct: the streaming approach only requires a small, constant amount of memory compared to the naive way. My intuition regarding performance, though, was way off. I thought that the streaming approach should be comparable or even faster, but instead it is an order of magnitude slower. It gets better with larger line sizes, with a break-even point at a line size of around 4096 bytes.

My conclusion would be that the overhead of each call to the decoder becomes noticeable with small line sizes, but I am just surprised how large the difference is. Is there something sub-optimal in my code? Could I be doing something better or differently there? My understanding is that there should be no reallocations or new allocations in the `decode` path. Or, maybe, is this just to be expected, and when I want to use this code in my main project, I have to make sure to use large enough line sizes?

I hope this is the correct place to ask these questions. I am still pretty new to low-level network coding, so any advice here would be greatly appreciated. Cheers
Replies: 1 comment
-
The overhead you are seeing is probably entirely the fault of file IO being slow in async/await. Every single IO operation involves a `spawn_blocking` roundtrip, and if you are sending only 128 bytes per call, you have a lot of them. Async/await is good for network IO, not so much for file IO.

There are various other inefficiencies (e.g. you can avoid a copy by using `tokio::io::copy_buf` instead), but I would think that this pales in comparison to the overhead involved with file IO. A buffered writer would probably help quite a lot, but only because it decreases the number of file operations by increasing the size of each one.
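The scale of that roundtrip count can be sketched with back-of-the-envelope arithmetic (assumption: one blocking read per decoded frame, the worst case described above; `FramedRead`'s internal buffering may batch some reads in practice):

```rust
// Each frame of `line_size` base64 characters consumes (line_size / 4) * 3
// raw input bytes, so the number of IO operations scales inversely with
// the line size.
fn io_ops(file_size: usize, line_size: usize) -> usize {
    let bytes_per_op = (line_size / 4) * 3;
    // Ceiling division: number of frames needed to drain the file.
    (file_size + bytes_per_op - 1) / bytes_per_op
}

fn main() {
    let file = 100 * 1024 * 1024; // the 100 MB test file from the question
    // 128-byte lines: over a million potential spawn_blocking roundtrips.
    println!("line_size  128 -> {} ops", io_ops(file, 128));
    // 4096-byte lines (the observed break-even point): roughly 34k.
    println!("line_size 4096 -> {} ops", io_ops(file, 4096));
}
```

Under that assumption, growing the line size from 128 to 4096 cuts the operation count by a factor of 32, which lines up with the break-even behaviour observed in the question.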