Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable uint used for container lengths (like string or vec) #97

Open
narodnik opened this issue Apr 17, 2024 · 1 comment
Open

Variable uint used for container lengths (like string or vec) #97

narodnik opened this issue Apr 17, 2024 · 1 comment

Comments

@narodnik
Copy link

See the example code from: https://github.com/darkrenaissance/darkfi/blob/master/src/serial/src/lib.rs#L271

/// Variable-integer encoding.
/// Integer can be encoded depending on the represented value to save space.
/// Variable length integers always precede an array/vector of a type of data
/// that may vary in length. Longer numbers are encoded in little endian.
///
/// | Value         | Storage length | Format                              |
/// |---------------|----------------|-------------------------------------|
/// | <= 0xfc       | 1              | u8                                  |
/// | <= 0xffff     | 3              | `0xfd` followed by `value` as `u16` |
/// | <= 0xffffffff | 5              | `0xfe` followed by `value` as `u32` |
/// | -             | 9              | `0xff` followed by `value` as `u64` |
///
/// See also [Bitcoin variable length integers](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer).
#[derive(Debug, PartialEq, Eq)]
pub struct VarInt(pub u64);

impl VarInt {
    /// Gets the length of this `VarInt` when encoded.
    /// Returns:
    /// * 1 for 0..0xFC
    /// * 3 for 0xFD..(2^16-1)
    /// * 5 for 0x10000..(2^32-1)
    /// * 9 otherwise
    #[inline]
    pub fn length(&self) -> usize {
        match self.0 {
            0..=0xFC => 1,
            0xFD..=0xFFFF => 3,
            0x10000..=0xFFFFFFFF => 5,
            _ => 9,
        }
    }
}

impl Encodable for VarInt {
    #[inline]
    fn encode<S: Write>(&self, mut s: S) -> Result<usize, Error> {
        match self.0 {
            0..=0xFC => {
                (self.0 as u8).encode(s)?;
                Ok(1)
            }

            0xFD..=0xFFFF => {
                s.write_u8(0xFD)?;
                (self.0 as u16).encode(s)?;
                Ok(3)
            }

            0x10000..=0xFFFFFFFF => {
                s.write_u8(0xFE)?;
                (self.0 as u32).encode(s)?;
                Ok(5)
            }

            _ => {
                s.write_u8(0xFF)?;
                self.0.encode(s)?;
                Ok(9)
            }
        }
    }
}

impl Decodable for VarInt {
    #[inline]
    fn decode<D: Read>(mut d: D) -> Result<Self, Error> {
        let n = ReadExt::read_u8(&mut d)?;
        match n {
            0xFF => {
                let x = ReadExt::read_u64(&mut d)?;
                if x < 0x100000000 {
                    return Err(Error::new(ErrorKind::Other, "Non-minimal VarInt"))
                }
                Ok(VarInt(x))
            }

            0xFE => {
                let x = ReadExt::read_u32(&mut d)?;
                if x < 0x10000 {
                    return Err(Error::new(ErrorKind::Other, "Non-minimal VarInt"))
                }
                Ok(VarInt(x as u64))
            }

            0xFD => {
                let x = ReadExt::read_u16(&mut d)?;
                if x < 0xFD {
                    return Err(Error::new(ErrorKind::Other, "Non-minimal VarInt"))
                }
                Ok(VarInt(x as u64))
            }

            n => Ok(VarInt(n as u64)),
        }
    }
}

Always using u32 or u64 for array lengths is wasteful since many containers only contain a byte's worth of items.

@knickish
Copy link
Collaborator

Varint encoding definitely would save some space for the binary format at least, but I'm a little concerned about the performance overhead all the additional branches would cause. I would be happy to at least consider and benchmark a PR adding this. (would also need to wait for a major release, as this would break reading of already-serialized objects)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants