16 changes: 8 additions & 8 deletions COMPARE.md
@@ -10,7 +10,7 @@
- LMC (chunking)
- RAM Chunking (Rapid Asymmetric Maximum)
- doi:10.1016/j.future.2017.02.013
- MII (minimal incrimental interval)
- MII (minimal incremental interval)
- doi:10.1109/access.2019.2926195
- [TTTD](https://scholarworks.sjsu.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=1041&context=etd_projects)
- [FBC](doi:10.1109/mascots.2010.37)
@@ -32,7 +32,7 @@

# Impl Features

- Incrimental input: rather than require a single `&[u8]` up front, allow
- Incremental input: rather than require a single `&[u8]` up front, allow
providing a number of `&[u8]`s over the life of the splitter/hasher.

- Slice input vs byte-at-a-time: By allowing algorithms to take in larger
@@ -45,8 +45,8 @@
- latest release: 2017-09-09
- inactive development (as of 2020-06-21)
- algorithm(s): "Rabin64" (polynomial based, 64-bit)
- incrimental input: no
- no documentation indicates incrimental input is possible
- incremental input: no
- no documentation indicates incremental input is possible
- while one could use a special impl of `Iterator<Item=u8>` that can be
extended, this would only work if the `SeperatorIter` or `ChunkIter` had
not emitted a final incomplete chunk/seperator.
@@ -67,7 +67,7 @@
- latest release: 2020-03-19, v1.0.3
- active development (as of 2020-06-21)
- algorithm(s): FastCDC
- incrimental input: no
- incremental input: no
- api:
- input: one `&[u8]`
- output: `Iterator<Item=Chunk> where Chunk: (offset: usize, size:
@@ -82,7 +82,7 @@
- latest release: 2018-12-17 v1.0.0 (no other releases)
- inactive development (as of 2020-06-21)
- algorithm(s): AE (with modifications/extensions)
- incrimental input: no
- incremental input: no
- api:
- input: one `&[u8]`
- output: `Iterator<Item=&[u8]>`
@@ -96,7 +96,7 @@
- latest release: 2020-04-12 v0.1.3
- active development (as of 2020-06-21)
- algorithm(s): gear
- incrimental input: yes
- incremental input: yes
- provides simd & scalar impls
- includes a static table for gearhash
- api: call `next_match()` repeatedly with new slices. Returns a
@@ -130,7 +130,7 @@
- algorithm(s):
- rollsum (based on bupsplit, based on rsync chunking)
- gear
- incrimental input: yes
- incremental input: yes
- includes a static table for gearhash
- low level trait has byte-by-byte and slice based interfaces
- exposes conditionality of chunk edge (ie: like a rolling-sum) in trait,
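The `# Impl Features` notes above separate crates by whether they accept incremental input (several `&[u8]` slices over the splitter's lifetime) or need one contiguous buffer. Below is a minimal sketch of the incremental style, written against the `gearhash` crate compared above; the `Hasher::default()`/`next_match(buf, mask)` signature and the mask value are recalled assumptions, not taken from this diff.

```rust
// Sketch: driving a chunker with data that arrives as several slices.
use gearhash::Hasher;

// The mask controls the expected chunk size; this value is purely illustrative.
const MASK: u64 = 0x0000_d900_0353_0000;

/// Collect absolute boundary offsets while feeding the hasher slice by slice.
fn boundaries(slices: &[&[u8]]) -> Vec<u64> {
    let mut hasher = Hasher::default();
    let mut consumed = 0u64; // bytes fed across all earlier slices
    let mut cuts = Vec::new();
    for slice in slices {
        let mut offset = 0;
        // `next_match` returns the end of a chunk within the remaining slice,
        // or None once the slice is exhausted; hash state carries across calls.
        while let Some(len) = hasher.next_match(&slice[offset..], MASK) {
            offset += len;
            cuts.push(consumed + offset as u64);
        }
        consumed += slice.len() as u64;
    }
    cuts
}

fn main() {
    let (a, b) = (vec![7u8; 64 * 1024], vec![42u8; 64 * 1024]);
    println!("cuts: {:?}", boundaries(&[&a, &b]));
}
```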
4 changes: 2 additions & 2 deletions src/bup.rs
@@ -127,11 +127,11 @@ impl Default for RollSum {
}
}

/// Incrimental instance of [`RollSum`]
/// Incremental instance of [`RollSum`]
///
/// Performance note: Bup's Roll sum algorithm requires tracking the entire window. As a result,
/// this includes a circular buffer which all inputs are copied through. If your use case allows
/// it, use the non-incrimental variant for improved performance.
/// it, use the non-incremental variant for improved performance.
#[derive(Clone, PartialEq, Eq)]
pub struct RollSumIncr {
state: RollSumState,
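To illustrate the performance note on `RollSumIncr`: a window-buffering rolling sum must know the byte falling out of the window, so incremental use forces every input byte through a circular buffer. A self-contained sketch of that pattern (not hash_roll's actual internals):

```rust
// Minimal rolling-sum window; every byte fed is copied into the buffer so the
// evicted byte can be subtracted later. This copy is the incremental-use cost.
struct Window {
    buf: Vec<u8>, // circular buffer, length == window size
    pos: usize,   // next slot to overwrite
    sum: u64,     // running sum over the current window
}

impl Window {
    fn new(size: usize) -> Self {
        Window { buf: vec![0; size], pos: 0, sum: 0 }
    }

    fn roll(&mut self, input: &[u8]) {
        for &b in input {
            let evicted = self.buf[self.pos];
            self.sum = self.sum.wrapping_sub(evicted as u64).wrapping_add(b as u64);
            self.buf[self.pos] = b; // the unavoidable copy
            self.pos = (self.pos + 1) % self.buf.len();
        }
    }
}

fn main() {
    let mut w = Window::new(64);
    w.roll(b"data arrives in arbitrary slices");
    w.roll(b"and every byte passes through the buffer");
    println!("rolling sum: {}", w.sum);
}
```

The all-at-once interfaces avoid this copy entirely because they can index the caller's buffer directly.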
2 changes: 1 addition & 1 deletion src/fastcdc.rs
@@ -111,7 +111,7 @@ impl<'a> From<&FastCdc<'a>> for FastCdcIncr<'a> {
}
}

/// FastCdcIncr provides an incrimental interface to `FastCdc`
/// FastCdcIncr provides an incremental interface to `FastCdc`
///
/// This impl does not buffer data passing through it (the FastCDC algorithm does not require
/// look-back) making it very efficient.
34 changes: 17 additions & 17 deletions src/lib.rs
@@ -14,7 +14,7 @@
//!
//! - Configured Algorithm Instance (impliments [`Chunk`]). Named plainly using the algorithm name
//! (like [`Bup`]). These can be thought of as "parameters" for an algorithm.
//! - Incrimental (impliments [`ChunkIncr`]). Normally named with `Incr` suffix. These are created
//! - Incremental (impliments [`ChunkIncr`]). Normally named with `Incr` suffix. These are created
//! using [`ToChunkIncr`] for a configured algorithm instance.
//!
//! Because of the various ways one might use a CDC, and the different CDC algorithm
@@ -24,12 +24,12 @@
//! algorithm. For example, this might mean configuring a window size or how to decide where to
//! split. These don't include any mutable data, in other words: they don't keep track of what data
//! is given to them. Configured Algorithm Instances provide the all-at-once APIs, as well as
//! methods to obtain other kinds of APIs, like incrimental style apis.
//! methods to obtain other kinds of APIs, like incremental-style APIs.
//!
//! ```rust
//! use hash_roll::ToChunkIncr;
//! let algorithm_instance = hash_roll::mii::Mii::default();
//! let _incrimental_comp = algorithm_instance.to_chunk_incr();
//! let _incremental_comp = algorithm_instance.to_chunk_incr();
//! ```
//!
//! ## CDC Algorithms and Window Buffering
@@ -42,26 +42,26 @@
//! For the window-buffering algorithms, there is an extra cost to certain types of API
//! implimentations. The documentation will note when these occur and suggest alternatives.
//!
//! Generally, CDC interfaces that are incrimental will be slower for window-buffering algorithms.
//! Generally, CDC interfaces that are incremental will be slower for window-buffering algorithms.
//! Using an explicitly allocating interface (which emits `Vec<u8>` or `Vec<Vec<u8>>`) will have no
//! worse performance that the incrimental API, but might be more convenient. Using an all-at-once
//! worse performance than the incremental API, but might be more convenient. Using an all-at-once
//! API will provide the best performance due to not requiring any buffering (the input data can be
//! used directly).
//!
//! ## Use Cases that drive API choices
//!
//! - accumulate vecs, emits vecs
//! - incrimental: yes
//! - incremental: yes
//! - input: `Vec<u8>`
//! - internal state: `Vec<Vec<u8>>`
//! - output: `Vec<Vec<u8>>`
//!
//! - stream data through
//! - incrimenal: yes
//!   - incremental: yes
//! - input: `&[u8]`
//!
//! - mmap (or read entire) file, emit
//! - incrimenal: no
//!   - incremental: no
//! - input: `&[u8]`
//! - output: `&[u8]`

@@ -71,7 +71,7 @@
//
// - place methods that might have more optimized variants, but can have common implimentations,
// in a trait. This notably affects window-buffering differences: it's always possible to
// impliment all-at-once processing using incrimental interfaces that internally buffer, but
// impliment all-at-once processing using incremental interfaces that internally buffer, but
// it's much more efficient for window-buffering algorithms to provide implimentations that know
// how to look into the input data directly.

@@ -116,7 +116,7 @@ pub mod zstd;

pub(crate) use range::RangeExt;

/// Accept incrimental input and provide indexes of split points
/// Accept incremental input and provide indexes of split points
///
/// Compared to [`Chunk`], [`ChunkIncr`] allows avoiding having to buffer all input data in memory,
/// and avoids the need to use a single buffer for storing the input data (even if all data is in
@@ -127,11 +127,11 @@ pub(crate) use range::RangeExt;
/// (like `ZstdRsyncable` does). If you have multiple "sources", one should obtain new instances of
/// [`ChunkIncr`] for each of them (typically via [`ToChunkIncr`]).
///
/// Note that for some splitting/chunking algorithms, the incrimental api will be less efficient
/// compared to the non-incrimental API. In particular, algorithms like [`Rsyncable`] that require
/// Note that for some splitting/chunking algorithms, the incremental API will be less efficient
/// compared to the non-incremental API. In particular, algorithms like [`Rsyncable`] that require
/// the use of previously examined data to shift their "window" (resulting in needing a circular
/// buffer which all inputed data passes through) will perform more poorly using [`ChunkIncr`]
/// compared with non-incrimental interfaces
/// compared with non-incremental interfaces
pub trait ChunkIncr {
/// The data "contained" within a implimentor of this trait is the history of all data slices
/// passed to feed.
@@ -164,7 +164,7 @@ pub trait ChunkIncr {
/// Does not return the remainder (if any) in the iteration. Use [`IterSlices::take_rem()`] or
/// [`IterSlices::into_parts()`] to get the remainder.
///
/// Note that this is a non-incrimental interface. Calling this on an already fed chunker or using
/// Note that this is a non-incremental interface. Calling this on an already fed chunker or using
/// this multiple times on the same chunker may provide unexpected results
fn iter_slices_strict(self, data: &[u8]) -> IterSlicesStrict<'_, Self>
where
Expand Down Expand Up @@ -374,14 +374,14 @@ pub trait Chunk {
//
// We could consider adding `type Incr` into `trait Chunk`, or only having `type Incr`
pub trait ToChunkIncr {
/// `Incr` provides the incrimental interface to this chunking instance
/// `Incr` provides the incremental interface to this chunking instance
type Incr: ChunkIncr;

/// `to_chunk_incr()` returns a [`ChunkIncr`] which can be incrimentally fed data and emits
/// `to_chunk_incr()` returns a [`ChunkIncr`] which can be incrementally fed data and emits
/// chunks.
///
/// Generally, this is a typically low cost operation that copies from the implimentor or does
/// minor computation on its fields and may allocate some memory for storing additional state
/// needed for incrimental computation.
/// needed for incremental computation.
fn to_chunk_incr(&self) -> Self::Incr;
}
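Putting the two trait layers from this file together: the crate-level doc example builds a configured `Mii` instance and converts it with `to_chunk_incr()`; the `iter_slices_strict()` helper shown in this diff then drives it over a single buffer. A small sketch, assuming the returned `IterSlicesStrict` yields `&[u8]` chunk slices (the remainder, if any, is not emitted):

```rust
// Sketch built on the crate-level doc example plus the ChunkIncr helper above.
use hash_roll::{ChunkIncr, ToChunkIncr};

fn main() {
    let data = vec![0u8; 128 * 1024];
    let algorithm_instance = hash_roll::mii::Mii::default();
    let incr = algorithm_instance.to_chunk_incr();

    // Assumption: the iterator item is a `&[u8]` chunk slice.
    for chunk in incr.iter_slices_strict(&data) {
        println!("chunk of {} bytes", chunk.len());
    }
}
```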
4 changes: 2 additions & 2 deletions src/zpaq.rs
@@ -168,9 +168,9 @@ impl Default for Zpaq {
}
}

/// Incrimental instance of [`Zpaq`].
/// Incremental instance of [`Zpaq`].
///
/// `Zpaq` doesn't require input look back, so the incrimental and non-incrimental performance
/// `Zpaq` doesn't require input look back, so the incremental and non-incremental performance
/// should be similar.
#[derive(Debug)]
pub struct ZpaqIncr {
2 changes: 1 addition & 1 deletion src/zstd.rs
@@ -127,7 +127,7 @@ impl ZstdSearchState {
}
}

/// Incrimental chunking using Zstd's rsyncable algorithm
/// Incremental chunking using Zstd's rsyncable algorithm
///
/// Performance note: Zstd's chunking requires buffer look back to remove previously inserted data,
/// and as a result requires `ZstdIncr` to maintain an internal buffer. This internal buffer may
4 changes: 2 additions & 2 deletions tests/cuts_qc.rs
@@ -1,7 +1,7 @@
// check the following are equivalent:
// - find_chunk_edge() with 1 set of buffer sizes vs another set of buffer sizes
// - incrimental with 1 set of buffer sizes vs another set of buffer sizes
// - find_chunk_edge() vs incrimental
// - incremental with 1 set of buffer sizes vs another set of buffer sizes
// - find_chunk_edge() vs incremental
//
// - simd vs non-simd algorithms

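The equivalence this test file describes can be stated as: slicing the input differently must not move any chunk boundary. A sketch of that property, reusing the gearhash-based `boundaries()` helper shape from the COMPARE.md note above (hash_roll's own `find_chunk_edge()` and incremental signatures are not shown in this diff, so the crate and mask below are assumptions):

```rust
use gearhash::Hasher;

const MASK: u64 = 0x0000_d900_0353_0000; // illustrative mask

// Same helper shape as the earlier sketch: absolute cut offsets across slices.
fn boundaries(slices: &[&[u8]]) -> Vec<u64> {
    let mut hasher = Hasher::default();
    let (mut consumed, mut cuts) = (0u64, Vec::new());
    for slice in slices {
        let mut offset = 0;
        while let Some(len) = hasher.next_match(&slice[offset..], MASK) {
            offset += len;
            cuts.push(consumed + offset as u64);
        }
        consumed += slice.len() as u64;
    }
    cuts
}

#[test]
fn same_cuts_regardless_of_slicing() {
    let data: Vec<u8> = (0..1_000_000u32).map(|i| (i * 31 % 251) as u8).collect();
    let whole = boundaries(&[&data]);
    let halves = boundaries(&[&data[..data.len() / 2], &data[data.len() / 2..]]);
    assert_eq!(whole, halves);
}
```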