This algorithm is still being reviewed, therefore any feedback is appreciated. Some calculations and proofs might still need additional verification.
The ModRange Selection Algorithm is a new algorithm for generating unique, random values from a range continuous range of numbers. Its use case applies to when you want to select some element from a list of elements that can be indexed in a continuous range. It maintains constant O(1)
time complexity with O(n/2)
storage complexity for all operations and optimizes space usage by preventing unnecessary range fragmentation, an improvement to many existing established methodologies. This document presents the algorithm in detail, including data structures, operational procedures, correctness proof, and complexity analysis, as well as illustration through pseudocode.
The repository contains a Python implementation with benchmarks.
Generating unique random values from a known existing set of values with a known initial entropy is an easy task. The challenge is to also prevent the algorithm from selecting previously generated values unpredictability, and effectiveness.
The ModRange Selection Algorithm addresses these challenges by maintaining a dynamic set of ranges representing available values. It efficiently updates these ranges as values are generated, ensuring that each value is unique and that the generation process remains efficient.
Given a set of continuous numbers of size [0, n)
such that:
- Uniqueness: Each generated value is unique and not repeated.
- Randomness: Values are generated in a manner that is unpredictable (or dependent on entropy).
- Efficiency: Both generation and update operations are performed in constant time
O(1)
. - Tracking: The algorithm maintains a record of used and available values.
- Entropy Usage: The algorithm uses external entropy for selection.
The algorithm maintains a list of ranges, where each range represents a possible sequence of values.
Range: A tuple (start, size)
ranges: A list of Range objects
- Appending New Ranges: When splitting a range, new ranges are appended to the end of the list.
- Range Splitting: Upon splitting a range at index
i
:- The first part remains at index
i
. - The second part is appended to the end.
- The first part remains at index
- Range Order Irrelevance: The order of ranges in the list does not affect the correctness of the algorithm.
- Contiguous Representation: Each range accurately represents a contiguous block of available values.
Initialize the ranges
list with the full range of available values.
function initialize(size):
ranges = [(0, size)]
return ranges
To generate a unique random value:
function generate_value(entropy, ranges):
# Select a random range index based on entropy
range_index = entropy % length(ranges)
(start, size) = ranges[range_index]
# Generate a random value within the selected range
value = (entropy % size) + start
# Update the ranges to exclude the generated value
update_ranges(ranges, range_index, value)
return value
Adjust the ranges to remove the generated value while maintaining the invariants.
function update_ranges(ranges, range_index, value):
(start, size) = ranges[range_index]
value_offset = value - start
if value_offset == 0:
# Value is at the start of the range
if size == 1:
# The range is now empty; remove it using swap removal
swap_remove(ranges, range_index)
else:
# Shrink the range from the start
ranges[range_index] = (start + 1, size - 1)
elif value_offset == size - 1:
# Value is at the end of the range
ranges[range_index] = (start, size - 1)
else:
# Value is in the middle; split the range
left_size = value_offset
right_start = value + 1
right_size = size - value_offset - 1
ranges[range_index] = (start, left_size)
ranges.append((right_start, right_size))
Note: To maintain O(1)
time complexity during removals, the swap_remove
function is used when removing a range from the list.
function swap_remove(ranges, index):
ranges[index] = ranges[-1]
ranges.pop()
Initial State:
ranges = [(0, 1000)]
After generating/selecting value 473
:
- Using entropy, select range_index = 0
- Selected range: (0, 1000)
- value = (entropy % 1000) + 0 = 473
update_ranges(ranges, 0, 473):
(start, size) = (0, 1000)
value_offset = 473 - 0 = 473
Since value_offset is neither 0 nor size - 1 (999), we split the range:
- Left range: (0, 473)
- Right range: (474, 526) # right_size = 1000 - 473 - 1 = 526
- Update ranges:
ranges[0] = (0, 473)
ranges.append((474, 526))
Updated ranges:
ranges = [(0, 473), (474, 526)]
After generating value 100
(Middle of First Range):
- Suppose range_index = 0 (selected randomly)
- Selected range: (0, 473)
- value = (entropy % 473) + 0 = 100
update_ranges(ranges, 0, 100):
(start, size) = (0, 473)
value_offset = 100 - 0 = 100
Since value_offset is neither 0 nor size - 1 (472), we split the range:
- Left range: (0, 100)
- Right range: (101, 372)
- Update ranges:
ranges[0] = (0, 100)
ranges.append((101, 372))
Updated ranges:
ranges = [(0, 100), (474, 526), (101, 372)]
Note: The new range (101, 372)
is appended to the end of the list per the invariant.
Several algorithms address the problem of generating unique random identifiers from a fixed range. Notable among them are:
-
Fisher-Yates Shuffle Algorithm:
- Description: An algorithm to generate a random permutation of a finite set (shuffling an array). It works by swapping each element with a randomly selected element from the unshuffled portion.
-
Time Complexity:
O(n)
for shuffling the entire array. -
Comparison:
- Similarities: Ensures unique selection without repeats.
-
Differences: The full shuffle requires
O(n)
time upfront, whereas ModRange achievesO(1)
time per selection without pre-processing.
-
Reservoir Sampling:
-
Description: Used for sampling
$k$ elements from a large or unknown-size dataset. Each incoming element has a decreasing probability of being selected as the reservoir fills. -
Time Complexity:
O(n)
for$n$ elements. -
Comparison:
- Similarities: Generates random samples without replacement.
-
Differences: Not efficient for single selection in
O(1)
time and requires knowledge of total elements or passes through the dataset.
-
Description: Used for sampling
-
Alias Method for Sampling Discrete Distributions:
-
Description: Allows sampling from a discrete distribution in
O(1)
time afterO(n)
or$O(nlogn)$ preprocessing. -
Comparison:
- Similarities: Aims for constant-time sampling.
- Differences: ModRange does not require preprocessing and adapts dynamically as elements are selected.
-
Description: Allows sampling from a discrete distribution in
The ModRange Selection Algorithm distinguishes itself through several key features:
-
Efficiency: Achieves
O(1)
time complexity for both selection and update operations without a preprocessing requirement. -
Space Efficiency: Maintains ranges of available values, optimizing storage by avoiding the need to track each used value.
-
Scalability: Scales well with large
$n$ , as space complexity is proportional to the number of ranges rather than the number of elements. -
Deterministic Compatibility: Capable of functioning correctly with deterministic entropy sources, enhancing flexibility in environments where true randomness is constrained.
-
Fragmentation Control: Splits ranges only when necessary and appends new ranges to avoid disrupting existing ones.
-
Order Independence of Ranges: The correctness of the algorithm does not depend on the order of ranges in the list.
The ModRange Selection algorithm could potentially be a more efficient choice for above-mentioned scenarios due to its constant-time complexity.
- Invariant Maintenance: The update operations ensure that once a value is generated, it is removed from the available ranges.
- Range Integrity: Ranges always represent unselected values, and any selected ID is excluded from future selections.
- Conclusion: No value can be generated more than once.
- Random Selection: Both the range and the value within the range are selected based on entropy.
- Unpredictability: Without knowledge of the internal state (i.e., current ranges and entropy source), predicting the next value is infeasible.
- Deterministic Entropy: Even with a deterministic entropy source, the internal state changes with each operation, enhancing unpredictability to external observers.
- Appending New Ranges: When ranges are split, new ranges are appended, maintaining invariant 1.
- Range Updates: Modifications to ranges preserve their correctness and ensure contiguous representation.
- Order Irrelevance: The algorithm does not rely on the order of ranges, maintaining invariant 3.
All primary operations execute in constant time O(1)
:
- Range Selection: Calculated using a modulo operation on the entropy value.
- Value Generation: Uses modulo and arithmetic operations.
- Range Updates: Involves at most constant-time modifications to the ranges list.
- Range Removal: By using swap removal (
swap_remove
function), we ensure that removing a range from the list is done inO(1)
time, avoiding the linear time complexity of shifting elements.
: Total number of elements in the initial range
: Number of selected (used) values
: Number of ranges currently in the ranges list
: Sum of all values in all ranges (total unselected values)
-
Initial State:
-
Key Insights:
-
Maximum Number of Ranges:
-
- Each range should contain exactly one unselected value
- All selections should be made in the middle of ranges
This bound is tight because:
- Each range must contain at least one unselected value
(since each range must contain at least one value)
- To maximize
through splits:
- Any other strategy would either:
Therefore, , giving us a space complexity of
.
- Uniformity: The algorithm selects ranges uniformly without weighting by size, which could introduce a bias toward values in smaller ranges.
- Acceptability: If uniform randomness over all unselected values is not a strict requirement, and unpredictability suffices, this bias may be acceptable.
- Mitigation: For applications requiring uniform randomness, the algorithm could be extended by taking range size into account when selecting ranges.
- Deterministic Sources: The algorithm functions with deterministic entropy sources but care should be taken to protect the entropy source's state if security is a concern.
- Security: For cryptographic applications, using a cryptographically secure pseudo-random number generator (CSPRNG) is necessary.
- Optimized Updates: The algorithm reduces fragmentation by only splitting ranges when necessary and resizing when range boundaries are selected.
- Swap Removal for Efficiency: To maintain
O(1)
time complexity during range removals, theswap_remove
method should be used. This avoids the linear time complexity associated with removing elements from arbitrary positions in a list.
The ModRange Selection Algorithm is new and novel solution for when you want to generate unique random values from a known set of continuous numbers. It is useful in situations where resources are scarce and efficiency is critical such as high-performance computing and embedded systems. This is thanks to the time complexity of O(1)
. In addition to its compatibility of functioning with external entropy makes it an interesting alternative to established methods.