SXURL (pronounced "Sixerl") is a fixed-length, sliceable URL identifier system designed for efficient database storage and querying. It converts URLs into deterministic 256-bit identifiers where each URL component occupies a fixed position, enabling fast substring-based filtering and indexing.
- 🔒 Fixed-length: All SXURL identifiers are exactly 256 bits (64 hex characters)
- 📏 Sliceable: Each URL component has a fixed position for substring filtering
- 🔄 Deterministic: Same input always produces the same output
- 🛡️ Collision-resistant: Uses SHA-256 hashing for component fingerprinting
- 🌐 Standards-compliant: Supports IDNA, Public Suffix List, and standard URL schemes
- ⚡ Zero-copy: Efficient parsing and encoding with minimal allocations
- 🧪 Thoroughly tested: 100+ comprehensive tests covering edge cases
Add this to your Cargo.toml
:
[dependencies]
sxurl = "0.1"
use sxurl::{encode_url_to_hex, decode_hex, matches_component};
// Encode a URL to SXURL
let sxurl_hex = encode_url_to_hex("https://docs.rs/serde")?;
println!("SXURL: {}", sxurl_hex); // 64 hex characters
// Decode and inspect components
let decoded = decode_hex(&sxurl_hex)?;
println!("Scheme: {}", decoded.header.scheme);
println!("Has subdomain: {}", decoded.header.subdomain_present);
// Fast component matching for filtering
let is_docs_rs = matches_component(&sxurl_hex, "domain", "docs")?;
assert!(is_docs_rs);
let is_rs_tld = matches_component(&sxurl_hex, "tld", "rs")?;
assert!(is_rs_tld);
use sxurl::{split_url, parse_query, get_anchor, strip_anchor, join_url_path};
// Parse URL into all components at once
let parts = split_url("https://api.github.com/repos?page=1#readme")?;
println!("Domain: {}", parts.domain); // "github"
println!("Subdomain: {:?}", parts.subdomain); // Some("api")
println!("Anchor: {:?}", parts.anchor); // Some("readme")
// Work with query parameters
let params = parse_query("https://example.com?foo=bar&page=2")?;
println!("Page: {:?}", params.get("page")); // Some("2")
// Handle anchors (fragments)
let anchor = get_anchor("https://docs.rs#search")?; // Some("search")
let clean_url = strip_anchor("https://example.com#top")?; // "https://example.com"
// Join URLs properly
let api_url = join_url_path("https://api.example.com/v1", "users")?;
// Result: "https://api.example.com/v1/users"
use sxurl::encode_url_to_hex;
// Store URLs efficiently in your database
let urls = vec![
"https://docs.rs/serde",
"https://crates.io/crates/tokio",
"https://github.com/rust-lang/rust",
];
for url in urls {
let sxurl = encode_url_to_hex(url)?;
// Store sxurl as CHAR(64) or BINARY(32) in your database
println!("INSERT INTO urls (sxurl, original_url) VALUES ('{}', '{}')", sxurl, url);
}
// Query by domain efficiently using substring matching
// SELECT * FROM urls WHERE sxurl LIKE '_______________example_______________'
// ^domain slice position (chars 7-22)
The 256-bit SXURL has this fixed layout:
Component | Hex Range | Bits | Description |
---|---|---|---|
header | [0..3) | 12 | Version, scheme, flags |
tld_hash | [3..7) | 16 | Top-level domain hash |
domain | [7..22) | 60 | Domain name hash |
subdomain | [22..30) | 32 | Subdomain hash |
port | [30..34) | 16 | Port number |
path | [34..49) | 60 | Path hash |
params | [49..58) | 36 | Query parameters hash |
fragment | [58..64) | 24 | Fragment hash |
- Version (4 bits): Currently always
1
- Scheme (3 bits):
0
=https,1
=http,2
=ftp - Flags (5 bits): Component presence indicators
- Bit 4: Subdomain present
- Bit 3: Query parameters present
- Bit 2: Fragment present
- Bit 1: Non-default port present
- Bit 0: Reserved (always 0)
use sxurl::SxurlEncoder;
let encoder = SxurlEncoder::new();
// Encode to bytes
let sxurl_bytes = encoder.encode("https://example.com")?;
assert_eq!(sxurl_bytes.len(), 32); // Always 32 bytes
// Encode to hex string
let sxurl_hex = encoder.encode_to_hex("https://example.com")?;
assert_eq!(sxurl_hex.len(), 64); // Always 64 hex chars
use sxurl::{encode_url_to_hex, matches_component};
let urls = vec![
"https://api.github.com/repos/rust-lang/rust",
"https://docs.github.com/en/rest",
"https://github.blog/2023-01-01/announcement",
];
// Find all GitHub URLs
for url in &urls {
let sxurl = encode_url_to_hex(url)?;
if matches_component(&sxurl, "domain", "github")? {
println!("GitHub URL: {}", url);
}
}
// Find all API endpoints
for url in &urls {
let sxurl = encode_url_to_hex(url)?;
if matches_component(&sxurl, "subdomain", "api")? {
println!("API URL: {}", url);
}
}
use sxurl::ComponentHasher;
// Access individual hash functions
let tld_hash = ComponentHasher::hash_tld("com")?;
let domain_hash = ComponentHasher::hash_domain("example")?;
let path_hash = ComponentHasher::hash_path("/api/v1/users")?;
println!("TLD hash: 0x{:04x}", tld_hash);
println!("Domain hash: 0x{:015x}", domain_hash);
println!("Path hash: 0x{:015x}", path_hash);
https
(scheme code 0) - Default port 443http
(scheme code 1) - Default port 80ftp
(scheme code 2) - Default port 21
Store URLs as fixed-length identifiers with efficient B-tree indexing:
CREATE TABLE url_index (
sxurl CHAR(64) PRIMARY KEY,
original_url TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Index by domain (characters 7-22)
CREATE INDEX idx_domain ON url_index (SUBSTRING(sxurl, 8, 15));
-- Index by TLD (characters 3-7)
CREATE INDEX idx_tld ON url_index (SUBSTRING(sxurl, 4, 4));
Quickly identify duplicate URLs across different formats:
use sxurl::encode_url_to_hex;
use std::collections::HashSet;
let mut seen_urls = HashSet::new();
let urls = vec![
"https://Example.Com/Path",
"https://example.com/path", // Same after normalization
"https://example.com/path/", // Different path
];
for url in urls {
let sxurl = encode_url_to_hex(url)?;
if seen_urls.insert(sxurl) {
println!("New URL: {}", url);
} else {
println!("Duplicate URL: {}", url);
}
}
Filter large URL datasets by domain without parsing:
// Filter millions of URLs by domain using simple string operations
let domain_filter = "1a2b3c4d5e6f7890abc"; // Hash of target domain
let urls_in_domain: Vec<_> = all_sxurls
.iter()
.filter(|sxurl| &sxurl[7..22] == domain_filter)
.collect();
All functions return Result<T, SxurlError>
. Common error cases:
use sxurl::{encode_url_to_hex, SxurlError};
match encode_url_to_hex("invalid-url") {
Ok(sxurl) => println!("Success: {}", sxurl),
Err(SxurlError::InvalidScheme) => println!("Unsupported URL scheme"),
Err(SxurlError::HostNotDns) => println!("Host must be a DNS name, not IP"),
Err(SxurlError::InvalidLength) => println!("Invalid SXURL format"),
Err(e) => println!("Other error: {}", e),
}
SXURL is designed for high-performance applications:
- Encoding: ~1-5 μs per URL (depending on complexity)
- Decoding: ~100-500 ns per SXURL
- Component matching: ~50-100 ns per comparison
- Memory usage: Fixed 32 bytes per SXURL, minimal temporary allocations
Benchmarks on a modern CPU:
encode_simple_url time: 1.2 μs
encode_complex_url time: 4.8 μs
decode_sxurl time: 245 ns
component_match time: 67 ns
SXURL uses labeled SHA-256 hashing: H_n(label, data) = lower_n(SHA256(label || 0x00 || data))
- Collision resistance: ~2^(n/2) where n is the bit width
- Domain separation: Different labels produce different hashes for same data
- Deterministic: Same input always produces same output
- Scheme and host converted to lowercase
- IDNA (Internationalized Domain Names) support
- Public Suffix List (PSL) for proper domain/TLD splitting
- IP addresses rejected (DNS names only)
- Empty components stored as zero (not hashed)
- Default ports (80, 443, 21) stored explicitly
- Query parameters and fragments preserved as-is
- Path normalization preserves original structure
Run the comprehensive test suite:
cargo test # Run all tests
cargo test --doc # Test documentation examples
cargo test --release # Test optimized builds
cargo bench # Run performance benchmarks
The library includes 100+ tests covering:
- Specification compliance
- Edge cases and error conditions
- Round-trip consistency
- Hash collision resistance
- Performance regression testing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
SXURL follows a formal specification available in SXURL-SPEC.md. The implementation is designed to be compatible with other SXURL implementations in different languages.
See CHANGELOG.md for detailed release history.
Licensed under the Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0).
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed under the Apache License, Version 2.0, without any additional terms or conditions.
SXURL - Efficient, deterministic URL identifiers for modern applications.