Author: Varun (@ibesuperv)
The goal of this system is to provide a fully decentralized, content-addressable storage layer that can handle massive binary objects with consistent
In centralized storage systems (e.g., S3), the bottleneck is the metadata index and the central authority. In typical P2P systems, "brittle" data discovery and memory-heavy buffering are common pitfalls.
This project was built to demonstrate that decentralized systems can be both secure (AES-CTR encryption) and efficient (streaming I/O) by leaning on Go's concurrency model.
- Transporter (Interface): Abstraction for the P2P wire protocol. Currently implemented via TCP.
- Orchestrator (FileServer): The brain of the node. Coordinates message routing and data streaming.
- CAS Storage Engine: A sharded, hash-based persistence layer.
- Crypto Pipe: A streaming encryption/decryption middleware.
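To make the separation between the Transporter and the Orchestrator concrete, here is a minimal sketch. The method names (`Dial`, `Consume`) and the `PipeTransport` type are assumptions made for illustration, not the project's actual API. `PipeTransport` is an in-memory stand-in built on `net.Pipe`; the TCP transport mentioned above would satisfy the same interface.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// Transporter is a hypothetical shape for the wire-protocol abstraction.
// The method names are assumptions made for this sketch.
type Transporter interface {
	Dial(addr string) (net.Conn, error) // open an outbound connection to a peer
	Consume() <-chan net.Conn           // stream of inbound connections
}

// PipeTransport is an in-memory stand-in built on net.Pipe. Every Dial
// loops back to the same node, which keeps the example free of real sockets.
type PipeTransport struct {
	inbound chan net.Conn
}

func NewPipeTransport() *PipeTransport {
	return &PipeTransport{inbound: make(chan net.Conn, 1)}
}

func (t *PipeTransport) Dial(addr string) (net.Conn, error) {
	local, remote := net.Pipe()
	t.inbound <- remote // hand the far end to whoever consumes inbound connections
	return local, nil
}

func (t *PipeTransport) Consume() <-chan net.Conn { return t.inbound }

// Compile-time check that PipeTransport satisfies the interface.
var _ Transporter = (*PipeTransport)(nil)

// echoOnce plays the orchestrator role: it depends only on Transporter,
// never on a concrete transport.
func echoOnce(t Transporter) {
	conn := <-t.Consume()
	defer conn.Close()
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return
	}
	fmt.Fprint(conn, line)
}

// roundTrip sends one line through the transport and returns the echoed reply.
func roundTrip(t Transporter, msg string) (string, error) {
	go echoOnce(t)
	conn, err := t.Dial("loopback")
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintln(conn, msg); err != nil {
		return "", err
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimSuffix(reply, "\n"), nil
}

func main() {
	reply, err := roundTrip(NewPipeTransport(), "ping")
	fmt.Println(reply, err)
}
```

Because `echoOnce` and `roundTrip` accept the interface, swapping the in-memory transport for a TCP one would not change the orchestration code.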
We implement a Content-Addressable Storage (CAS) model.
- Addressing: `Address = Signature(Content)`.
- Constraint: If content changes, the address MUST change.
- Benefit: This allows for stateless discovery. You don't need to know where a file is; you just broadcast its hash, and any node possessing it can provide it.
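The addressing rule above can be sketched as follows. This is an illustration under stated assumptions: the address is taken to be the hex-encoded SHA-1 of the content, and the sharding scheme (splitting the hex string into fixed-size directory segments) is a guess at what "sharded" means here. `ContentAddress`, `ShardedPath`, and the segment length are hypothetical names and values.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// ContentAddress streams r through SHA-1 and returns the hex digest.
// The content is never buffered in full.
func ContentAddress(r io.Reader) (string, error) {
	h := sha1.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// AddressOfString is a convenience wrapper for in-memory content.
func AddressOfString(s string) (string, error) {
	return ContentAddress(strings.NewReader(s))
}

// ShardedPath splits an address into directory segments of segLen
// characters, so no single directory accumulates every object.
// The segment length is an assumption for this sketch.
func ShardedPath(addr string, segLen int) string {
	if segLen <= 0 {
		return addr
	}
	var parts []string
	for i := 0; i < len(addr); i += segLen {
		end := i + segLen
		if end > len(addr) {
			end = len(addr)
		}
		parts = append(parts, addr[i:end])
	}
	return strings.Join(parts, "/")
}

func main() {
	addr, err := AddressOfString("hello, peers")
	if err != nil {
		panic(err)
	}
	fmt.Println("address:", addr)
	fmt.Println("path:   ", ShardedPath(addr, 5))
}
```

Identical content always maps to the same address and path, which is what lets a node answer a broadcast for a hash without any central index.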
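Memory usage stays bounded through Transforming Readers: each stage wraps an `io.Reader` and processes data as it streams, so a file is never held in memory in full.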
- We avoid `ioutil.ReadAll` at all costs.
- Workflow: `Network Stream` -> `IV Reader` -> `AES-CTR Decrypt Pipe` -> `CAS Writer` -> `Disk`.
- Latency: Sub-millisecond overhead for the crypto transform due to CTR mode's parallelizable nature.
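The workflow above can be sketched with the standard library's `cipher.StreamReader` and `cipher.StreamWriter`. This is a minimal sketch, assuming the IV is sent as the first 16 bytes of the stream; the function names `EncryptStream` and `DecryptStream` are hypothetical, and a `bytes.Buffer` stands in for both the network stream and the CAS writer.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
	"strings"
)

// EncryptStream writes a random IV to dst, then the AES-CTR ciphertext
// of src. Data flows through io.Copy's fixed-size buffer.
func EncryptStream(key []byte, src io.Reader, dst io.Writer) (int64, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return 0, err
	}
	iv := make([]byte, block.BlockSize())
	if _, err := io.ReadFull(rand.Reader, iv); err != nil {
		return 0, err
	}
	if _, err := dst.Write(iv); err != nil {
		return 0, err
	}
	w := &cipher.StreamWriter{S: cipher.NewCTR(block, iv), W: dst}
	n, err := io.Copy(w, src)
	return n + int64(len(iv)), err
}

// DecryptStream reads the IV from the head of src (the "IV Reader" step),
// then pipes the rest through an AES-CTR transforming reader into dst.
func DecryptStream(key []byte, src io.Reader, dst io.Writer) (int64, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return 0, err
	}
	iv := make([]byte, block.BlockSize())
	if _, err := io.ReadFull(src, iv); err != nil {
		return 0, err
	}
	r := &cipher.StreamReader{S: cipher.NewCTR(block, iv), R: src}
	return io.Copy(dst, r)
}

// roundTrip encrypts and decrypts plaintext, returning the recovered
// text and the number of bytes that went over the simulated wire.
func roundTrip(key []byte, plaintext string) (string, int, error) {
	var wire bytes.Buffer // stands in for the network stream
	if _, err := EncryptStream(key, strings.NewReader(plaintext), &wire); err != nil {
		return "", 0, err
	}
	wireLen := wire.Len()
	var out bytes.Buffer // stands in for the CAS writer / disk
	if _, err := DecryptStream(key, &wire, &out); err != nil {
		return "", 0, err
	}
	return out.String(), wireLen, nil
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // 32 bytes selects AES-256
	got, wireLen, err := roundTrip(key, "streamed payload")
	fmt.Println(got, wireLen, err)
}
```

In a real node, `src` would be the peer connection and `dst` an open file, so memory use is governed by the copy buffer rather than the file size.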
We use a Reactive Broadcast model for file discovery.
- Wait Context: The system uses buffered channels and wait-loops (with future potential for DHT integration) to synchronize asynchronous peer responses.
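A minimal sketch of the broadcast-and-wait pattern, assuming each peer can be modeled as a function that reports whether it holds a key. The `Peer` type, the `Discover` function, and the timeout value are hypothetical. The buffered channel lets every peer goroutine deliver its answer without blocking, even after the caller has returned.

```go
package main

import (
	"fmt"
	"time"
)

// Peer is a hypothetical stand-in for a remote node: it reports
// whether it holds the content for a given key.
type Peer func(key string) bool

type response struct {
	peer int
	has  bool
}

// Discover broadcasts key to all peers concurrently and returns the index
// of the first peer that reports having it. It gives up once every peer
// has answered negatively or the timeout expires.
func Discover(key string, peers []Peer, timeout time.Duration) (int, bool) {
	// Buffered to len(peers) so late responders never block or leak.
	responses := make(chan response, len(peers))
	for i, p := range peers {
		go func(i int, p Peer) {
			responses <- response{peer: i, has: p(key)}
		}(i, p)
	}

	deadline := time.After(timeout)
	// Wait loop: consume answers until a hit, exhaustion, or timeout.
	for pending := len(peers); pending > 0; pending-- {
		select {
		case r := <-responses:
			if r.has {
				return r.peer, true
			}
		case <-deadline:
			return -1, false
		}
	}
	return -1, false
}

func main() {
	peers := []Peer{
		func(key string) bool { return false },
		func(key string) bool { return key == "abc123" },
	}
	idx, ok := Discover("abc123", peers, time.Second)
	fmt.Println(idx, ok)
}
```

The cost of this model is one message per peer per lookup, which is the motivation for the DHT item in the roadmap.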
The system implements Uniform Cryptography. Whether a file is sitting on a node's disk or flying across the internet, it is always in its encrypted form.
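Since we use SHA-1 for CAS, we have inherent tamper-evidence. If a peer sends corrupted data, the resulting hash will not match the expected key, and the system will automatically reject the download. Note that SHA-1 is no longer collision-resistant, so this check reliably catches corruption but not a deliberately crafted collision.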
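The rejection step can be sketched as a streaming check, assuming the expected key is the hex-encoded SHA-1 of exactly the bytes being received. `VerifiedCopy`, `ErrHashMismatch`, and the helper names are hypothetical. The data is hashed while it is written, so verification adds no extra pass over the file.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"strings"
)

// ErrHashMismatch signals that received bytes do not hash to the requested key.
var ErrHashMismatch = errors.New("content hash does not match requested key")

// VerifiedCopy streams src into dst while hashing it, then compares the
// digest to expectedKey. On mismatch the bytes have already reached dst,
// so callers should write to a temporary location and discard it on error.
func VerifiedCopy(dst io.Writer, src io.Reader, expectedKey string) (int64, error) {
	h := sha1.New()
	n, err := io.Copy(io.MultiWriter(dst, h), src)
	if err != nil {
		return n, err
	}
	if hex.EncodeToString(h.Sum(nil)) != expectedKey {
		return n, ErrHashMismatch
	}
	return n, nil
}

// keyOf returns the hex SHA-1 of s, used here to build an expected key.
func keyOf(s string) string {
	sum := sha1.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}

// accepts reports whether payload passes verification against key.
func accepts(payload, key string) bool {
	var tmp bytes.Buffer // stands in for a temporary file
	_, err := VerifiedCopy(&tmp, strings.NewReader(payload), key)
	return err == nil
}

func main() {
	key := keyOf("original bytes")
	fmt.Println(accepts("original bytes", key)) // untouched payload
	fmt.Println(accepts("original bytez", key)) // corrupted payload
}
```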
- DHT Integration: Replacing broadcasts with a Kademlia-style Distributed Hash Table for $O(\log n)$ discovery.
- Erasure Coding: Implementing Reed-Solomon codes to allow data recovery even if multiple nodes go offline simultaneously.
- TLS/mTLS: Upgrading the raw TCP transport to use mutual TLS for node identity verification.
- Concurrent Uploads: Tested with 3+ nodes in parallel.
- Large File Handling: Successfully processed 1GB+ files with < 50MB RAM usage.