Author(s):
Abstract
This spec defines an IPFS Repo, its contents, and its interface. It does not specify how the repo data is actually stored, as that is done via swappable implementations.
TODO
A repo is the storage repository of an IPFS node. It is the subsystem that actually stores the data IPFS nodes use. All IPFS objects are stored in a repo (similar to git).
There are many possible repo implementations, depending on the storage media used. Most commonly, IPFS nodes use an fs-repo.
Repo Implementations:
- fs-repo - stored in the OS filesystem
- mem-repo - stored in process memory
- s3-repo - stored in Amazon S3
The Repo stores a collection of IPLD objects that represent:
- config - node configuration and settings
- datastore - content stored locally, and indexing data
- keystore - cryptographic keys, including the node's identity
- hooks - scripts to run at predefined times (not yet implemented)
Note that the IPLD objects a repo stores are divided into:
- state (system, control plane) used for the node's internal state
- content (userland, data plane) which represent the user's cached and pinned data.
Additionally, the repo state must determine the following. These need not be IPLD objects, though it is of course encouraged:
- version - the repo version, required for safe migrations
- locks - process semaphores for correct concurrent access
- datastore_spec - array of mounting points and their properties
Finally, the repo also stores the blocks with blobs containing binary data.
Repo implementations may change over time, so they MUST include a version that is recognizable across versions. That is, a tool MUST be able to read the version of a given repo type.
For example, the fs-repo simply includes a version file with the version number. This way, the repo contents can evolve over time but the version remains readable the same way across versions.
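A minimal sketch of reading such a version file follows. It assumes the file is named version, sits at the root of the repo directory, and holds a single integer as text; the function name read_repo_version is hypothetical and not part of any IPFS implementation.

```python
import tempfile
from pathlib import Path


def read_repo_version(repo_path):
    """Return the integer stored in <repo_path>/version.

    Assumption: the file holds one integer as text, possibly followed
    by a newline. Raises ValueError if the content is not an integer.
    """
    text = (Path(repo_path) / "version").read_text(encoding="utf-8")
    return int(text.strip())


# Usage: build a throwaway repo directory and read its version back.
with tempfile.TemporaryDirectory() as repo_dir:
    (Path(repo_dir) / "version").write_text("12\n", encoding="utf-8")
    print(read_repo_version(repo_dir))
```

Because the reader depends only on the version file, it keeps working even when the rest of the repo layout changes between versions.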
IPFS nodes store some IPLD objects locally. These are either (a) state objects required for local operation -- such as the config and keys -- or (b) content objects used to represent data locally available. Content objects are either pinned (stored until they are unpinned) or cached (stored until the next repo garbage collection).
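The difference between pinned and cached content can be illustrated with a toy garbage collector. This is only a sketch under simplifying assumptions: blocks are held in a plain dict keyed by an identifier, pins are a flat set of keys, and the name collect_garbage is hypothetical. Real implementations typically also follow links from pinned objects, which is not modeled here.

```python
def collect_garbage(blocks, pinned):
    """Delete every block whose key is not pinned.

    blocks: dict mapping key -> bytes, modified in place.
    pinned: set of keys that must survive collection.
    Returns the sorted list of removed keys.
    """
    removed = sorted(key for key in blocks if key not in pinned)
    for key in removed:
        del blocks[key]
    return removed


# Usage: "a" is pinned, "b" and "c" are merely cached.
store = {"a": b"pinned data", "b": b"cached data", "c": b"more cached data"}
removed_keys = collect_garbage(store, pinned={"a"})
print(removed_keys, sorted(store))
```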
The name "datastore" comes from go-datastore, a library for swappable key-value stores. Like its namesake, some repo implementations feature swappable datastores, for example:
- an fs-repo with a leveldb datastore
- an fs-repo with a boltdb datastore
- an fs-repo with a union fs and leveldb datastore
- an fs-repo with an s3 datastore
- an s3-repo with a cached fs and s3 datastore
This makes it easy to change properties or performance characteristics of a repo without an entirely new implementation.
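The idea of swappable datastores can be sketched with a small key-value interface and a wrapper that routes keys to different backends. The class and method names below (MemDatastore, MountDatastore, put, get, has, delete) are illustrative assumptions and do not reproduce the go-datastore API. Unlike a typical mount implementation, this sketch passes the full key to the child instead of stripping the prefix.

```python
class MemDatastore:
    """In-memory key-value store; stands in for any concrete backend."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def has(self, key):
        return key in self._data

    def delete(self, key):
        self._data.pop(key, None)


class MountDatastore:
    """Routes each key to the child mounted at the longest matching prefix."""

    def __init__(self, mounts):
        # Check longer prefixes first so "/blocks" wins over "/".
        self._mounts = sorted(mounts.items(), key=lambda kv: len(kv[0]), reverse=True)

    def _route(self, key):
        for prefix, store in self._mounts:
            if prefix == "/" or key == prefix or key.startswith(prefix + "/"):
                return store
        raise KeyError(key)

    def put(self, key, value):
        self._route(key).put(key, value)

    def get(self, key):
        return self._route(key).get(key)

    def has(self, key):
        return self._route(key).has(key)

    def delete(self, key):
        self._route(key).delete(key)


# Usage: blocks go to one backend, everything else to another.
blocks, rest = MemDatastore(), MemDatastore()
ds = MountDatastore({"/blocks": blocks, "/": rest})
ds.put("/blocks/abc", b"raw block")
ds.put("/pins/abc", b"pin record")
```

Swapping a backend then only means passing a different object with the same four methods; the code that uses ds does not change.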
A Repo typically holds the keys a node has access to, for signing and for encryption.
Details on operation and storage of the keystore can be found in REPO_FS.md and KEYSTORE.md.
The node's config (configuration) is a tree of variables, used to configure various aspects of operation. For example:
- the set of bootstrap peers IPFS uses to connect to the network
- the Swarm, API, and Gateway network listen addresses
- the Datastore configuration regarding the construction and operation of the on-disk storage system.
The following properties are mandatory for repo usage: Addresses, Discovery, Bootstrap, Identity, Datastore, and Keychain.
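A check for the mandatory properties can be sketched as follows. It assumes the config has already been parsed into a dict whose top-level keys are the property names; the function name missing_properties is hypothetical, and the check looks only at presence, not at the contents of each property.

```python
REQUIRED_PROPERTIES = (
    "Addresses",
    "Discovery",
    "Bootstrap",
    "Identity",
    "Datastore",
    "Keychain",
)


def missing_properties(config):
    """Return the mandatory top-level properties absent from config."""
    return [name for name in REQUIRED_PROPERTIES if name not in config]


# Usage: a config that lacks Keychain and Discovery.
partial_config = {"Addresses": {}, "Bootstrap": [], "Identity": {}, "Datastore": {}}
print(missing_properties(partial_config))
```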
It is recommended that config files avoid identifying information, so that they may be re-shared across multiple nodes.
CHANGES: today, implementations like js-ipfs and go-ipfs store the peer-id and private key directly in the config. These will be moved out of the config.
IPFS implementations may use multiple processes, or may disallow multiple processes from using the same repo simultaneously. Others may disallow using the same repo but may allow sharing datastores simultaneously. This synchronization is accomplished via locks.
All repos contain the following standard locks:
- repo.lock - prevents concurrent access to the repo. Must be held to read or write.
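One way an fs-repo might realize repo.lock is with an exclusively created lock file, sketched below. This is an assumption for illustration only: the RepoLock class is hypothetical, the sketch does not detect stale locks left behind by a crashed process, and actual implementations may rely on OS-level advisory locks instead.

```python
import os
import tempfile


class RepoLock:
    """Exclusive lock backed by a repo.lock file (illustrative only)."""

    def __init__(self, repo_path):
        self.path = os.path.join(repo_path, "repo.lock")
        self._fd = None

    def acquire(self):
        # O_CREAT | O_EXCL makes creation atomic: os.open raises
        # FileExistsError if another holder already created the file.
        self._fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(self._fd, str(os.getpid()).encode("ascii"))

    def release(self):
        if self._fd is not None:
            os.close(self._fd)
            os.remove(self.path)
            self._fd = None

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()


# Usage: hold the lock for the duration of any repo read or write.
with tempfile.TemporaryDirectory() as repo_dir:
    with RepoLock(repo_dir):
        pass  # read or write the repo here while the lock is held
```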
The datastore_spec file is created according to the Datastore configuration specified in the config file. It contains an array with all the mounting points that the repo is using, as well as their properties. The datastore_spec file must therefore have the same mounting points as defined in the Datastore configuration.
Note that the Datastore section of the config must have a Spec property, which defines the structure of the IPFS datastore. It is a composable structure in which each datastore is represented by a JSON object.
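The consistency requirement between the config and the datastore_spec file can be sketched as a comparison of mounting points. The field names used here (type, mounts, mountpoint, path) and the example values are assumptions modeled loosely on common fs-repo defaults, not a normative schema; the helper names mountpoints and spec_matches_config are hypothetical.

```python
import json

# Hypothetical config fragment: a "mount" datastore composed of two children.
example_config = {
    "Datastore": {
        "Spec": {
            "type": "mount",
            "mounts": [
                {"mountpoint": "/blocks", "type": "flatfs", "path": "blocks"},
                {"mountpoint": "/", "type": "levelds", "path": "datastore"},
            ],
        }
    }
}


def mountpoints(spec):
    """Return the sorted mounting points declared in a spec object."""
    return sorted(mount["mountpoint"] for mount in spec.get("mounts", []))


def spec_matches_config(config, datastore_spec_text):
    """True if the datastore_spec JSON text declares the same mounting
    points as the Datastore.Spec property of the config."""
    on_disk = json.loads(datastore_spec_text)
    return mountpoints(config["Datastore"]["Spec"]) == mountpoints(on_disk)


# Usage: a datastore_spec generated from the config matches it.
spec_text = json.dumps(example_config["Datastore"]["Spec"])
print(spec_matches_config(example_config, spec_text))
```

A mismatch would indicate that the config was edited after the datastore was created, which is the situation the datastore_spec file exists to detect.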
Like git, IPFS nodes will allow hooks, a set of user configurable scripts to run at predefined moments in IPFS operations. This makes it easy to customize the behavior of IPFS nodes without changing the implementations themselves.
A repository uniquely identifies a node. Running two different IPFS programs with identical repositories -- and thus identical identities -- WILL cause problems.
Datastores MAY be shared -- with proper synchronization -- though note that sharing datastore access MAY erode privacy.
DO NOT BREAK USERS' DATA. This is critical. Thus, any changes to a repo's implementation MUST be accompanied by a SAFE migration tool.
See ipfs/kubo#537 and jbenet/random-ideas#33
A repo version is a single incrementing integer. All versions are considered mutually incompatible. Repos of different versions MUST be run through the appropriate migration tools before use.
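Since versions are plain integers, the migrations needed to bring a repo to the version a program expects can be planned with simple arithmetic. The sketch below assumes that migration tools exist for each single-version step in either direction, which this spec does not guarantee; the function name migration_steps is hypothetical.

```python
def migration_steps(repo_version, wanted_version):
    """Return the ordered (from, to) single-version steps needed to move
    a repo from repo_version to wanted_version.

    An empty list means the repo can be used as-is. Steps run downward
    when the repo is newer than the program expects.
    """
    if repo_version == wanted_version:
        return []
    step = 1 if wanted_version > repo_version else -1
    return [(v, v + step) for v in range(repo_version, wanted_version, step)]


# Usage: a repo at version 5 opened by a program that expects version 7.
for source, target in migration_steps(5, 7):
    print(f"run migration {source} -> {target}")
```

A program that finds a non-empty plan should refuse to open the repo until every step has been applied.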