The purpose of this document is to describe the internal code structure and major algorithms used by DAOS. It assumes prior knowledge of the DAOS storage model and acronyms.
A DAOS system is a multi-server installation made up of the following components:
- DAOS Server (control plane): A privileged Go process managed by systemd (or another service manager). A single daos_server instance runs on each storage server. It can spawn multiple daos_io_server instances (data plane), which are mostly opaque to the user.
- DAOS I/O Server (data plane): The DAOS I/O server is a persistent service running on all the storage nodes. It manages incoming requests for all targets hosted on the storage node and provides a stackable, modular interface for loading server-side modules on demand. Initially, only certified server-side modules can be loaded, to address security concerns. In the future, untrusted modules can run in a sandbox managed by control groups (i.e., cgroups), allowing fine-grained control over the resources (memory, CPU cycles, etc.) consumed by each module. A server module can register handlers for processing RPCs issued by its counterpart client library or by another instance of the same module on a different server. The DAOS I/O server provides service threads with core and NUMA affinity to execute those RPC handlers. Service threads rely on event-based processing, so each service thread can handle multiple requests concurrently. Each module may call into the external API of the layer below to eventually reach the local versioning object store library, which is in charge of data persistence.
Figure: DAOS control server (control plane) and DAOS I/O server (data plane)
- DAOS Client: An application that resides on the client node, links to libdaos, and communicates with the DAOS Agent. DAOS provides a collection of client libraries (one per layer) that implement the external API exported to application developers. The DAOS client libraries, as well as the networking parts, do not spawn any internal threads and can be linked directly with the application. They support the event and event queue interface for non-blocking operations. During execution, a client library can either call into the client library of the lower layer or use the network transport to send an RPC to its server counterpart, as shown in Figure 4-2.
Figure: DAOS Client and I/O Middleware
- DAOS Agent: daos_agent is a daemon process residing on the client node. It acts as a trusted intermediary in authenticating DAOS Client applications and signs DAOS Client credentials using local certificates.
- DAOS Management Tool: Uses the control plane client API to administer the daos_server instances (e.g., the daos_shell executable).
Figure: DAOS Management Tool (aka dmg)
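To make the handler registration described for the I/O server more concrete, the following C sketch models a module registering RPC handlers in an opcode table that a service thread dispatches into. All names here (`struct rpc_req`, `module_register`, `rpc_dispatch`, `echo_handler`) are hypothetical and do not reflect the actual DAOS module interface; the event-driven service threads that would call `rpc_dispatch` are omitted.

```c
#include <stddef.h>

/* Hypothetical RPC request; real requests carry far more state. */
struct rpc_req {
	unsigned int opc;   /* opcode identifying the RPC */
	int          arg;   /* illustrative input payload */
	int          reply; /* filled in by the handler */
};

typedef int (*rpc_handler_t)(struct rpc_req *req);

#define MAX_HANDLERS 16

/* Hypothetical per-server registry of (opcode, handler) pairs. */
struct handler_entry {
	unsigned int  opc;
	rpc_handler_t fn;
};

static struct handler_entry registry[MAX_HANDLERS];
static size_t               nr_handlers;

/* A module calls this at load time for each RPC it serves.
 * Returns 0 on success, -1 if the table is full or the opcode is taken. */
int module_register(unsigned int opc, rpc_handler_t fn)
{
	size_t i;

	if (nr_handlers == MAX_HANDLERS)
		return -1;
	for (i = 0; i < nr_handlers; i++)
		if (registry[i].opc == opc)
			return -1;
	registry[nr_handlers].opc = opc;
	registry[nr_handlers].fn = fn;
	nr_handlers++;
	return 0;
}

/* Called by a service thread for each incoming request.
 * Returns the handler's result, or -1 if no module handles the opcode. */
int rpc_dispatch(struct rpc_req *req)
{
	size_t i;

	for (i = 0; i < nr_handlers; i++)
		if (registry[i].opc == req->opc)
			return registry[i].fn(req);
	return -1;
}

/* Example handler from a hypothetical module: echoes its input. */
static int echo_handler(struct rpc_req *req)
{
	req->reply = req->arg;
	return 0;
}
```

A linear scan keeps the sketch short; a server with many opcodes would more likely index handlers directly by opcode.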
Network Transport with CaRT & gRPC
Node Addressing:
- Nodes within a Mercury process group are addressed by endpoints that are assigned for the lifetime of the session. DAOS, on the other hand, must store persistent node identifiers in the pool map. Storage targets will therefore need to convert the persistent node identifiers into runtime addresses when transferring the pool map to other targets and clients. RPCs will then be sent through Mercury by specifying the endpoint of the storage node inside the process group. Likewise, any updates to the pool map will require converting the Mercury endpoint into a persistent node identifier.
- gRPC
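The conversion between persistent node identifiers and session endpoints can be sketched as a small translation table that is rebuilt for each session. This is a minimal illustration under assumed types: `node_id_t`, `endpoint_t`, `struct addr_table` and its helper functions are hypothetical, and the endpoint is modeled as an opaque integer rather than a real Mercury address handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: a persistent node identifier stored in the pool map,
 * and a runtime endpoint valid only for the current session. */
typedef uint32_t node_id_t;
typedef uint64_t endpoint_t;

#define MAX_NODES  64
#define INVALID_EP ((endpoint_t)0)

/* Session-local translation table; the pool map itself only stores
 * node_id_t values, never endpoints. */
struct addr_table {
	node_id_t  ids[MAX_NODES];
	endpoint_t eps[MAX_NODES];
	size_t     count;
};

/* Record the endpoint assigned to a node for this session. */
int addr_table_add(struct addr_table *t, node_id_t id, endpoint_t ep)
{
	if (t->count == MAX_NODES || ep == INVALID_EP)
		return -1;
	t->ids[t->count] = id;
	t->eps[t->count] = ep;
	t->count++;
	return 0;
}

/* Used before sending an RPC to a node named in the pool map.
 * Returns INVALID_EP if the node has no endpoint in this session. */
endpoint_t nodeid_to_endpoint(const struct addr_table *t, node_id_t id)
{
	size_t i;

	for (i = 0; i < t->count; i++)
		if (t->ids[i] == id)
			return t->eps[i];
	return INVALID_EP;
}

/* Used when a pool map update refers to a runtime endpoint.
 * Stores the persistent ID in *id and returns 0, or returns -1 if unknown. */
int endpoint_to_nodeid(const struct addr_table *t, endpoint_t ep,
		       node_id_t *id)
{
	size_t i;

	for (i = 0; i < t->count; i++)
		if (t->eps[i] == ep) {
			*id = t->ids[i];
			return 0;
		}
	return -1;
}
```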
The DAOS stack is organized as a set of micro-services over a client/server architecture. The figure below shows the logical layering of the DAOS stack.
- Versioning Object Store
- Blob I/O
- Service Replication
- Pool Service
- Container Service
- Key Array Object
- Algorithmic Object Placement
- Self-healing
- Security
- DAOS Client Library
- POSIX File & Directory Emulation
- Common Library
Interoperability in DAOS is handled via protocol versioning and via schema versioning of persistent memory (PM) data structures.
Limited protocol interoperability is to be provided by the DAOS storage stack. Version compatibility checks will be performed to verify that:
- All targets in the same pool run the same protocol version.
- Client libraries linked with the application may be up to one protocol version older than the targets.
If a protocol version mismatch is detected among storage targets in the same pool, the entire pool will fail to start up and will report the failure to the control API. Similarly, a connection attempt from a client running a protocol version incompatible with the targets will return an error.
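The two protocol compatibility rules above can be expressed as simple checks. This is a hypothetical sketch: the function names and error codes are invented for illustration, and it assumes that a client newer than the targets is rejected, which the rules above do not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for illustration. */
enum { VER_OK = 0, VER_POOL_MISMATCH = -1, VER_CLIENT_INCOMPAT = -2 };

/* Rule 1: all targets in the same pool run the same protocol version.
 * On success, stores the common version in *pool_ver. */
int check_pool_versions(const uint32_t *target_vers, size_t nr,
			uint32_t *pool_ver)
{
	size_t i;

	/* An empty target list cannot establish a pool version. */
	if (nr == 0)
		return VER_POOL_MISMATCH;
	for (i = 1; i < nr; i++)
		if (target_vers[i] != target_vers[0])
			return VER_POOL_MISMATCH; /* pool fails to start */
	*pool_ver = target_vers[0];
	return VER_OK;
}

/* Rule 2: a client may be at most one protocol version older than the
 * targets. Assumption: newer clients are treated as incompatible.
 * The 64-bit addition avoids overflow when client_ver == UINT32_MAX. */
int check_client_version(uint32_t client_ver, uint32_t pool_ver)
{
	if (client_ver == pool_ver || (uint64_t)client_ver + 1 == pool_ver)
		return VER_OK;
	return VER_CLIENT_INCOMPAT; /* connection returns an error */
}
```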
The schema of persistent data structures might evolve from time to time to fix bugs, add new optimizations or support new features. To that end, VOS supports schema versioning.
Upgrading the schema version is not done automatically and must be initiated by the administrator. A dedicated upgrade tool will be provided to upgrade the schema version to the latest one. All targets in the same pool must have the same schema version. Version checks are performed at pool initialization time to enforce this constraint.
To limit the validation matrix, each new DAOS release will be published with a list of supported schema versions. To run with a new DAOS release, administrators will need to upgrade their pools to one of the supported schema versions. New targets will always be formatted with the latest version. This versioning scheme only applies to data structures stored in persistent memory, not to block storage, which only stores data buffers with no metadata.
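A pool-initialization check combining the two schema constraints (a uniform schema version across all targets, and a version that appears on the release's supported list) might look like the following. The supported list `{ 3, 4 }`, the function names, and the result codes are hypothetical placeholders, not values or interfaces from any DAOS release.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list of schema versions supported by this release. */
static const uint32_t supported_schemas[] = { 3, 4 };
#define NR_SUPPORTED (sizeof(supported_schemas) / sizeof(supported_schemas[0]))

/* Hypothetical result codes for illustration. */
enum { SCHEMA_OK = 0, SCHEMA_MISMATCH = -1, SCHEMA_UPGRADE_NEEDED = -2 };

/* Returns true if this release can run with the given schema version. */
static bool schema_supported(uint32_t ver)
{
	size_t i;

	for (i = 0; i < NR_SUPPORTED; i++)
		if (supported_schemas[i] == ver)
			return true;
	return false;
}

/* Run at pool initialization over the schema version of every target. */
int check_pool_schema(const uint32_t *target_schemas, size_t nr)
{
	size_t i;

	if (nr == 0)
		return SCHEMA_MISMATCH;
	/* All targets in the same pool must share one schema version. */
	for (i = 1; i < nr; i++)
		if (target_schemas[i] != target_schemas[0])
			return SCHEMA_MISMATCH;
	/* Upgrades are never automatic: report that the administrator
	 * must run the upgrade tool before this release can use the pool. */
	if (!schema_supported(target_schemas[0]))
		return SCHEMA_UPGRADE_NEEDED;
	return SCHEMA_OK;
}
```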