DIRAC Metadata and Replica FileCatalog

graciani edited this page Dec 2, 2011 · 3 revisions

DIRAC File Catalog (Work in Progress)

DIRAC provides an integrated Metadata and Replica File Catalog service. The design of the DIRAC File Catalog (DFC) has been optimized for navigation in both the file system and the metadata space, as well as for retrieval of the replicas associated with a given entry.

To achieve this optimization, several design choices have been made:

  • The LFN is the primary identifier of a File. Each File can have an associated GUID, but the catalog is not responsible for checking its uniqueness. In order to guarantee the usability of GUIDs for LFN-to-LFN navigation, each LFN can have any number of associated ancestors.
  • The replica information is built on the fly from the LFN and the name of the StorageElement hosting the replica. This implies that PFNs are built as [protocol]://[server][:[port]][serviceURL]/LFN. The server, port and serviceURL are kept in the DIRAC Configuration for each known StorageElement and supported access protocol.
  • An arbitrary number of metadata tags can be defined to describe your data. For each defined tag, a value of the proper type can be associated with each catalog entry (file or directory). The metadata associated with a given LFN is the logical combination of the (tag, value) pairs associated with the file and all its parent directories.
  • Additionally, the administrator of the FileCatalog can decide which of these tags can be used to build metadata queries.
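
The PFN construction rule above can be illustrated with a minimal sketch. This is not DIRAC code: the function `build_pfn`, the dictionary `SE_CONFIG`, the StorageElement name `EXAMPLE-SE` and all host names and paths are hypothetical, and the handling of the slash between serviceURL and LFN is an assumption made so that the pattern yields exactly one separator.

```python
# Hypothetical stand-in for the per-StorageElement, per-protocol settings
# that the page says are kept in the DIRAC Configuration.
SE_CONFIG = {
    "EXAMPLE-SE": {
        "srm": {"server": "se.example.org", "port": 8443, "service_url": "/storage"},
        "file": {"server": "localhost", "port": None, "service_url": "/mnt/data"},
    }
}


def build_pfn(lfn, se_name, protocol, config=SE_CONFIG):
    """Build a PFN as [protocol]://[server][:[port]][serviceURL]/LFN."""
    settings = config[se_name][protocol]
    # The port is optional in the pattern, so it is emitted only when set.
    port_part = f":{settings['port']}" if settings["port"] is not None else ""
    # Assumption: normalize slashes so serviceURL and LFN join with one "/".
    base = settings["service_url"].rstrip("/")
    return f"{protocol}://{settings['server']}{port_part}{base}/{lfn.lstrip('/')}"


pfn = build_pfn("/myvo/user/a/alice/file.dat", "EXAMPLE-SE", "srm")
print(pfn)
```

Because the PFN is derived rather than stored, a replica in this sketch is fully described by the pair (LFN, StorageElement name); changing the server or port of a StorageElement only requires a configuration change.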

DFC does not make any assumption about the way you organize your file system or your metadata. The decision of which metadata tags are associated with files and which with directories, as well as which tags are optimized for queries, is left to the administrator of the FileCatalog and to the tools created by the community to register its metadata.
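
To make the combination of file and directory metadata concrete, here is a minimal in-memory sketch. It does not use the DFC API: the functions `effective_metadata` and `find_lfns`, the tag names and the paths are all hypothetical. It also assumes that a value set deeper in the tree overrides one set higher up, a conflict policy this page does not specify.

```python
def effective_metadata(lfn, directory_tags, file_tags):
    """Combine the tags of every parent directory with the file's own tags."""
    combined = dict(directory_tags.get("/", {}))
    path = ""
    # Walk from the root down to the directory holding the file.
    for part in lfn.strip("/").split("/")[:-1]:
        path += "/" + part
        # Assumption: deeper directories override values set higher up.
        combined.update(directory_tags.get(path, {}))
    combined.update(file_tags.get(lfn, {}))
    return combined


def find_lfns(query, lfns, directory_tags, file_tags, searchable):
    """Return the LFNs whose combined metadata matches every queried tag."""
    # Only tags enabled by the (hypothetical) administrator may be queried.
    not_enabled = set(query) - set(searchable)
    if not_enabled:
        raise ValueError(f"tags not enabled for queries: {sorted(not_enabled)}")
    return sorted(
        lfn
        for lfn in lfns
        if all(
            effective_metadata(lfn, directory_tags, file_tags).get(tag) == value
            for tag, value in query.items()
        )
    )


directory_tags = {
    "/myvo/sim": {"campaign": "2011A"},
    "/myvo/sim/run42": {"energy": 7000},
    "/myvo/sim/run43": {"energy": 8000},
}
file_tags = {"/myvo/sim/run42/out.dat": {"events": 1000}}
lfns = ["/myvo/sim/run42/out.dat", "/myvo/sim/run43/out.dat"]

meta = effective_metadata("/myvo/sim/run42/out.dat", directory_tags, file_tags)
hits = find_lfns({"energy": 7000}, lfns, directory_tags, file_tags, {"campaign", "energy"})
```

In this example `campaign` is set once on a directory and applies to every file below it, while `events` is set per file; choosing the level at which each tag lives is the kind of schema decision left to the community.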

The best performance when querying the catalog will be achieved only if your file system layout and metadata schema are properly designed.