Skip to content

A library for storing and querying graph data in Clojure

License

Notifications You must be signed in to change notification settings

lilactown/pyramid

Repository files navigation

pyramid

A library for storing and querying graph data in Clojure.

Features:

  • Graph query engine that works on any in-memory data store
  • Algorithm for taking trees of data and storing them in normal form in Clojure data.

Install & docs

Clojars Project cljdoc badge

Why

Clojure is well known for its graph databases like datomic and datascript which implement a datalog-like query language. There are contexts where the power vs performance tradeoffs of these query languages don't make sense, which is where pyramid can shine.

Pyramid focuses on doing essential things like selecting data from and traversing relationships between entities, while eschewing arbitrary logic like what SQL and datalog provide. What it lacks in features it makes up for in read performance when combined with a data store that has fast in-memory look ups of entities by key, such as Clojure maps. It can also be extended to databases like Datomic, DataScript and Asami.

What

Pyramid can be useful at each evolutionary stage of a program where one needs to traverse and make selections out of graphs of data.

Selection

Pyramid starts by working with Clojure data structures. A simple program that uses pyramid can use a query to select specific data out of a large, deeply nested tree, like a supercharged select-keys.

(def data
  {:people [{:given-name "Bob" :surname "Smith" :age 29}
            {:given-name "Alice" :surname "Meyer" :age 43}]
   :items {}})

(def query [{:people [:given-name]}])

(pyramid.core/pull data query)
;; => {:people [{:given-name "Bob"} {:given-name "Alice"}]}

Transformation

Pyramid combines querying with the Visitor pattern in a powerful way, allowing one to easily perform transformations of selections of data. Simply annotate parts of your query with metadata {:visitor (fn visit [data selection] ,,,)} and the visit function will be used to transform the data in a depth-first, post-order traversal (just like clojure.walk/postwalk).

(def data
  {:people [{:given-name "Bob" :surname "Smith" :age 29}
            {:given-name "Alice" :surname "Meyer" :age 43}]
   :items {}})

(defn fullname
  [_db {:keys [given-name surname] :as person}]
  (str given-name " " surname))

(def query [{:people ^{:visitor fullname} [:given-name :surname]}])

(pyramid.core/pull data query)
;; => {:people ["Bob Smith" "Alice Meyer"]}

Accretion

A more complex program may need to keep track of that data over time, or query data that contains cycles, which can be done by creating a pyramid.core/db. "Pyramid dbs" are maps that have a particular structure:

;; for any entity identified by `[key id]`, it follows the shape:
{key {id {,,,}}

Adding data to a db will normalize the data into a flat structure allowing for easy updating of entities as new data is obtained and allow relationships that are hard to represent in trees. Queries can traverse the references inside this data.

See docs/GUIDE.md.

Durability

A program may grow to need durable storage and other features that more full featured in-memory databases provide. Pyramid provides a protocol, IPullable, which can be extended to allow queries to run over any store that data can be looked up by a tuple, [primary-key value]. This is generalizable to most databases like Datomic, DataScript, Asami and SQLite.

Full stack

The above shows the evolution of a single program, but many programs never grow beyond the accretion stage. Pyramid has been used primarily in user interfaces where data is stored in a data structure and queried over time to show different views on a large graph. Through its protocols, it can now be extended to be used with durable storage on the server as well.

Concepts

Query: A query is written using EQL, a query language implemented inside Clojure. It provides the ability to select data in a nested, recursive way, making it ideal for traversing graphs of data. It does not provide arbitrary logic like SQL or Datalog.

Entity map: a Clojure map which contains information that uniquely identifies the domain entity it is about. E.g. {:person/id 1234 :person/name "Bill" :person/age 67} could be uniquely identified by it's :person/id key. By default, any map which contains a key which (= "id" (name key)) is true, is an entity map and can be normalized using pyramid.core/db.

Ident function: a function that takes a map and returns a tuple [:key val] that uniquely identifies the entity the map describes.

Lookup ref: a 2-element Clojure vector that has a keyword and a value which together act as a pointer to a domain entity. E.g. [:person/id 1234]. pyramid.core/pull will attempt to look them up in the db value if they appear in the result at a location where the query expects to do a join.

Usage

See docs/GUIDE.md

Prior art

Copyright

Copyright © 2023 Will Acton. Distributed under the EPL 2.0.