
The Database

Christopher Dumas edited this page Apr 21, 2019 · 3 revisions

The pivoted database view

Atomic Database's core is an Entity Attribute Value (EAV) database. In an EAV database, every piece of knowledge is represented as a property on a specific entity. This is different from a relational database (the most common type of database in the industry at large), where knowledge is represented as a table in which each row has a unique identifier and each column holds the value of a specific property. An example of an EAV record is:

Entity  Attribute  Value
70      1          "John Doe"

Compare this with data from a relational database, which would look like this:

Id  Name
70  "John Doe"
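To make the contrast concrete, here is a minimal Python sketch that stores the same fact both ways and pivots the EAV form into a relational-style row. It is not Atomic Database's actual code: `ATTRIBUTE_NAMES` and `pivot` are hypothetical names introduced for illustration, and reading attribute 1 as "Name" is an assumption drawn from the two tables above.

```python
from collections import defaultdict

# EAV form: one (entity, attribute, value) triple per fact.
eav_facts = [
    (70, 1, "John Doe"),
]

# Assumption for illustration: attribute 1 is the "Name" attribute.
ATTRIBUTE_NAMES = {1: "Name"}


def pivot(facts, attribute_names):
    """Group EAV triples by entity, giving one relational-style row per entity."""
    rows = defaultdict(dict)
    for entity, attribute, value in facts:
        rows[entity][attribute_names[attribute]] = value
    return [{"Id": entity, **columns} for entity, columns in sorted(rows.items())]


relational_rows = pivot(eav_facts, ATTRIBUTE_NAMES)
print(relational_rows)
```

Adding a new attribute to an entity in the EAV form only means appending another triple, whereas the relational form would need a new column.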

As you'll notice, the entity and attribute in the EAV example are not plain-text symbols; they're numbers. Atomic Database represents entities and attributes this way for ease and speed of lookup when all you want is the value. For more complex operations, where you leave out the entity, the attribute, or both, it consults a table of strings to find the textual representations of the entities and attributes in the database. It needs these when unifying your predicates (questions about the database) with the database, because your predicates use the textual representations.
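The two kinds of lookup can be sketched roughly as follows. This is an illustration under assumptions, not the real implementation: the names `ENTITY_NAMES`, `ATTRIBUTE_NAMES`, `FACTS`, `value_of`, and `entities_where` are hypothetical, and the sample data is made up.

```python
# Hypothetical string tables: integer id -> textual representation.
ENTITY_NAMES = {70: "john_doe", 71: "jane_roe"}
ATTRIBUTE_NAMES = {1: "name", 2: "age"}

# Facts keyed by (entity id, attribute id) so a known pair is a direct lookup.
FACTS = {
    (70, 1): "John Doe",
    (70, 2): 56,
    (71, 1): "Jane Roe",
    (71, 2): 34,
}


def value_of(entity_id, attribute_id):
    """Fast path: both ids are known, so this is one integer-keyed lookup."""
    return FACTS.get((entity_id, attribute_id))


def entities_where(attribute_name, value):
    """Slow path: the entity is unknown, so every fact is scanned and each
    attribute id is translated to text and compared as a string."""
    return [
        ENTITY_NAMES[entity_id]
        for (entity_id, attribute_id), stored in FACTS.items()
        if ATTRIBUTE_NAMES[attribute_id] == attribute_name and stored == value
    ]


print(value_of(70, 1))
print(entities_where("age", 56))
```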

This leads to a lot of string comparisons when you ask questions like "who has an age of 56?", as opposed to ordinary questions like "what is the age of entity_foo?". String comparisons are much slower than integer comparisons. Luckily, there is a way to turn almost all database lookups that concern entities and attributes into integer comparisons, at a slight cost to initial compilation speed. Currently, when the code is "compiled" (before it is turned into the AST), quoted strings are separated out into "entities" (unrelated to database entities). This is done so that strings are not confused with symbols and so that strings with capital letters aren't parsed as variables. Later on, the same step might also allow me to convert the textual attributes and database entities in your code (which are usually quoted already) to their corresponding integers, so that only integer comparisons are made when solving logic problems. At the moment, however, I'm not very concerned about performance.
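As a rough sketch of that possible optimization (it is not implemented, and everything here is hypothetical), the textual names in a query could be resolved to integers once, before solving, so that matching against the facts only compares integers for the entity and attribute positions. The names `ENTITY_IDS`, `ATTRIBUTE_IDS`, `compile_pattern`, and `solve` are assumptions made for illustration, `None` stands in for an unknown, and the sample data is made up.

```python
# Hypothetical reverse string tables: textual name -> integer id.
ENTITY_IDS = {"john_doe": 70, "jane_roe": 71}
ATTRIBUTE_IDS = {"name": 1, "age": 2}

# Facts stored as (entity id, attribute id, value) triples.
FACTS = [
    (70, 1, "John Doe"),
    (70, 2, 56),
    (71, 1, "Jane Roe"),
    (71, 2, 34),
]


def compile_pattern(entity, attribute, value):
    """Resolve textual names to ids once, up front. None marks an unknown."""
    return (
        ENTITY_IDS[entity] if entity is not None else None,
        ATTRIBUTE_IDS[attribute] if attribute is not None else None,
        value,
    )


def solve(pattern):
    """Match a compiled pattern against the facts. The entity and attribute
    positions are compared as integers; no string table is consulted here."""
    entity_id, attribute_id, value = pattern
    return [
        fact
        for fact in FACTS
        if (entity_id is None or fact[0] == entity_id)
        and (attribute_id is None or fact[1] == attribute_id)
        and (value is None or fact[2] == value)
    ]


# "Who has an age of 56?"
who_is_56 = solve(compile_pattern(None, "age", 56))
print(who_is_56)
```

The string lookups happen once per query in `compile_pattern` rather than once per fact, which is where the "slight cost to initial compilation speed" would be paid.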

The raw database view

Why did I choose EAVs for Atomic Database? There are a couple of reasons. Traditionally, the pros and cons of EAV databases looked something like this:

  • Pro: less time to design and develop a simple application
  • Pro: new entities easy to add, even by users
  • Pro: "generic" interface components
  • Pro: more flexible data model and representation
  • Con: complex code required to validate simple data types
  • Con: much more complex SQL for simple reports
  • Con: complex reports can become almost impossible
  • Con: poor performance for a large data set

While the pros and cons of relational databases looked something like:

  • Con: more time required to gather requirements and design
  • Con: new entities must be modeled and designed by a professional
  • Con: custom interface components for each entity
  • Pro: data type constraints and validation simple to implement
  • Pro: SQL is easy to write, easy to understand and debug
  • Pro: even the most complex reports are relatively simple
  • Pro: best performance for large data sets

For Atomic Database, it is highly important that the data model be flexible. Users shouldn't be expected to know and think through their entire data model ahead of time: that's a very high barrier to entry and a very stringent limitation for something that's supposed to put the user behind the wheel. That alone would make a relational database very difficult to design and manage.

Atomic Database also doesn't use a standard relational query language (like SQL), so SQL's difficulty in dealing with EAV databases is a moot point. Logic programming languages are also incredibly good at creating and dealing with complex reports (although there are several missing features in this area that I plan to improve on, such as formatting and exporting full reports from a rule). As for validating data types for attributes, I don't really see that as an issue: attributes generally have an obvious data type, and the entire GUI of Atomic Database is very visibly typed, showing completely different control elements based on the type of a value, so it would be difficult to mix up what type something is or should be.

At this point, it should be clear that the only major trade-off Atomic Database made in choosing an EAV approach was performance. Unfortunately, there's not a lot that can be done for performance besides rewriting in a more performant language. This is why I've built Atomic Database so that the natural language module, the logic language and database module, and the UI are all highly decoupled, each presenting a fairly small interface (one or two functions) to the outside world. In fact, I could also easily decouple the database from the logic programming language if I needed to, so rewriting the database in, say, Rust for faster lookup times would indeed be feasible.

To read a code example, there are three things you need to know:

  1. Anything between angle brackets (< and >) is a placeholder. A placeholder means you can put a valid value in its place. Generally, the words between the angle brackets tell you what sort of thing goes there.
  2. Numbers between angle brackets mean that the thing the bracketed item stands in for is the nth argument. So <1> means the 1st argument, and so on.
  3. Ellipses (...) mean that you can put any number of things there. Sometimes, these are left out where the text explicitly says that multiple things can be put there.