
The Natural Language Query System

Christopher Dumas edited this page Apr 21, 2019 · 6 revisions


As part of Atomic Database's goal of creating a user-friendly environment with a short learning curve, we have an NLP-based system for describing simple rules and queries.

[Image: an example query]

Forms

At the moment (and this page will be updated as things change), there are seven different query formats. Keep in mind that there is much more leeway in phrasing than the examples imply: you can use alternate forms of words or plurals, and add extra words around them where that makes the sentence clearer. This flexibility will increase over time.

  • Predicate: A basic predicate, written not in EAV (entity-attribute-value) order but in the order that reads most naturally in English:

    The <attribute> of <entity> is <value>.

  • ReversePredicate: A reverse predicate is just what it sounds like: the Predicate form reversed. It has the form:

    <value> is the <attribute> of <entity>.

  • PredicateContraction: A predicate contraction is similar to a predicate, but uses the possessive 's:

    <entity>'s <attribute> is <value>.

  • SimpleQuery: This is for making simple database lookups, which don't even require any unification:

    What is <entity>'s <attribute>?

  • ReverseSimpleQuery: This is for the same purpose as SimpleQuery, but with an alternate word order:

    What is the <attribute> of <entity>?

  • FindEntitySimpleQuery: This is for finding an entity with a given property. It has the form:

    What has a <attribute> of <value>?

  • FindEntitySimpleQueryContraction: An alternate word choice for FindEntitySimpleQuery:

    What's <attribute> is <value>?

Note that for the SimpleQuery variations, the result (whether an entity, an attribute, or a value) is bound to the variable Result. Contractions are supported anywhere they make sense, and stop words such as "a" and "the" are allowed anywhere, as are words like "if" and "unless". Also keep in mind that "what", "who", "when", and other "WP" words are interchangeable. Over time, I plan to increase the depth and breadth of the recognized forms, and this page will be updated when I do.
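To make the forms above concrete, here is a deliberately simplified Python sketch that maps each form onto an (entity, attribute, value) triple and fills the queried slot with Result. It is not how Atomic Database parses sentences: the real system uses an NLP parser, which is what provides the flexibility described above. The regular expressions, the `parse` function, and the list of WP words are hypothetical and only match the literal wording shown.

```python
import re

def slot(name):
    # A slot is either a double-quoted entity ID or a single word.
    return rf'(?P<{name}>"[^"]*"|\w+)'

E, A, V = slot("e"), slot("a"), slot("v")
WP = r"(?:what|who|when|where|which)"  # assumed set of interchangeable WP words
END = r"\s*[.?]?"                      # optional final "." or "?"

# Query forms come first: "What is the father of P" would otherwise
# also match ReversePredicate, with "What" as the value.
FORMS = [
    ("SimpleQuery", rf"{WP} is {E}'s {A}"),
    ("ReverseSimpleQuery", rf"{WP} is the {A} of {E}"),
    ("FindEntitySimpleQuery", rf"{WP} has an? {A} of {V}"),
    ("FindEntitySimpleQueryContraction", rf"{WP}'s {A} is {V}"),
    ("Predicate", rf"the {A} of {E} is {V}"),
    ("ReversePredicate", rf"{V} is the {A} of {E}"),
    ("PredicateContraction", rf"{E}'s {A} is {V}"),
]

def parse(sentence):
    """Return (form name, (entity, attribute, value)); a missing slot becomes Result."""
    for name, pattern in FORMS:
        m = re.fullmatch(pattern + END, sentence.strip(), re.IGNORECASE)
        if m:
            d = m.groupdict()
            triple = (d.get("e", "Result"), d["a"], d.get("v", "Result"))
            # Drop the double quotes around entity IDs and string values.
            return name, tuple(s.strip('"') for s in triple)
    raise ValueError(f"no form matches: {sentence!r}")
```

For example, `parse("What is the father of P?")` gives `("ReverseSimpleQuery", ("P", "father", "Result"))`: the entity and attribute come from the sentence, and the value is left for Result to receive.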

A Note on NLP

This might look as if I'm manually encoding the word orders, in which case I might as well split the sentence on spaces and pick out the nth word. That isn't really the case. All I have to do is set the general outline of what the sentence must look like, and a huge variety of word choices, contractions, word counts, and so on are condensed into that central pattern. It just so happens that the kinds of things Atomic Database natural language queries talk about don't allow much variety in word choice, although that may change as features from the S-Expression language migrate over.

Variables

Variables are denoted with a CamelCase or snake_case (preferably the former) word, whose first letter must be capitalized.

| Valid        | Not Valid    |
|--------------|--------------|
| X            | x            |
| ThisIsATest  | thisIsATest  |
| Known_result | known_Result |
| Known_Result | Known-Result |
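The naming rule can be checked mechanically. The sketch below is a hypothetical validator reconstructed from the table: a capital letter first, then letters, digits, or underscores. The page does not say whether digits are allowed, so accepting them is an assumption.

```python
import re

# Assumed rule: uppercase first letter, then letters, digits, or underscores.
# Hyphens and other punctuation are rejected (e.g. "Known-Result").
VARIABLE = re.compile(r"[A-Z][A-Za-z0-9_]*")

def is_variable(word: str) -> bool:
    return VARIABLE.fullmatch(word) is not None
```

Every entry in the Valid column passes this check, and every entry in the Not Valid column fails it.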

You can substitute a variable for any of the items in angle brackets, although some substitutions may give less-than-useful results. Keep in mind that variables are bound only once and cannot be modified or reset: if you use a variable after it has been set to (unified with) a value, the system will try to unify the new value with the old one. So if you're using the Query Box and want to reuse a variable, make sure to Clear the variable bindings first (the table of bindings should disappear when you do this). Inside a rule, don't reuse a variable name to give it a different value; use a different name instead.
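The bind-once behaviour can be sketched as follows. The `bind` helper is hypothetical and treats unification of two plain values as a simple equality test, which is enough to show why a bound variable cannot take a new value.

```python
from typing import Dict, Optional

def bind(bindings: Dict[str, str], var: str, value: str) -> Optional[Dict[str, str]]:
    """Return updated bindings, or None if the new value does not unify."""
    if var in bindings:
        # Already bound: no overwrite, only a check against the old value.
        return bindings if bindings[var] == value else None
    return {**bindings, var: value}

b = bind({}, "X", "alice")   # X is now bound to "alice"
same = bind(b, "X", "alice") # unifies with the old value: succeeds
clash = bind(b, "X", "bob")  # cannot unify "bob" with "alice": None
```

Clearing the bindings in the Query Box corresponds to starting again from an empty dictionary, after which `X` can be bound to anything.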

Entity IDs

Entity IDs can contain spaces and special characters, so it is recommended to surround them with double quotes (") to tell the NLP interpreter not to parse what's inside them. Thus, entity IDs follow the same rules as regular string values.
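To see why quoting helps, consider a tokenizer that keeps each double-quoted span as a single token. The pattern below is a hypothetical illustration, not Atomic Database's real tokenizer. Spaces and punctuation inside the quotes stay part of the entity ID, and the possessive 's is split off as its own token.

```python
import re

# Alternatives are tried in order: a quoted span, the possessive "'s",
# a word, or a single punctuation character. Whitespace is skipped.
TOKEN = re.compile(r'"[^"]*"|\'s|\w+|[^\w\s]')

def tokenize(text):
    return TOKEN.findall(text)

tokens = tokenize('What is "Jane Doe, Jr."\'s name?')
```

Without the quotes, the comma and the space in `Jane Doe, Jr.` would become separate tokens, and the sentence would no longer fit any of the forms.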

Attributes and Rules

Attributes cannot contain spaces, and it is recommended to name them in lowercase snake_case (every part of the UI where you can create attributes tries to apply this formatting automatically if you don't). This helps the NLP parser figure out what's going on, and also lets other parsers, such as the one for the S-Expression language (covered on another page), understand the attribute names.
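The automatic formatting is not specified here. The following is a hypothetical normalizer showing one reasonable way to turn arbitrary attribute names into lowercase snake_case. It splits camelCase boundaries and collapses spaces, hyphens, and other punctuation into underscores.

```python
import re

def to_snake_case(name: str) -> str:
    # Insert a break at camelCase boundaries: "fatherName" -> "father Name".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    # Collapse any run of non-alphanumeric characters into a single "_".
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()
```

For example, `to_snake_case("Birth Date")` gives `birth_date`. A name that is already in snake_case comes back unchanged.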

If Atomic Database can't find an attribute that satisfies what you're asking for (it can't find either a matching name or a matching value), it will try to find a rule that satisfies those constraints. Unlike attributes, which take exactly two arguments (the entity, and the value or output variable), rules can in theory take any number of arguments. However, NL queries treat rules exactly like attributes, because the subject-verb-object structure of human sentences means you can only easily pass two arguments to anything. So you can't use rules with more than two arguments in NL queries!
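Here is a minimal sketch of the attribute-then-rule fallback. It assumes a hypothetical in-memory store and only models the forward direction, where the entity is given and the rule returns the value. The real system unifies in either direction. The `lookup` function and the `full_name` rule are illustrative names, not part of Atomic Database.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical store: (entity, attribute) -> value.
Facts = Dict[Tuple[str, str], str]
# A two-argument rule, used like an attribute: entity in, value out.
Rule = Callable[[Facts, str], Optional[str]]

def lookup(facts: Facts, rules: Dict[str, Rule], entity: str, attribute: str) -> Optional[str]:
    # 1. Prefer a stored attribute with this name.
    if (entity, attribute) in facts:
        return facts[(entity, attribute)]
    # 2. Otherwise fall back to a rule of the same name, if there is one.
    rule = rules.get(attribute)
    return rule(facts, entity) if rule else None

def full_name(facts: Facts, entity: str) -> Optional[str]:
    # Example rule derived from two stored attributes.
    first = facts.get((entity, "first_name"))
    last = facts.get((entity, "last_name"))
    return f"{first} {last}" if first and last else None

facts = {("alice", "first_name"): "Alice", ("alice", "last_name"): "Smith"}
rules = {"full_name": full_name}
```

With these definitions, `lookup(facts, rules, "alice", "full_name")` finds no stored `full_name` attribute, so it runs the rule instead.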

Conjunctions

Conjunctions are supported in the natural query language, so "and" and "or" can join multiple queries and predicates. Parentheses for grouping are also supported, and "then" is treated like "and" because semantically it does the same thing. The system converts the whole expression into prefix notation, and the algorithm that does this is fairly complicated, so I'd recommend avoiding extremely complex queries with conjunctions: they might lead to unexpected results.
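Below is a small recursive-descent parser that converts such a query into nested prefix tuples `(op, left, right)`. It illustrates the idea and is not the actual algorithm. The page does not state operator precedence, so this sketch assumes that "and" (and "then") bind tighter than "or", with left-to-right grouping. Each clause is kept as plain text for a sentence parser to handle later.

```python
import re

# A token is "(", ")", or a run of non-space characters in which
# double-quoted spans (entity IDs with spaces) are kept whole.
TOKEN = re.compile(r'\(|\)|(?:"[^"]*"|[^\s()"])+')
CONNECTIVES = {"and", "or", "then", "(", ")"}

def to_prefix(text):
    """Convert a query joined by and/or/then into nested (op, left, right) tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def clause():
        # A clause is every token up to the next connective or parenthesis.
        words = []
        while peek() is not None and peek() not in CONNECTIVES:
            words.append(take())
        if not words:
            raise ValueError("expected a clause")
        return " ".join(words)

    def factor():
        if peek() == "(":
            take()
            node = disjunction()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return node
        return clause()

    def conjunction():
        node = factor()
        while peek() in ("and", "then"):  # "then" is treated like "and"
            take()
            node = ("and", node, factor())
        return node

    def disjunction():
        node = conjunction()
        while peek() == "or":
            take()
            node = ("or", node, conjunction())
        return node

    tree = disjunction()
    if pos < len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For instance, `to_prefix("X is the father of P and (Y is the mother of P or Y is the aunt of P)")` yields `("and", "X is the father of P", ("or", "Y is the mother of P", "Y is the aunt of P"))`.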

Examples

What is "[email protected]"'s name?
Who is "[email protected]"'s father?
The father of "[email protected]" is X.
"[email protected]"'s father is X.
X is the father of "[email protected]".

An example rule (which could be run in the Query Box if P were unified with an entity ID):

If X is the father of P, X's father is Goal.

or

Who is the father of P?
Result's father is Goal.

or

Who is the father of P?
Goal is the father of Result.

And so on.
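All three phrasings reduce to the same pair of (entity, attribute, value) goals. Here is a minimal backtracking solver over a hypothetical list of facts that shows they bind Goal to the same grandfather. The sentence-to-triple translation is done by hand, rules are not modelled, and the rule "terms starting with an uppercase letter are variables" is a simplification of the naming rule above. Entity IDs are kept lowercase so they are never mistaken for variables.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, value)

def is_var(term: str) -> bool:
    # Simplification: an uppercase first letter marks a variable.
    return term[:1].isupper()

def unify(goal: Triple, fact: Triple, bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    new = dict(bindings)
    for g, f in zip(goal, fact):
        if is_var(g):
            if g in new and new[g] != f:
                return None  # bound once: must match the existing value
            new[g] = f
        elif g != f:
            return None
    return new

def solve(goals: List[Triple], facts: List[Triple], bindings: Dict[str, str]) -> Iterator[Dict[str, str]]:
    if not goals:
        yield bindings
        return
    for fact in facts:
        new = unify(goals[0], fact, bindings)
        if new is not None:
            yield from solve(goals[1:], facts, new)

facts = [("carol", "father", "bob"), ("bob", "father", "dave")]
# "If X is the father of P, X's father is Goal."
rule_form = [("P", "father", "X"), ("X", "father", "Goal")]
# "Who is the father of P? Result's father is Goal." (and the Goal-first variant)
query_form = [("P", "father", "Result"), ("Result", "father", "Goal")]
```

Calling `solve(rule_form, facts, {"P": "carol"})` yields a single solution in which `Goal` is `"dave"`. The query form produces the same result, with `Result` standing in for `X`.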

To read a code example, you need to know three things:

  1. Anything between angle brackets (< and >) is a placeholder, meaning you can put a valid value in its place. Generally, the words between the angle brackets tell you what sort of thing goes there.
  2. A number between angle brackets stands for the nth argument: <1> means the 1st argument, <2> the 2nd, and so on.
  3. An ellipsis (...) means you can put any number of things there. Ellipses are sometimes left out where the text explicitly says that multiple things can go there.