The Natural Language Query System
As part of Atomic Database's effort to create a user-friendly environment with a short learning curve, we have an NLP-based system for describing simple rules and queries.
At the moment (and this page will be updated as things change), there are seven recognized forms: three for stating predicates and four query formats. Keep in mind that there is a lot more leeway in phrasing than the examples imply: you can use alternate word forms, different words, or plurals, and add extra words around them where that makes things clearer. This flexibility will increase over time.
- Predicate: A basic predicate, written not in EAV order but in the order that seems most natural in English: `The <attribute> of <entity> is <value>`.
- ReversePredicate: Just what it sounds like, the reverse of the Predicate form. It has the form: `<value> is the <attribute> of <entity>`.
- PredicateContraction: Similar in form to a predicate, but with a contraction: `<entity>'s <attribute> is <value>`.
- SimpleQuery: For making simple database lookups, which don't even require any unification: `What is <entity>'s <attribute>`.
- ReverseSimpleQuery: Same purpose as SimpleQuery, but with an alternate word order: `What is the <attribute> of <entity>`.
- FindEntitySimpleQuery: For finding an entity with a given property. It has the form: `What has a <attribute> of <value>`.
- FindEntitySimpleQueryContraction: An alternate word choice for FindEntitySimpleQuery: `What's <attribute> is <value>`.
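The three predicate forms above all describe the same entity/attribute/value triple; only the word order differs. As a rough illustration only, the Python sketch below matches each form with a regular expression and returns that triple. It is not Atomic Database's parser, which is NLP-based and far more forgiving about wording; the `parse_predicate` function and its patterns are hypothetical.

```python
import re

# Hypothetical, deliberately rigid patterns for the three predicate forms.
# Each one captures the entity, attribute, and value as named groups.
PATTERNS = {
    "Predicate": re.compile(
        r"^the (?P<attribute>\w+) of (?P<entity>.+?) is (?P<value>.+?)\.?$", re.I),
    "ReversePredicate": re.compile(
        r"^(?P<value>.+?) is the (?P<attribute>\w+) of (?P<entity>.+?)\.?$", re.I),
    "PredicateContraction": re.compile(
        r"^(?P<entity>.+?)'s (?P<attribute>\w+) is (?P<value>.+?)\.?$", re.I),
}

def parse_predicate(sentence):
    """Return (form, (entity, attribute, value)) for the first matching form, or None."""
    sentence = sentence.strip()
    for form, pattern in PATTERNS.items():
        match = pattern.match(sentence)
        if match:
            return form, (match["entity"], match["attribute"], match["value"])
    return None
```

For example, `parse_predicate("Bob is the father of Alice")` returns `("ReversePredicate", ("Alice", "father", "Bob"))`, the same triple that `"The father of Alice is Bob."` and `"Alice's father is Bob"` produce.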
Note that for variations of SimpleQuery, the result (either entity, attribute, or value) is bound to the variable `Result`. Also note that contractions are supported anywhere they make sense, and stop words such as `a` and `the` are allowed anywhere, as are words like `if` and `unless`. Keep in mind also that what/who/when and other "WP" words are interchangeable. Over time, I plan to increase the depth and breadth of the forms recognized, and when I do, this page will be updated.
This might look like I'm manually encoding word orders, in which case I may as well split on spaces and pick out the nth word. That isn't really true, though. All I have to do is set the general outline of what a sentence must look like, and a huge variety of word choices, contractions, numbers of words, and so on are condensed into that central pattern. It just so happens that the kinds of things Atomic Database natural language queries talk about don't allow much variety in word choice, although that may change in the future as features from the S-Expression language migrate over.
Variables are denoted with a CamelCase or snake_case (preferably the former) word, whose first letter must be capitalized.
| Valid | Not Valid |
|---|---|
| `X` | `x` |
| `ThisIsATest` | `thisIsATest` |
| `Known_result` | `known_Result` |
| `Known_Result` | `Known-Result` |
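The naming rule in the table can be approximated with a single regular expression. This is a sketch, not Atomic Database's tokenizer: it assumes a variable is an uppercase letter followed by letters, underscores, or digits (the page doesn't say whether digits are allowed, so that part is an assumption), and the `is_variable` helper is hypothetical.

```python
import re

# Assumed from the table above: an uppercase first letter, then letters,
# underscores, or (an assumption, the page doesn't say) digits.
# Hyphens and spaces are not allowed.
VARIABLE_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*")

def is_variable(token: str) -> bool:
    """Return True if `token` would be read as a variable name."""
    return VARIABLE_NAME.fullmatch(token) is not None

# Reproduce the table: the first four are valid, the last four are not.
for name in ["X", "ThisIsATest", "Known_result", "Known_Result",
             "x", "thisIsATest", "known_Result", "Known-Result"]:
    print(f"{name:<14}{'valid' if is_variable(name) else 'not valid'}")
```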
You can substitute a variable for any of the items in angle brackets, although some substitutions may give less-than-useful results. Keep in mind that variables are bound only once and cannot be modified or reset: if you use a variable after it has been set to (unified with) a value, the system will try to unify the new value with the old one. So if you're using the Query Box, make sure to Clear the variable bindings (you should see the table of bindings disappear when you do this) before reusing a variable. Inside a rule, just don't reuse a variable name when you want it to hold a different value. Use a different name instead.
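The bind-once behavior described above can be modeled with a tiny class. This is a simplified sketch under stated assumptions: it only unifies variables with plain values (no variable-to-variable unification), and the `Bindings` class and its methods are hypothetical stand-ins for what the Query Box does, not its actual code.

```python
class Bindings:
    """Hypothetical model of Query Box bindings: each variable binds once."""

    def __init__(self):
        self._values = {}

    def unify(self, variable, value):
        """Bind `variable` if it is unbound; otherwise succeed only if the values agree."""
        if variable not in self._values:
            self._values[variable] = value
            return True
        # Already bound: the new value must unify with (here, equal) the old one.
        return self._values[variable] == value

    def get(self, variable):
        """Return the current binding, or None if the variable is unbound."""
        return self._values.get(variable)

    def clear(self):
        """Forget every binding, like pressing Clear in the Query Box."""
        self._values.clear()
```

Binding `X` to `"bob"` and then trying to unify it with `"carl"` fails, and only after `clear()` can `X` take the new value.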
Entity IDs can contain spaces and special characters, so it is recommended to surround them with double quotes (`"`) to tell the NLP interpreter not to worry about what's inside of them. Thus, entity IDs follow the same rules as regular string values.
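To see why quoting helps, compare how a naive whitespace tokenizer treats a quoted and an unquoted entity ID. The sketch below is hypothetical (the real interpreter is more sophisticated); it simply keeps any double-quoted run together as one token and splits everything else on whitespace.

```python
import re

# Hypothetical tokenizer: a double-quoted run is one token, so spaces and
# special characters inside an entity ID are never split apart.
TOKEN = re.compile(r'"[^"]*"|\S+')

def tokenize(sentence):
    """Split `sentence` into tokens, keeping "quoted strings" intact."""
    return TOKEN.findall(sentence)
```

With quotes, `tokenize('What is "Jane Q. Doe"\'s name?')` keeps `"Jane Q. Doe"` as a single token (followed by `'s`); without them, the name is scattered across `Jane`, `Q.`, and `Doe's`.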
Attributes cannot contain spaces, and it is recommended to name them in lowercase snake_case (every UI where you can create attributes tries to apply this formatting automatically if you get it wrong). This helps the NLP parser figure out what's going on, and it also lets other parsers, such as the one for the S-Expression language (we'll get to that in another page), do the same.
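As an illustration of the kind of automatic formatting the UIs apply, here is one plausible way to normalize a typed-in attribute name into lowercase snake_case. This is a sketch, not the UI's actual code; the `to_attribute_name` function is hypothetical, and it assumes camelCase boundaries and runs of non-alphanumeric characters should both become underscores.

```python
import re

def to_attribute_name(raw: str) -> str:
    """Hypothetical normalizer: 'Date Of Birth' or 'dateOfBirth' -> 'date_of_birth'."""
    # Insert a space at camelCase boundaries: 'dateOfBirth' -> 'date Of Birth'.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", raw.strip())
    # Collapse each run of non-alphanumeric characters into a single underscore.
    snake = re.sub(r"[^A-Za-z0-9]+", "_", spaced).strip("_")
    return snake.lower()
```

For example, `to_attribute_name("favorite-color")` returns `"favorite_color"`, and names that are already snake_case pass through unchanged.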
If Atomic Database can't find an attribute that satisfies what you're asking for (because it can't find either the attribute name or a value), it will try to find a rule that can satisfy these constraints. Unlike attributes, which always take exactly two arguments (the entity, and the value or output variable), rules can in theory take any number of arguments. However, NL queries treat rules exactly like attributes: because human sentences are structured as subject, verb, object, you can only easily pass two arguments to anything. So you can't use rules with more than two arguments in NL queries!
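The attribute-then-rule fallback can be sketched as follows. Everything here is hypothetical: the in-memory `FACTS` store, the `grandfather` rule, and the `lookup` function are illustrations of the lookup order described above, not Atomic Database's API. Rules are stored with their arity so that the two-argument limit of NL queries can be enforced.

```python
# Hypothetical stored attributes: (entity, attribute) -> value.
FACTS = {
    ("alice", "father"): "bob",
    ("bob", "father"): "carl",
}

def grandfather(entity):
    """A hypothetical two-argument rule: grandfather(entity, Value)."""
    father = lookup(entity, "father")
    return None if father is None else lookup(father, "father")

# Rules keyed by name, stored as (arity, implementation).
RULES = {"grandfather": (2, grandfather)}

def lookup(entity, attribute):
    """Try stored attributes first, then fall back to a rule of the same name."""
    if (entity, attribute) in FACTS:
        return FACTS[(entity, attribute)]
    if attribute in RULES:
        arity, rule = RULES[attribute]
        if arity != 2:
            # A sentence can only supply a subject and an object.
            raise ValueError(f"rule {attribute!r} takes {arity} arguments; "
                             "NL queries can only pass 2")
        return rule(entity)
    return None
```

Here `lookup("alice", "grandfather")` finds no stored `grandfather` attribute, falls back to the rule, and returns `"carl"`.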
Conjunctions are supported in the natural query language, so `and` and `or` can join various requests and predicates. Parentheses for grouping are also supported, and `then` is treated like `and` because semantically it does the same thing. The system that does this converts the expression into prefix notation, and it's a pretty complicated algorithm, so I would suggest avoiding extremely complex queries with conjunctions; they might lead to unexpected results.
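To give a feel for what converting conjunctions into prefix notation means, here is a small recursive-descent sketch. It is not Atomic Database's algorithm, and it rests on assumptions the page doesn't confirm: `then` is rewritten to `and`, `and` binds tighter than `or`, and every other token is an opaque clause (such as a whole predicate). The `to_prefix` function is hypothetical.

```python
def to_prefix(tokens):
    """Parse clauses joined by and/or/then (with parentheses) into a prefix tree.

    Assumptions: 'then' means 'and'; 'and' binds tighter than 'or';
    any other token is an opaque clause.
    """
    tokens = ["and" if t == "then" else t for t in tokens]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        node = parse_and()
        while peek() == "or":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_atom()
        while peek() == "and":
            take()
            node = ("and", node, parse_atom())
        return node

    def parse_atom():
        token = take()
        if token == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        return token

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For example, `to_prefix(["A", "then", "(", "B", "or", "C", ")"])` returns `("and", "A", ("or", "B", "C"))`.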
Some example queries and predicates:
What is "[email protected]"'s name?
Who is "[email protected]"'s father?
The father of "[email protected]" is X.
"[email protected]"'s father is X
X is the father of "[email protected]"
An example rule (which could be run in the Query Box if `P` was unified with an entity ID):
If X is the father of P, X's father is Goal.
or
Who is the father of P?
Result's father is Goal.
or
Who is the father of P?
Goal is the father of Result.
And so on.
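The first rule above (a grandfather lookup) is a conjunction of two triple patterns: "X is the father of P" becomes `(P, father, X)` and "X's father is Goal" becomes `(X, father, Goal)`. The toy evaluator below solves such a conjunction by backtracking unification over a hypothetical list of facts. It is a sketch of the general technique, not Atomic Database's engine; `FACTS`, `match`, and `solve` are hypothetical, and constants are kept lowercase so they aren't mistaken for variables.

```python
# Hypothetical fact base of (entity, attribute, value) triples.
FACTS = [
    ("alice", "father", "bob"),
    ("bob", "father", "carl"),
    ("dana", "father", "bob"),
]

def is_var(term):
    """Variables begin with an uppercase letter, per the naming rules above."""
    return term[:1].isupper()

def resolve(term, env):
    """Replace a bound variable with its value; leave everything else alone."""
    return env.get(term, term) if is_var(term) else term

def match(pattern, fact, env):
    """Unify one triple pattern with a fact; return the extended env or None."""
    env = dict(env)
    for term, constant in zip(pattern, fact):
        term = resolve(term, env)
        if is_var(term):
            env[term] = constant       # unbound variable: bind it
        elif term != constant:
            return None                # constant mismatch: this fact fails
    return env

def solve(patterns, env=None):
    """Yield every binding environment that satisfies all patterns."""
    env = env or {}
    if not patterns:
        yield env
        return
    first, rest = patterns[0], patterns[1:]
    for fact in FACTS:
        new_env = match(first, fact, env)
        if new_env is not None:
            yield from solve(rest, new_env)
```

With `P` already bound, as in the Query Box, `list(solve([("P", "father", "X"), ("X", "father", "Goal")], {"P": "alice"}))` yields a single binding in which `Goal` is `"carl"`.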
A note on quoting: strings are quoted with double quotes, as in Python or JavaScript. However, as noted above for entity IDs, entity names should also be quoted! Although entity names are not necessarily strings per se, they should be quoted to help the interpreter recognize them correctly. For entity names that would be a valid attribute or rule name, going without quotes should be fine. But for entity names that include special characters (beyond underscores), that would look like a variable name, or that include spaces, please use quotes.
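The quoting guideline can be written as a small helper that decides whether an entity name needs quotes. This is a hypothetical sketch: it assumes a "valid attribute or rule name" means lowercase snake_case starting with a letter, and since the page doesn't document an escape syntax for a double quote inside a name, it refuses such names rather than guessing.

```python
import re

# Assumption: an entity name may go unquoted only if it would also be a valid
# attribute/rule name, taken here to mean lowercase snake_case.
BARE_NAME = re.compile(r"[a-z][a-z0-9_]*")

def quote_entity(name: str) -> str:
    """Return `name` as it should be written in a query, quoting it when needed."""
    if BARE_NAME.fullmatch(name):
        return name
    if '"' in name:
        # No escape syntax is documented, so refuse rather than guess.
        raise ValueError("entity name contains a double quote")
    return f'"{name}"'
```

Under these assumptions, `node_42` stays bare, while `Alice` (which looks like a variable), `John Smith` (spaces), and anything containing `@` get quoted.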
To read a code example, there are three things you need to know:

- Anything between angle brackets (`<` and `>`) is a placeholder. A placeholder means you can put a valid value in its place. Generally, the words between the angle brackets tell you what sort of thing goes there.
- Numbers between angle brackets mean that the bracketed item stands in for the *n*th argument. So `<1>` means the 1st argument, and so on.
- Ellipses (`...`) mean that you can put any number of things there. Sometimes these are left out where the text explicitly says that multiple things can be put there.