Performance improvement - indexes on property values #1440
base: dev
| . It only supports exact seeks on range indexes (no full text or spatial). | ||
| . The index order cannot be leveraged, so the planner must insert separate ordering if required later on in the query. | ||
| . Parallel runtime seeks and scans are single-threaded. | ||
| . The planner doesn't combine multiple property index seeks when generating the results for the dynamic part of the query. For example, using `$any` in combination with multiple labels that share an index on a property results in the planner choosing one of the indexes based on selectivity and then stepping through the seek results and filtering for the remainder of the expression. |
- These come out as numbered. Should they be bullet pointed instead?
- The last point is quite hard to follow. Maybe some example Cypher could help? For example:
CREATE RANGE INDEX actor_has_birthyear FOR (a:Actor) ON (a.birthYear)
CREATE RANGE INDEX director_has_birthyear FOR (d:Director) ON (d.birthYear)
// The below MATCH can leverage one of the indexes, but not both
MATCH (p:$any(["Actor", "Director"]) { birthYear: 1983 }) RETURN p.name
Also, on the last point: it is technically the operator that does this index selection and filtering, not the planner. (I'm assuming we try to make the distinction in the docs between planning and runtime/operator phases. You tell me though!)
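For comparison with the caveat discussed above, here is a minimal sketch of the same lookup written with static labels. It assumes the two indexes from the example and is only an illustration, not a statement about what the planner will choose on a given dataset. With static labels, each `UNION` branch is planned separately, so each branch can typically seek its own index.

```cypher
// Assumed setup: the same two single-property range indexes as in the example above.
CREATE RANGE INDEX actor_has_birthyear IF NOT EXISTS FOR (a:Actor) ON (a.birthYear);
CREATE RANGE INDEX director_has_birthyear IF NOT EXISTS FOR (d:Director) ON (d.birthYear);

// Static labels: each branch is planned on its own, so each can seek its own index.
// UNION removes duplicate rows, so a person who is both Actor and Director appears once.
MATCH (p:Actor { birthYear: 1983 }) RETURN p.name AS name
UNION
MATCH (p:Director { birthYear: 1983 }) RETURN p.name AS name;
```

Note that `UNION` deduplicates on the returned `name` value, so two different people who share a name collapse into one row here, which is not the case for the `$any` form that returns one row per node.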
| CREATE RANGE INDEX actor_has_birthyear FOR (a:Actor) ON (a.birthYear) | ||
| CREATE RANGE INDEX director_has_birthyear FOR (d:Director) ON (d.birthYear) | ||
| MATCH (p:$all(["Actor", "Director"]) {birthYear: 1983}) RETURN p.name |
I think this example is a bit confusing in combination with the caveat overleaf which states we can't actually use both the indexes for the MATCH. Perhaps this would be better:
CREATE RANGE INDEX actor_has_birthyear FOR (a:Actor) ON (a.birthYear);
// The below MATCH can leverage the created index and then filter for the Director label
MATCH (p:$all(["Actor", "Director"]) { birthYear: 1983 }) RETURN p.name;
Or perhaps - worth checking with Matthew Wood which he prefers - a LOAD CSV example:
// people.csv
label,name,birthYear
Actor,Henry Cavill,1983
CREATE RANGE INDEX actor_has_name_and_birthyear FOR (a:Actor) ON (a.name, a.birthYear);
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
// The below MERGE can leverage the index to check for existence before a CREATE
MERGE (:$(row.label) { name: row.name, birthYear: row.birthYear })
I think I like the LOAD CSV example better, if that helps!
jamthief
left a comment
Looks great! Minor last comments, so approving proactively. 🍪
| [source, cypher, role=test-skip] | ||
| ---- | ||
| CREATE RANGE INDEX actor_has_name_and_birthyear FOR (a:Actor) ON (a.name, a.birthYear); |
Not sure why I made this more complicated than it needs to be now 😕
The point can still be illustrated with a single property index. Feel free to slim it down, or leave as is.
// people.csv
label,name
Actor,Henry Cavill
CREATE RANGE INDEX actor_has_name FOR (a:Actor) ON (a.name);
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
// The MERGE below can leverage the index to check for existence before a CREATE
MERGE (:$(row.label) { name: row.name })
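One way to check whether the index is actually picked up for the slimmed-down example is to inspect the plan. This is a sketch only: the `file:///people.csv` URL is an assumed location for the file, and the operators that appear depend on the Neo4j version and the data, so no expected plan is shown.

```cypher
// Assumed setup: the single-property index from the slimmed-down example.
CREATE RANGE INDEX actor_has_name IF NOT EXISTS FOR (a:Actor) ON (a.name);

// EXPLAIN returns the execution plan without running the statement,
// so no nodes are created while inspecting it.
EXPLAIN
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (:$(row.label) { name: row.name });
```

If the plan shows an index seek feeding the `MERGE` rather than a scan over all nodes, the index is being used for the existence check.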
…ty.adoc Co-authored-by: Rob Steward <[email protected]>
This PR includes documentation updates. Updated pages:
No description provided.