Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib: Add query optimisation to ets:fun2ms/1 #7042

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

TD5
Copy link
Contributor

@TD5 TD5 commented Mar 22, 2023

Unlike writing match specs directly, ets:fun2ms/1 generates queries by translating an erlang function expression. This is convenient and makes for readable queries, but it necessarily trades-off some expressiveness in favour of simplicity (for example, it's not possible to generate a match spec pattern guard that matches against an in-scope variable: users are forced to use something like an equality guard instead). Here, we resolve that issue by reading the author's intention from the given function expression, generating the match spec as before via ms_transform, but then running an optimisation pass over it during compilation in order to generate more efficient queries.

Performance

Amongst other things, we optimise equality guards by moving them into the pattern, which can avoid scanning the whole table, making queries O(1) or O(log(n)) (depending on the table type), rather than O(n), (where n is the number of rows in the table). In other words, this is not primarily a micro-optimisation, but rather a very substantial algorithmic complexity improvement for many common queries.

In practice, I have seen no situations where the new ets:fun2ms/1 queries are slower, but many simple queries can be executed drastically faster when the number of rows in the table is large.

For example, even a simple query over a table of a million rows made up of pairs of keys and values queried with:

make_query(Key) ->
  ets:fun2ms(fun({K, V}) when K =:= Key -> {K,V} end). 

now executes >1000x faster with my local benchmarks. Almost any query which requires that a =:= guard always hold will potentially see a substantial performance improvement for queries over big datasets.

Theory

From the existing ETS match spec docs:

Traversals using match and select functions may not need to scan the
entire table depending on how the key is specified. A match pattern with
a fully bound key (without any match variables) will optimize the
operation to a single key lookup without any table traversal at all. For
ordered_set a partially bound key will limit the traversal to only scan
a subset of the table based on term order. A partially bound key is
either a list or a tuple with a prefix that is fully bound.

We can leverage this knowledge to re-write queries to make better use of the key.

For example:

make_query(Key) ->
  ets:fun2ms(fun({K, V}) when K =:= Key -> {K,V} end). 

was previously compiled to:

{
  {'$1', '$2'},
  [
    {'=:=', '$1', Key}
  ],
  [{'$1', '$2'}]
}

This was sub-optimal, since the equality guard is less efficient than the functionally-equivalent pattern match because the equality guard did not result in a fast lookup using the table's key.

Now, the same function expression is compiled to this, more efficient, query:

{
  {Key, '$2'},
  [],
  [{Key, '$2'}]
}

We can also simplify constant parts of queries statically, and perform other rewritings to improve efficiency, but the largest win comes from inlining the values of variables bound by guards such as (K =:= Key).

Implementation

This optimisation is implemented for all relevant literals that I could find. Floats were given extra consideration and testing because of the differences in ==/=:= vs. pattern matching. In this situation, the handling of floats in ordered_set is safe because we only inline =:= guards into the the match head and body, but we leave == as a guard, since determining statically whether the table type would make this a safe operation or not is not feasible using the the information available in the parse transform.

New unit tests cover the parse transform compiling to the expected match expression, the match expression matching the expected rows, and the equivalence between the naive match expression and the optimised one in terms of data returned. See the changes to ms_transform_SUITE.erl for more information.

This optimisation is specifically applied in ets:fun2ms/1, because I think users would expect generated match specs to avoid trivial inefficiencies (and, indeed, utilising the key efficiently when it was given as a parameter was impossible to express before). Moreover, by making use of ets:fun2ms/1, users have already ceded some control of the generated match spec to the tooling. Users who construct match specs directly will be unaffected.

Notably, since ets:fun2ms/1 is transformed at compile time (outside of the shell, at least), we don't pay any unnecessary performance penalty at runtime in order to apply these optimisations, and the cost of doing them at compile time is low relative to other operations.

Later work could explore runtime query-planning for ETS, but avoiding introducing performance regressions for at least some queries will be harder to guarantee, since we then we would have to consider the runtime cost of computing the optimisation itself.

Optimisation can be disabled with the no_optimise_fun2ms compiler flag, but by default it is enabled. The flag can be altered via the usual compile flag mechanisms, including the -compile(no_optimise_fun2ms) attribute.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 22, 2023

CT Test Results

       3 files     374 suites   46m 8s ⏱️
2 588 tests 2 300 ✔️ 287 💤 1
6 975 runs  6 668 ✔️ 306 💤 1

For more details on these failures, see this check.

Results for commit 4719d0a.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@KennethL
Copy link
Contributor

The optimization is good since it creates an efficient query from an inefficient fun2ms usage.
But the fun2ms can also be written:

3> ets:fun2ms(fun({42, V}) -> {42,V} end).              
[{{42,'$1'},[],[{{42,'$1'}}]}]

which result in the same match spec as the optimized one.
But the optimization let the user write nicer and easier to understand match specs with fun2ms

@TD5
Copy link
Contributor Author

TD5 commented Mar 23, 2023

This is true. Perhaps my example was too simplified to show the value of the optimiser in general.

Consider a case where we don't have a literal 42, but instead a parameter, Key, to our function make_query/1. In that case, we can't write something that might come naturally, like:

make_query(Key) ->
  ets:fun2ms(fun({Key, V}) -> {Key,V} end). 

because of the existing scoping/shadowing rules of Erlang.

Instead, people tend to write the functionally equivalent, but significantly less efficient:

make_query(Key) ->
  ets:fun2ms(fun({K, V}) when K =:= Key -> {K,V} end). 

This PR solves the issue by letting users write the second form which makes use of a guard, but then optimises it away behind the scenes. In some sense, this is a work around for the inability to write the pattern match in the fun due to Erlang's scoping/shadowing rules, but in another sense, you can just see it as a general purpose query optimiser with reduces the need for the user to understand precisely how a query will be executed.

EDIT: I've updated the example in the pull request description to show this.

@sverker sverker self-assigned this Mar 23, 2023
@sverker sverker added team:VM Assigned to OTP team VM enhancement labels Mar 23, 2023
Copy link
Contributor

@sverker sverker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot. This is a nice optimization.

Some code style comments:
I'm not fond of white space fixes in otherwise untouched lines. I would prefer them in a separate commit if you want to keep them.
Some lines seemed a bit too long for my taste. We try to keep the 80 column limit in general I think.

@TD5
Copy link
Contributor Author

TD5 commented Mar 24, 2023

Some code style comments: I'm not fond of white space fixes in otherwise untouched lines. I would prefer them in a separate commit if you want to keep them. Some lines seemed a bit too long for my taste. We try to keep the 80 column limit in general I think.

Thanks for the feedback, I'll get that fixed up.

EDIT: It should look a bit better now.

@TD5 TD5 force-pushed the ets-opt branch 10 times, most recently from 7c48522 to 53fd412 Compare March 27, 2023 09:35
@DianaOlympos
Copy link
Contributor

Question that probably is out of scope. Would this optimisation also impact tracing, or is the functionality too different to matter in the way match specs are run in this case?

I know this particular change does not impact it, i am thinking on applicability in other situations.

@TD5
Copy link
Contributor Author

TD5 commented Apr 3, 2023

Would this optimisation also impact tracing, or is the functionality too different to matter in the way match specs are run in this case?

I don't know how tracing is implemented at runtime, so I can't say for sure, but I suspect there won't be much of a win there because the re-writes here attempt to leverage ETS table indexing, and it's not clear to me how that would benefit tracing.

That said, the re-write makes the structure of the query more explicit, so even if there's no immediate win, in principle, I suspect it could be leveraged in the future to offer a small performance benefit.

@sverker
Copy link
Contributor

sverker commented Apr 5, 2023

There are a bunch of failing test cases in qlc_SUITE

They seem to be triggered by this PR, but I haven't investigated further.

@TD5
Copy link
Contributor Author

TD5 commented Apr 17, 2023

There are a bunch of failing test cases in qlc_SUITE

They seem to be triggered by this PR, but I haven't investigated further.

@sverker: Good find! I was running only a subset of the unit tests locally, and didn't think to include qlc 🤦‍♂️ The existing logic there called ms_transform:parse_transform/2 in a few places which assumed it only output one match function per match spec, which is no longer true after optimisations. Rather than let this change spider out into lots of other code, I just disable the optimisations when calling ms_transform from qlc_pt.

@TD5 TD5 marked this pull request as draft June 5, 2023 14:19
@TD5 TD5 marked this pull request as ready for review June 6, 2023 14:34
@TD5 TD5 changed the base branch from master to maint June 6, 2023 15:11
@TD5 TD5 changed the base branch from maint to master June 6, 2023 15:12
Copy link
Contributor

@sverker sverker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the merging of master to fix conflicts?

@@ -7257,15 +7257,18 @@ manpage(Config) when is_list(Config) ->
[2,3,4] = qlc:eval(QH),

%% ets(3)
MS = ets:fun2ms(fun({X,Y}) when (X > 1) or (X < 5) -> {Y} end),
Copy link
Contributor

@sverker sverker Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the changing from (X > 1) or (X < 5) to (X > 1) and (X < 5) just to make the test more sensible or did the old condition (which is always true) fail the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was both: The old condition seemed odd, but it also broke the test, since the test is asserting the handle is the same via ets:fun2ms and qlc, but this isn't true if ets:fun2ms applies some optimisations (which it did in the original case)

@TD5 TD5 force-pushed the ets-opt branch 2 times, most recently from a8f91de to 9f0b4b4 Compare September 4, 2023 13:21
@TD5 TD5 changed the base branch from master to maint September 4, 2023 13:21
@TD5
Copy link
Contributor Author

TD5 commented Sep 4, 2023

Was the merging of master to fix conflicts?

It seems like CI fails to build due to some sort of caching issue when I base this change on maint (it still builds fine for me locally). I've set github to target erlang:maint, but the commit itself is against my fork of erlang:master. Hopefully CI will now go green against master, but it can ultimately be merged to maint.

EDIT:
Okay, I just can't make the cache in the erlang/otp CI happy, but It Works on my Machine™ (and also in my own fork of erlang/otp's GitHub actions). I wonder whether any maintainers would clear the build cache, or otherwise help resolve the issue?

Unlike writing match specs directly, `ets:fun2ms/1` generates queries by
translating an erlang function expression. This is convenient and makes
for readable queries, but it necessarily trades-off some expressiveness
in favour of simplicity (for example, it's not possible to generate a
match spec pattern guard that matches against an in-scope variable:
users are forced to use something like an equality guard instead).
Here, we resolve that issue by reading the author's _intention_ from
the given function expression, generating the match spec as before via
`ms_transform`, but then running an optimisation pass over it during
compilation in order to generate more efficient queries.

Performance
===========

Amongst other things, we optimise equality guards by moving them into
the pattern, which can avoid scanning the whole table, making queries
`O(1)` or `O(log(n))` (depending on the table type), rather than `O(n)`,
(where `n` is the number of rows in the table). In other words, this is
not primarily a micro-optimisation, but rather a very substantial
algorithmic complexity improvement for many common queries.

In practice, I have seen no situations where the new `ets:fun2ms/1`
queries are slower, but many simple queries can be executed drastically
faster when the number of rows in the table is large.

For example, even a simple query over a table of a million rows made up
of pairs of keys and values queried with:

```erlang
make_query(Key) ->
  ets:fun2ms(fun({K, V}) when K =:= Key -> {K,V} end).
```

now executes **>1000x faster** with my local benchmarks. Almost any
query which requires that a `=:=` guard always hold will potentially see
a substantial performance improvement.

Theory
======

From the existing ETS match spec docs:
> Traversals using match and select functions may not need to scan the
> entire table depending on how the key is specified. A match pattern with
> a fully bound key (without any match variables) will optimize the
> operation to a single key lookup without any table traversal at all. For
> ordered_set a partially bound key will limit the traversal to only scan
> a subset of the table based on term order. A partially bound key is
> either a list or a tuple with a prefix that is fully bound.

We can leverage this knowledge to re-write queries to make better use of the key.

For example:

```erlang
make_query(Key) ->
  ets:fun2ms(fun({K, V}) when K =:= Key -> {K,V} end).
```

was previously compiled to:

```erlang
{
  {'$1', '$2'},
  [
    {'=:=', '$1', Key}
  ],
  [{'$1', '$2'}]
}
```

This was sub-optimal, since the equality guard is less efficient than
the functionally-equivalent pattern match because the equality guard did
not result in a fast lookup using the table's key.

Now, the same function expression is compiled to this, more efficient, query:

```erlang
{
  {Key, '$2'},
  [],
  [{Key, '$2'}]
}
```

We can also simplify constant parts of queries statically, and perform
other rewritings to improve efficiency, but the largest win comes from
inlining the values of variables bound by guards such as `(K =:= Key)`.

Implementation
==============

This optimisation is implemented for all relevant literals that I could
find. Floats were given extra consideration and testing because of the
differences in `==`/`=:=` vs. pattern matching.  In this situation, the
handling of floats in `ordered_set` is safe because we only inline `=:=`
guards into the the match head and body, but we leave `==` as a guard,
since determining statically whether the table type would make this a
safe operation or not is not feasible using the the information
available in the parse transform.

New unit tests cover the parse transform compiling to the expected match
expression, the match expression matching the expected rows, and the
equivalence between the naive match expression and the optimised one in
terms of data returned.  See the changes to `ms_transform_SUITE.erl` for
more information.

This optimisation is specifically applied in `ets:fun2ms/1`, because I
think users would expect generated match specs to avoid trivial
inefficiencies (and, indeed, utilising the key efficiently when it was
given as a parameter was impossible to express before). Moreover, by
making use of `ets:fun2ms/1`, users have already ceded some control of
the generated match spec to the tooling. Users who construct match specs
directly will be unaffected.

Notably, since `ets:fun2ms/1` is transformed at compile time (outside of
the shell, at least), we don't pay any unnecessary performance penalty
at runtime in order to apply these optimisations, and the cost of doing
them at compile time is low relative to other operations.

Later work could explore runtime query-planning for ETS, but avoiding
introducing performance regressions for at least some queries will be
harder to guarantee, since we then we would have to consider the runtime
cost of computing the optimisation itself.

Optimisation can be disabled with the `no_optimise_fun2ms` compiler flag,
but by default it is enabled. The flag can be altered via the usual
compile flag mechanisms, including the `-compile(no_optimise_fun2ms)`
attribute.
@TD5 TD5 changed the base branch from maint to master September 4, 2023 13:43
@sverker
Copy link
Contributor

sverker commented Sep 21, 2023

I don't think I will get time to really scrutinize the implementation, but the amount of test cases looks reassuring.

It would be nice if you could write something in ets:fun2ms docs about this. Maybe use the new <change> tag when talking about difference in behavior between OTP versions (like in binary_to_term/2).

@TD5
Copy link
Contributor Author

TD5 commented Sep 22, 2023

It would be nice if you could write something in ets:fun2ms docs about this. Maybe use the new <change> tag when talking about difference in behavior between OTP versions

I think that's a really good suggestion. I'll add something.

@@ -101,7 +101,7 @@ forms(Forms) -> forms(Forms, ?DEFAULT_OPTIONS).

forms(Forms, Opts) when is_list(Opts) ->
do_compile({forms,Forms}, [binary|Opts++env_default_opts()]);
forms(Forms, Opt) when is_atom(Opt) ->
forms(Forms, Opt) when is_atom(Opt) orelse is_tuple(Opt) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we need to do this, can't the uses in ms_transform_SUITE be changed to list form instead?

Comment on lines +218 to +223
MaybeOptimise =
case ShouldOptimise of
true -> fun optimise_ms/1;
false -> fun (MS) -> MS end
end,
case catch MaybeOptimise(ms_clause_list(1,Clauses,Dialect,gb_sets:new())) of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is needlessly complicated (and was so before!). How about refactoring it a bit, e.g.

transform_from_shell(Dialect, Clauses, BoundEnvironment, ShouldOptimise) ->
    SaveFilename = setup_filename(),
    try
        CList0 = ms_clause_list(1, Clauses, Dialect, gb_sets:new()),
        CList1 = case ShouldOptimise of
                     true -> optimise_ms(CList0);
                     false -> CList0
                 end,
        CList = normalise(fixup_environment(CList1, BoundEnvironment)),
        cleanup_filename(SaveFilename),
        CList
    catch
        throw:{error,AnnoOrUnknown,R} ->
            {error,
              [{cleanup_filename(SaveFilename), %% ... prevents using `after`, sigh
               [{location(AnnoOrUnknown), ?MODULE, R}]}],
             []};
        error:Reason ->
            cleanup_filename(SaveFilename),
            exit(Reason)
    end.

%io:format("Forms: ~p~n",[Forms]),
case catch forms(Forms) of
ShouldOptimise = not proplists:get_bool(no_optimise_fun2ms, Options),
case catch forms(Forms, ShouldOptimise) of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're feeling up to it, you can refactor this like transform_from_shell/4 too. :)

Comment on lines +380 to 381
form({attribute,_,file,{Filename,_}}=Form, _) ->
put_filename(Filename),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're already going through the trouble of lugging ShouldOptimize everywhere, perhaps we should change to using a state record for that and all the things that (ab)use the process dictionary.

Comment on lines +481 to +482
try
Compound =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Compound is returned straight away, try ... of Compound -> Compound catch _:_ -> UnOpt end

I'd also try to make the conditions for giving up on optimization explicit, as _:_ can hide actual errors in the pass. If it's not too much work I'd like explicit throw:impossible (or similar) instead.

% Gracefully handle contradictive cases such as (X =:= 1) and (X =:= 2)
% by reducing the guard to just 'false'
ConflictingColumns =
maps:filter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ConflictingColumns is only checked for a size of 0 later on, can't this be a maps:fold/3?

% (K1 =:= K2) andalso (K1 =:= K3) doesn't imply a contradiction,
% since despite the names being different, their bound values
% may be the same at runtime
[ S || S <- AllSubstitutionsForColumn, not is_column_ref(S)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we filter not is_column_ref(S) before passing the list to maps:groups_from_list/3?

end,
ColumnsToSubstitutions
),
LookupSubstitution =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the funs from here on seem to be used as funs, and they don't refer to the environment in a meaningful way. Please break them out into separate functions.

Comment on lines +888 to +895
SimplifiedOperand1 = simplify_body_expr(Operand1),
SimplifiedOperand2 = simplify_body_expr(Operand2),
SimplifiedOperand3 = simplify_body_expr(Operand3),
simplify_guard_function(
Operator,
SimplifiedOperand1,
SimplifiedOperand2,
SimplifiedOperand3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I think the calls themselves provide enough context here.

Suggested change
SimplifiedOperand1 = simplify_body_expr(Operand1),
SimplifiedOperand2 = simplify_body_expr(Operand2),
SimplifiedOperand3 = simplify_body_expr(Operand3),
simplify_guard_function(
Operator,
SimplifiedOperand1,
SimplifiedOperand2,
SimplifiedOperand3);
simplify_guard_function(
Operator,
simplify_body_expr(Operand1),
simplify_body_expr(Operand2),
simplify_body_expr(Operand3));

substitute_promotions_in_guards_expr(
Promotions,
{tuple, _, [{atom, _, Operator}, Operand1, Operand2, Operand3]}) ->
RemainingOperand1 = substitute_promotions_in_guards_expr(Promotions, Operand1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(See comment on simplify_body_expr/1)

@sverker sverker added the waiting waiting for changes/input from author label Oct 11, 2023
@sverker
Copy link
Contributor

sverker commented Oct 12, 2023

Reading your post at erlang forums maybe you also realized this cannot be optimized correctly:

1> ets:fun2ms(fun({K,V}) when V =:= '$1' -> K end).
[{{'$1',{const,'$1'}},[],['$1']}]

I also discovered this bug:

2> ets:fun2ms(fun({K,V}) when K =:= V+1 -> K end).
[{{{'+','$2',1},'$2'},[],[{'+','$2',1}]}]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement team:VM Assigned to OTP team VM waiting waiting for changes/input from author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants