-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ruby: Query for database calls in a loop #18304
Conversation
aa8751c
to
5d3a541
Compare
489bd2a
to
8e36d3f
Compare
QHelp previews: ruby/ql/src/queries/performance/DatabaseQueryInLoop.qhelpDatabase query in a loopWhen a database query operation, for example a call to a query method in the Rails `ActiveRecord::Relation` class, is executed in a loop, this can lead to a performance issue known as an "n+1 query problem". The database query will be executed in each iteration of the loop. Performance can usually be improved by performing a single database query outside of a loop, which retrieves all the required objects in a single operation. RecommendationIf possible, pull the database query out of the loop and rewrite it to retrieve all the required objects. This replaces multiple database operations with a single one. ExampleThe following (suboptimal) example code queries the repo_names_by_owner.map do |owner_slug, repo_names|
owner_id, owner_type = User.where(login: owner_slug).pluck(:id, :type).first
owner_type = owner_type == "User" ? "USER" : "ORGANIZATION"
rel_conditions = { owner_id: owner_id, name: repo_names }
nwo_rel = nwo_rel.or(RepositorySecurityCenterConfig.where(rel_conditions)) unless neg
nwo_rel = nwo_rel.and(RepositorySecurityCenterConfig.where.not(rel_conditions)) if neg
end To improve the performance, we instead query the # Preload User data
user_data = User.where(login: repo_names_by_owner.keys).pluck(:login, :id, :type).to_h do |login, id, type|
[login, { id: id, type: type == "User" ? "USER" : "ORGANIZATION" }]
end
repo_names_by_owner.each do |owner_slug, repo_names|
owner_info = user_data[owner_slug]
owner_id = owner_info[:id]
owner_type = owner_info[:type]
rel_conditions = { owner_id: owner_id, name: repo_names }
nwo_rel = nwo_rel.or(RepositorySecurityCenterConfig.where(rel_conditions)) unless neg
nwo_rel = nwo_rel.and(RepositorySecurityCenterConfig.where.not(rel_conditions)) if neg
end |
This does not take assicoations into account. It uses ActiveRecordModelFinderCall to identify relevant calls. This class has therefor been made public.
8369852
to
5feb401
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together - the general direction looks reasonable and I like the simple statement of the problem. Suggestions mainly involve documenting the logic, adding tests, and clarifying the help text for both readers and autofix.
I also recommend a PR title and description stating exactly what you are adding, so we can easily find and understand it later.
I agree you don't need a change note yet.
* @tags performance | ||
*/ | ||
|
||
// Possible Improvements; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's file internal issues for both ideas.
Co-authored-by: Aditya Sharad <[email protected]>
I apuse here, because the code may be simplified
use the CFG more than the AST
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, ready to merge when DCA is happy.
|
||
class DatabaseQueryInLoopTest | ||
def test | ||
### These are bad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please use InlineExpectationsTestQuery.ql
instead, like e.g. https://github.com/github/codeql/blob/main/ruby/ql/test/query-tests/security/cwe-022/PathInjection.qlref?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, yes!
/** Gets the name of a built-in method that involves a loop operation. */ | ||
string getALoopMethodName() { | ||
result in [ | ||
"each", "reverse_each", "map", "map!", "foreach", "flat_map", "in_batches", "one?", "all?", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should in_batches
be excluded since it seems to signal some reflection on looping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvitved are you happy with test changes (Qlucie run seemed to pass, although I was surprised we didn't have to change any of the inline comments)?
Inline comments are added. The existing comments do not start with |
Add query for hoisting Rails ActiveRecord calls.
It uses ActiveRecordModelFinderCall to identify relevant calls and this PR is making that class public. Please let me know if that is not acceptable.
Possible Improvements;
Associations are lazy-loading by default, so something like
in a loop over
article
doarticle.book
if you have 1000 articles it will do a 1000 calls to
book
.If you already did
article includes book
, there should be no problem.calls to
pluck
. (An earlier version added exactly calls topluck
which added exactly 1 result on gh/gh; currently the query has 195 results, so the complication did not seem worth it.)The query comes with a
.qhelp
and autofixes have been validated on a small set (~15) of alerts.This query is intended for CCR, so I am not sure if we want a change note.