C++: Total number of baseline files limit #17743
👋 @artem-smotrakov, sorry for the late reply! That limit exists to keep the SARIF file from growing too large, and hitting the SARIF file size limit, when the Tool Status Page is populated with information about how many files were analyzed. It is currently hard-coded and cannot be configured. That said, I'm not entirely sure this limit (and the warning) should really cause a custom query like yours to return no results. Could you:
Hi @redsun82 ! Thanks for your reply!
Yeah, sure, it's quite simple
I get the same message for CSV if I use
Would it be possible to make it configurable in one of the next releases? 🤔 |
Hi @artem-smotrakov, The baseline information should not influence the result of the query. |
Hi @rvermeulen ! Attaching the results of the |
Hi @artem-smotrakov, Let me forward this to our C/C++ team. |
This issue is stale because it has been open 14 days with no activity. Comment or remove the |
I am working on it, please don't close it. |
I am figuring out if I can actually share it. |
Hi @rvermeulen The Is there anything specific that the C++ team might be looking for in this log? What would be the best way to share this log with the team? I would rather not attach it here in a publicly visible comment. If that helps, my employer Riot Games is a GitHub customer, so I guess there might be some private ticketing service where I could post the file? |
Hi @artem-smotrakov, I'm so sorry this slipped through the cracks, and I only just realized you were waiting for a reply. Did you maybe manage to solve this issue on your own in the meantime?
You (or someone in your company) can create a private support ticket here. More information is available here. If you do, you can link this public issue in the request to help us sort this out more easily. More sensitive log artifacts can then be shared through that channel. |
This issue is stale because it has been open 14 days with no activity. Comment or remove the |
Hi @redsun82
I still see the warning, but it looks like the queries run okay with CodeQL This issue aside, I wrote a relatively simple diagnostic query that looks for functions that meet certain criteria. The query works well with
Thanks! If I see this issue again, I'll try to send a support request. |
We were forced to change something in the stats file that is used to compute the join orders. This change happened between 2.20.1 and 2.20.4. We fixed the issues this caused in our own queries, but those fixes might not have been sufficient in general. Would you be able to share your query so we can investigate? |
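To make the join-order point concrete, here is a toy sketch in Python. It is not CodeQL's evaluator, and the relations `starts` and `ends` are hypothetical stand-ins invented for illustration. It only shows that the same result can be computed with very different amounts of intermediate work depending on whether the shared column is joined first or a Cartesian product is built and filtered afterwards.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy relations of (file, line) pairs; not taken from any real database.
starts = [(f, l) for f in range(20) for l in range(0, 50, 5)]  # 200 rows
ends = [(f, l) for f in range(20) for l in range(3, 50, 5)]    # 200 rows


def product_then_filter(starts, ends):
    """Poor order: pair every row with every row, then filter."""
    intermediate = 0
    result = set()
    for (f1, s), (f2, e) in product(starts, ends):
        intermediate += 1
        if f1 == f2 and s < e:
            result.add((f1, s, e))
    return result, intermediate


def join_then_filter(starts, ends):
    """Better order: join on the shared file column first, then compare lines."""
    by_file = defaultdict(list)
    for f, e in ends:
        by_file[f].append(e)
    intermediate = 0
    result = set()
    for f, s in starts:
        for e in by_file[f]:
            intermediate += 1
            if s < e:
                result.add((f, s, e))
    return result, intermediate


slow_result, slow_count = product_then_filter(starts, ends)
fast_result, fast_count = join_then_filter(starts, ends)
assert slow_result == fast_result  # same answer either way
print(slow_count, fast_count)      # prints "40000 2000"
```

Both functions return the same set, but the first touches 40000 intermediate pairs and the second 2000. The optimizer's statistics influence which of these shapes it picks, which is presumably why a change to the stats file can make a previously fast predicate slow.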
Hi @jketema
What was the fix? Do you have an example?
Here it is
It looks for RPCs in code that uses Unreal Engine. To test the query, you'd need a multiplayer game based on Unreal Engine, for example, Lyra. Here is an RPC example. |
It very much depends:
This is not something I have access to, unfortunately. I can guide you on how to get the numbers you see on some of the PRs from above if that would make sense. |
Thanks for the examples. Is this something you're planning to address in future CodeQL releases, or do all impacted queries have to be updated? I have to admit that my knowledge of CodeQL isn't enough to understand what's happening in the PR you linked. I am wondering if I can just stay on
Yes, you'd need to join Epic's org to get access to it, instructions https://github.com/tomlooman/ActionRoguelike -- that's another project that can be used for testing the query. |
All queries that were impacted have been updated at this point.
Understandable. Especially since this is not really documented.
Currently I'm not able to address this, as I do not have enough data.
This is very high-overhead, which means I would need to schedule it, and it can only have very low priority because this is the only report we've received about this so far and it doesn't show up in our own testing.
This still seems to require access to Epic's org to get access to the required tooling?
To get the numbers you see on the PRs that I linked above:
The |
No. You only need to install Unreal Engine and a compiler toolkit https://www.unrealengine.com/en-US/download There is no need for access to Epic's org. I started the query using the instructions you provided. When it's finished, I'll check the log and post it here if that's okay. |
Attaching summary.log |
Thanks. It's completely blowing up in your
What's not clear to me though is why the same thing doesn't happen with 2.20.1. Would you also be able to produce the same log file but with 2.20.1? |
Reported here: github#17743

Without this change on the query provided by the user:
```
[2025-02-25 12:42:01] Evaluated non-recursive predicate quickquery::UnrealFunctionAnnotation.annotates/1#dispred#9cd6c269@c668c8tv in 23846ms (size: 20381473).
Evaluated relational algebra for predicate quickquery::UnrealFunctionAnnotation.annotates/1#dispred#9cd6c269@c668c8tv with tuple counts:
           1  ~0%  {0} r1 = CONSTANT()[]
       27323  ~0%  {2}    | JOIN WITH `Location::Location.getEndLine/0#dispred#83af84ae#bf` CARTESIAN PRODUCT OUTPUT Rhs.0, Rhs.1
  6162566035  ~0%  {4}    | JOIN WITH `Location::Location.getStartLine/0#d54f9e6c` CARTESIAN PRODUCT OUTPUT Lhs.0, Lhs.1, Rhs.0, Rhs.1
                   {4}    | REWRITE WITH TEST InOut.1 < InOut.3
  3894825644  ~5%  {2}    | SCAN OUTPUT In.2, In.0
    73148692  ~0%  {3}    | JOIN WITH fun_decls_40#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.0, Rhs.1
    73148692  ~0%  {4}    | JOIN WITH `Location::Location.getFile/0#dispred#d1f8b5d1` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0, Lhs.2
      864579  ~0%  {2}    | JOIN WITH `Location::Location.getFile/0#dispred#d1f8b5d1` ON FIRST 2 OUTPUT Lhs.2, Lhs.3
    13010742  ~1%  {2}    | JOIN WITH macroinvocations_20#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
    20653781  ~0%  {3}    | JOIN WITH `Macro::MacroAccess.getOutermostMacroAccess/0#d58b05db_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, _, Lhs.1
    20653781  ~4%  {3}    | REWRITE WITH Out.1 := 1
    20381473  ~8%  {2}    | JOIN WITH macroinvocations_03#join_rhs ON FIRST 2 OUTPUT Lhs.0, Lhs.2
                   return r1
```
With this change:
```
[2025-02-25 12:43:10] Evaluated non-recursive predicate quickquery::UnrealFunctionAnnotation.annotates/1#dispred#9cd6c269@11bf8956 in 928ms (size: 20381473).
Evaluated relational algebra for predicate quickquery::UnrealFunctionAnnotation.annotates/1#dispred#9cd6c269@11bf8956 with tuple counts:
        6873  ~3%  {2} r1 = SCAN fun_decls OUTPUT In.4, In.0
        6857  ~0%  {3}    | JOIN WITH `Location::Location.getStartLine/0#d54f9e6c` ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Rhs.1
        6857  ~2%  {3}    | JOIN WITH `Location::Location.getFile/0#dispred#d1f8b5d1` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
     6193961  ~0%  {3}    | JOIN WITH `Location::Location.getFile/0#dispred#d1f8b5d1_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
    27389714  ~1%  {4}    | JOIN WITH macroinvocations_20#join_rhs ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2, Rhs.1
    27389714  ~1%  {4}    | JOIN WITH locations_default ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Rhs.4
                   {4}    | REWRITE WITH TEST InOut.3 < InOut.1
    13010742  ~1%  {2}    | SCAN OUTPUT In.2, In.0
    20653781  ~0%  {3}    | JOIN WITH `Macro::MacroAccess.getOutermostMacroAccess/0#d58b05db_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, _, Lhs.1
    20653781  ~4%  {3}    | REWRITE WITH Out.1 := 1
    20381473  ~8%  {2}    | JOIN WITH macroinvocations_03#join_rhs ON FIRST 2 OUTPUT Lhs.0, Lhs.2
                   return r1
```
I believe the performance problem should be fixed by #18859 |
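For anyone comparing logs like the two above, the widest intermediate step can be picked out mechanically. The following is a minimal Python sketch, assuming the tuple-count lines appear one step per line in the shape shown above (count, duplication percentage, arity in braces, operation). The helper names `parse_steps` and `largest_step` are made up for this example and are not part of any CodeQL tooling; the sample lines are excerpts copied from the logs.

```python
import re

# A counted step looks like: "<tuples> ~<dup>% {<arity>} <operation>".
# Steps without a tuple count (e.g. "REWRITE WITH TEST") are skipped.
STEP = re.compile(r"^\s*(\d+)\s+~\d+%\s+\{(\d+)\}\s+(.*)$")


def parse_steps(log_text):
    """Return (tuple_count, arity, operation) for each counted step."""
    steps = []
    for line in log_text.splitlines():
        m = STEP.match(line)
        if m:
            steps.append((int(m.group(1)), int(m.group(2)), m.group(3).strip()))
    return steps


def largest_step(log_text):
    """Return the step that produced the most tuples."""
    return max(parse_steps(log_text), key=lambda step: step[0])


# Excerpt from the log without the change.
SLOW = """\
           1  ~0%  {0} r1 = CONSTANT()[]
       27323  ~0%  {2}    | JOIN WITH `Location::Location.getEndLine/0#dispred#83af84ae#bf` CARTESIAN PRODUCT OUTPUT Rhs.0, Rhs.1
  6162566035  ~0%  {4}    | JOIN WITH `Location::Location.getStartLine/0#d54f9e6c` CARTESIAN PRODUCT OUTPUT Lhs.0, Lhs.1, Rhs.0, Rhs.1
                   {4}    | REWRITE WITH TEST InOut.1 < InOut.3
  3894825644  ~5%  {2}    | SCAN OUTPUT In.2, In.0
"""

# Excerpt from the log with the change.
FAST = """\
        6873  ~3%  {2} r1 = SCAN fun_decls OUTPUT In.4, In.0
    27389714  ~1%  {4}    | JOIN WITH macroinvocations_20#join_rhs ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2, Rhs.1
    20381473  ~8%  {2}    | JOIN WITH macroinvocations_03#join_rhs ON FIRST 2 OUTPUT Lhs.0, Lhs.2
"""

for name, text in (("without change", SLOW), ("with change", FAST)):
    count, arity, op = largest_step(text)
    print(f"{name}: {count} tuples of arity {arity} in: {op[:60]}")
```

On these excerpts, the largest step without the change is the `CARTESIAN PRODUCT` join with about 6.2 billion tuples, while the largest step with the change stays around 27 million.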
I ran the query on my database, and the Do you know if there's any chance to reduce it further? The predicate takes about 3m to complete with 2.20.1. I am not sure it would be possible to bring it back to 3m but anything closer would be great! |
I don't immediately see an easy way of doing this by just modifying the library, because it's difficult to predict what effect any changes to the library have in other contexts. We believe that #18859 is still relatively safe in that respect. Let me have a think though about whether something can be done on the side of your query. |
Thanks! I've updated the predicate to this
because I am only interested in |
This issue is stale because it has been open 14 days with no activity. Comment or remove the |
With the limited time I had for this, I've unfortunately not been able to come up with anything that makes things fast again for you. My apologies. |
No problem @jketema ! Thanks! |
This issue is stale because it has been open 14 days with no activity. Comment or remove the |
Hey friends, I have quite a large C++ database:
Before running scans, I normally run some simple diagnostic queries to make sure the database looks fine. The queries look for things like:
- `FunctionCall`s
- `IfStmt`s

When I run these queries on this large database, I get this
The exit code is 0 but calls.sarif is empty.
When I run queries from the standard C++ pack, I get the same message.
What does this limit mean? Is there any way to increase it? I didn't find anything in either the docs or this repo, unfortunately, though I may be missing something. Thanks!