Is there a method to exclude classes by name? #247
There are a few options. I think the easiest one would be to filter the
The other option would be to extract a list of all the addresses that are part of that class (probably by parsing the facts file), and then pass them as arguments to
Just to verify, you aren't seeing the same line repeated thousands of times, are you? |
Difficult to say, there is a lot of output, but it does seem to be moving forward:
facts + guesses count do increase over time. I'll see what I can learn from the facts file and what I may be able to filter in there.
May be worth noting that xerces and one other library seem to make up every single Is there any documentation I can read on the |
It's not abnormal.
https://github.com/cmu-sei/pharos/blob/master/share/doc/pharos_options.pod Apparently it's |
Ah those CLI args were what I was hunting for, thanks! So I've tried removing all the symbolClass facts, and now I'm getting the following error. Not sure if this is because I've removed all those facts, or if I would have got this error anyway once an analysis with those facts in place got further along.
|
It's hard to say. If you want to post your facts file and/or executable, I can take a closer look. |
Would be good to try and track down why the crash occurs. The files are chonkers (which probably doesn't help): the binary is 11 MB, the serialized file is 530 MB and the facts file is 25 MB. Is there an email address I can send them to? Appreciate your time. |
Wow, that is quite large. If you can send the exe and facts file to [email protected], that would be great. |
Files sent, thanks |
I have downloaded the files, thanks. Can you also provide the command line you used to produce the facts file? |
I've lost the original args due to restarting the container, but I was following the guide for analysing larger files. I think this is what I came up with:
can't remember exactly the value for --per-function-timeout, I think I just spammed zeros. The serialised file was made with something along these lines:
Then as we discussed I went through and manually removed every |
That should be close enough |
Using the facts file that you provided, I end up with:
.idata:00B82B8C ; protected: __thiscall xercesc_2_6::InputSource::InputSource(class xercesc_2_6::MemoryManager * const)
So 0xb82b8c is obviously a constructor. From reasonNOTConstructor_G:
% Since we don't have visibility into VFTable writes from imported constructors and
% destructors this rule does not apply to imported methods.
not(symbolClass(Method, _, _, _)),
Oops. So apparently my suggestion to comment out the symbolClass was not so good. |
Another thing you could try is adding |
I'm going to try the
Sounds like the crash I discovered is not a real case and just came from modifying the facts file, and you've given me the requested advice for how to ignore/exclude things, so happy for you to close this issue :) Thanks for your help!
Edit: Hmm. So I recreated the facts file with many
|
So this is probably an unintentional "feature" of how --exclude-func is implemented. It excludes functions from the analysis pass that inspects the instructions in the function to determine the properties of the function. It sadly does NOT exclude the function from other parts of the overall system. For example, we should perhaps be able to exclude import descriptors, symbols, and other aspects of the function using the same command line option, but we don't do that currently. The reason it works the way it does is mostly just historical. The option was added to work around functions that took unreasonably long to analyze, or had other problems in the analysis phase that looks at the instructions. I'd have to look more carefully at the code, but probably only about 2/3rds of the facts actually come from that analysis phase. |
OK so my best current option is a gigantic VM and patience? All good, thanks for your time, feel free to close the issue! |
Patience is probably good. I am also trying to run the program on our machine. It's chugging along pretty well. Unfortunately, for a program this large, as long as it is visibly making progress (printing things every few seconds), I would say things are going well. |
I will leave the issue open for now. If you can't successfully run the file, I consider that an issue (and I might need to do some profiling...) |
Sorry I haven't been following the entire conversation well enough to say for certain. sei-eschwartz may still have other suggestions. I saw that he said your sample was quite large. There is a known problem with factNOTMergeClasses, in the sense that it takes N-squared facts to represent the non-merge facts for N unique classes. We've tried addressing this in a variety of ways, but there's no free lunch (e.g. computing each fact as you need it is slow instead of large). Apparently, you have a very large number of classes, and so a large amount of RAM, many CPUs and lots of patience may be the only choice at this point. I just saw that eschwartz responded more quickly than I could. |
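To make the N-squared point above concrete: if one non-merge fact is stored for every pair of distinct classes, the count is N(N-1)/2 for unordered pairs, or N(N-1) if both orderings are stored. The sketch below only does that arithmetic. The function names are hypothetical and the bytes-per-fact figure is a made-up placeholder, not a measured value for Pharos or SWI-Prolog.

```python
def pairwise_fact_count(n_classes, ordered=False):
    """Number of pairwise facts needed to relate n_classes distinct classes."""
    pairs = n_classes * (n_classes - 1)
    # Unordered pairs count (A, B) and (B, A) once.
    return pairs if ordered else pairs // 2


def estimated_gib(n_classes, bytes_per_fact=100):
    """Rough memory estimate in GiB.

    bytes_per_fact is an assumed placeholder; measure it on a small
    sample before trusting any number this returns.
    """
    return pairwise_fact_count(n_classes) * bytes_per_fact / 2**30


if __name__ == "__main__":
    for n in (1_000, 10_000, 50_000):
        print(n, pairwise_fact_count(n), round(estimated_gib(n), 2))
```

The useful takeaway is the growth rate rather than the absolute numbers: ten times as many classes needs roughly a hundred times as many facts, which is why trimming whole libraries out of the input could matter far more than adding a bit more RAM.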
No problem, I'll leave it running. Only thing I did note was that the logging output (and therefore possibly the analysis?) seemed to be slowing down over time. At this stage the output was scrolling by quite fast, but by the time I got to about 11 hrs of CPU time and 196 GB used, I could read the log lines as they went by, and that's when I gave up. I'll try again on a different VM with a higher single-core clock speed. |
So, I actually just had to kill the prolog run on our machine because it ran out of memory (~220 GiB!). I consider this an issue that we should look into. Unfortunately, I'm going to be traveling a lot and I'm not sure when I'll be able to look at it in more depth. |
Yeah, this slowing down is an expected problem, unfortunately. As we make more conclusions, everything gets slower. But 256 GiB is probably not enough RAM, since that is what our machine has. You could try a machine with more memory if you can find one, but it's clear there is a problem that is causing Prolog to use way too much memory. |
That's pretty expensive. If it were me I'd wait until I can look more at the problem. There's no guarantee that 384 would even be enough... Checkpointing would be something to suggest to the SWI Prolog maintainer... |
Hey @sei-eschwartz did you ever do some more research on this or @cubecull were you able to get it to work? I think that I am also running out of memory and it's crashing, or the log file could have just been too large (1.5GB when it crashed). Analyzing a .dll that is 7.6MB with a PDB that is 37MB. |
This fell off my radar, unfortunately. @h5kk If you have a PDB, what do you need OOAnalyzer for? |
I guess I misunderstood what OOAnalyzer was for and thought it could provide value beyond the PDB. I am still a bit new to RE. Thank you for the response. |
If you have a PDB, you are in great shape. Import it into Ghidra / IDA, and don't worry about OOAnalyzer. OOAnalyzer is useful for the more common case when you have an executable without a PDB. |
Thank you very much. |
@sei-eschwartz have you had any further thoughts on how to get this working, or on a way to estimate how much total RAM is needed based on the input file size? Happy to shell out for a crazy sized machine if the process is likely to complete, I just don't want to spend $240 for a day's compute time on a 1.5 TB RAM box for it to fail anyway. |
@sei-ccohen noted a long time ago that the memory usage is probably scaling as n^2 where n is the number of functions. But I can't tell you exactly which n leads to how much memory usage. I would try to experiment with filtering information out of the .facts file. Maybe using the symbolClass facts as a starting point, filter those out, but then also filter out any facts that also reference the same address. |
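The filtering idea above could be sketched roughly as follows. This is only an illustration and not part of Pharos. It assumes the facts file has one fact per line, that a symbolClass fact carries the method address as its first 0x-prefixed hex literal, and that the class or library name appears somewhere on that same line. The function name `filter_facts` and the fragment `xercesc_` are hypothetical. Since removing the symbolClass facts on their own led to the crash discussed earlier in the thread, the sketch also drops every other fact that mentions one of the same addresses.

```python
import re

# Assumption: addresses in the facts file look like 0x401000.
ADDR_RE = re.compile(r"0x[0-9a-fA-F]+")


def filter_facts(lines, name_fragment):
    """Return (kept_lines, excluded_addresses).

    Pass 1 collects the first address of every symbolClass fact whose
    line contains name_fragment (a plain substring match).
    Pass 2 drops every line that mentions any collected address.
    """
    lines = list(lines)

    excluded = set()
    for line in lines:
        if line.startswith("symbolClass(") and name_fragment in line:
            match = ADDR_RE.search(line)
            if match:
                excluded.add(int(match.group(0), 16))

    kept = []
    for line in lines:
        addrs = {int(a, 16) for a in ADDR_RE.findall(line)}
        if addrs & excluded:
            continue  # fact references an excluded method address
        kept.append(line)
    return kept, excluded
```

A possible way to use it, with placeholder file names: read the lines of `in.facts`, call `filter_facts(lines, "xercesc_")`, and write the kept lines to `out.facts`. Whether the Prolog rules still behave sensibly on the trimmed file is untested here, so it would be worth trying on a small binary first.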
My current target uses several third party libraries which I'm not interested in, but the tooling seems to be spending most of its time crunching that data.
I'm seeing thousands of log lines like:
Is there any way to exclude xerces* et al. from being analysed at all stages?