Replies: 13 comments 1 reply
-
Although the references are interesting and the concept probably worth pursuing, it's not at all obvious this will give you the kind of speed-up that will negate the bulk of complaints. Do the obvious experiment - remove all synchronization on the database. You get maybe a 1% improvement in performance. Doesn't contradict your contentions, but doesn't solve the problem either.
-
If you do the obvious test and don't end up with a corrupted database, then I would think that nothing is actually running in parallel. This is a rough estimate, but at least 85% of the code executing in Ghidra is accessing the database, so it is impossible not to have a contended lock, though I think this would be difficult to measure. The average time slice is something around 10ms, I think, so for things that were already going pretty fast the difference would be negligible, while for things that were slow it could be much higher. I'm falling asleep, so I apologize if what I said doesn't make sense or is incoherent. Regardless, I know it can do better. I've written proprietary code that parsed something like 7GB of DWARF (the GTK WebKit debug ELF) in around 30 seconds, yet it would take Ghidra forever and eat a ton of memory. That really isn't a fair comparison, but my point is that something is wrong.
-
The test I'm running is to import a medium-size (30M) executable and run the default analysis. This takes about 15 minutes on my M1 laptop. I'm guessing it's your contention that this is too small a test case. (I will attempt something on the order of 7GB tomorrow.) To your point re comparing your proprietary code: not even slightly a fair comparison. (I can run "find" over a 4TB drive in 30 seconds, so I have to surmise your code is horribly inefficient - really?) At a very basic level, two points you have made are arguably irrelevant. Whether or not the database is corrupted is not pertinent to a test of speed. If the database is not corrupted because things are not running in parallel, then the problem with respect to analysis time is the lack of parallelism, not contention over the lock. The fact that 85% of the code accesses the database also seems beside the point - of course it does; that is pretty much the only function of the initial analysis. My point is that the team is interested in this problem, but divided on the real cause and the benefits of re-implementing the synchronization methods. If you can make the case for specific solutions with significant benefits, I don't think anyone here is going to refuse the help.
-
Yes, my point was that if it isn't corrupted when synchronization is just removed, then there is a much larger underlying problem. Ghidra uses a ton of threads during analysis and everything is accessing the database; it should be impossible not to get corruption or broken analysis. Whether multiple threads are running in series or in parallel is absolutely pertinent to a test of speed. The file size isn't that relevant and your initial sample is likely sufficient. My point in mentioning the parsing of the 7GB pure-DWARF ELF was that a massive amount of data can be processed in seconds, as opposed to the minutes/hours it would take Ghidra. Ghidra should be much faster than it is, and claims that things like running analysis in headless mode or minimizing the GUI help are nonsense, as the problem persists. The benefit of using read-write locks instead of synchronization should be obvious: a read-write lock allows access in parallel, while synchronization forces access in series. If you only allow one thread to run at a time, then only one is going to run at a time. If you let multiple threads run at a time, the OS and JVM will run them in parallel as the available CPU resources allow. If it can't, it won't; if it can, it most likely will. Actually, if you can take the analysis time of each individual analyzer, sum them, and get the total analysis time, doesn't that already prove they are not running in parallel?
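A minimal sketch of the difference I mean, using a made-up cache class rather than Ghidra's actual db code: with synchronized (or ghidra.util.Lock) every lookup waits its turn, while with a ReentrantReadWriteLock only the mutations are exclusive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache, not Ghidra's actual db code. Many analyzer threads can
// hold the read lock at once; a writer waits for them to drain and then runs alone.
public class AddressLabelCache {
	private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
	private final Map<Long, String> labels = new HashMap<>();

	public String getLabel(long addr) {
		rwLock.readLock().lock();      // shared: concurrent readers allowed
		try {
			return labels.get(addr);
		}
		finally {
			rwLock.readLock().unlock();
		}
	}

	public void setLabel(long addr, String label) {
		rwLock.writeLock().lock();     // exclusive: serializes with readers and writers
		try {
			labels.put(addr, label);
		}
		finally {
			rwLock.writeLock().unlock();
		}
	}
}
```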
-
@astrelsky Reconsidering your comments and realizing something must be amiss, I have uncovered "interesting" behavior on my current machine - code not being replaced even after a recompile - so I'm going to retract my previous contention that disabling the locks has minimal effect. Revisiting - will keep you posted.
-
OK, I have a fairly badly-written branch that replaces ghidra.util.Lock with the ReentrantReadWriteLock. It's not quite sufficient to make an apples-to-apples comparison, but I think it's enough to get a sense of the possible payoff. Not apples-to-apples because several of the more expensive analyses will deadlock the database, so until I figure out the details of the deadlock I can't really run them. The costs of individual analyses obviously don't add linearly, so take my numbers with a grain of salt. The original analysis cited above was about 900 seconds. Reimplemented, it's about 500, which sounds significant, except it excludes the Decompiler Switch analysis, the Demangler, and the Constant Propagation. If those costs were factored in at the same level of improvement, you're probably tacking on another 200 seconds. So I guess the question for the next team meeting will be: is a 25% improvement in speed worth the added complexity? Bear in mind that in this case the complexity will not be a one-time cost. New analyzers will have to be extremely careful not to do things like trying to acquire the write lock downstream from a read lock or passing locks between threads with other dependencies. And/or, is it a better investment of time to look into other possible bottlenecks (e.g. use of highly-active sorted lists, inherently expensive operations on address range sets, or things like serialization/deserialization from the decompiler) or newer database technologies (which have made pretty significant advances in recent years)?
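To make the hazard concrete, here's a stripped-down sketch of the pattern that bites me (hypothetical methods, not code from my branch): a method that looks read-only ends up needing the write lock several calls down.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example of the hazard: a method that looks read-only ends up
// calling something that needs the write lock.
public class UpgradeDeadlock {
	private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

	public String lookUpSymbol(long addr) {
		lock.readLock().lock();
		try {
			// ... several calls deep, in code that "obviously" only reads ...
			return createMissingSymbol(addr);  // surprise write buried in a read path
		}
		finally {
			lock.readLock().unlock();
		}
	}

	private String createMissingSymbol(long addr) {
		// Deadlocks here: ReentrantReadWriteLock does not allow upgrading, so the
		// write lock is never granted while this same thread still holds the read lock.
		lock.writeLock().lock();
		try {
			return "SYM_" + Long.toHexString(addr);
		}
		finally {
			lock.writeLock().unlock();
		}
	}
}
```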
-
I would have thought that the acquisition of locks was abstracted away enough into the database classes that nothing else would need to worry about it. The database should get the read lock, make its copy, release, and return the copy; a write should get the write lock, write, and release. I'm not sure where anything else would need to be careful, but I don't know how you did it. I also think you did too much work on your day off :/. As far as other database implementations go, you probably want to be sure it's really worth it before going that route. However, I thought I'd mention that I have had really good success and phenomenal performance using Apache Arrow, at least with Go and Python. It has a library for almost every language, including Java, though the Java one uses JNI. For performance, we're talking writes of around 10GB in 7 seconds or so. It put my disk usage at 100% while writing (which is good when writing that much data). The numbers are rough because it was well over a year and a half ago and I didn't record them. I remember that they (Apache) were looking to make some data frame library or something specifically for databases using Arrow, but it's been a long time since I looked at it. I don't remember how the decompiler gets data - I'm assuming it asks Java for it - but if the library allows safe access from multiple processes then that could provide some significant improvements to the decompiler too, assuming that is a bottleneck. Also, I think the 25% improvement you saw wouldn't be linear (it would have higher growth, though not quite exponential). It would help with the GUI too, because things would stop blocking the damn Swing thread.
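Roughly the encapsulation I have in mind, sketched with a made-up record type rather than Ghidra's actual Table API: the lock never escapes the class, and callers just get copies back.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: the lock never leaves this class, so callers don't
// have to reason about read vs. write locking at all.
public class RecordTable {
	private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
	private byte[] data = new byte[0];

	public byte[] read() {
		lock.readLock().lock();
		try {
			return Arrays.copyOf(data, data.length);  // copy out, then release
		}
		finally {
			lock.readLock().unlock();
		}
	}

	public void write(byte[] newData) {
		lock.writeLock().lock();
		try {
			data = Arrays.copyOf(newData, newData.length);
		}
		finally {
			lock.writeLock().unlock();
		}
	}
}
```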
-
Regarding deadlocks, just some general rules that are my own ideas and may not reflect best practices. Anything using read-write locks should be encapsulated, and the class should adhere to the following:
If any other database implementations are discussed, please slap anyone who mentions any SQL-based databases. If I were on the team and learned that Ghidra was going to use SQL crap, I would quit. Adding a SQL engine would just add a ton of bloat without much benefit and would be overcomplicated to use. There are much nicer, vectorized, modern columnar formats available that take full advantage of modern CPUs.
-
Thanks for the pointers - I think the original implementation is consistent with those, so my mods inherit. No worries re SQL - no love here, and it hasn't ever been on the table. There are a bunch of folks around with more DB expertise than me, and I'm pinging them as we go, although I think a wholesale replacement of the DB would require considerable discussion with folks on the team who currently have full plates.
-
Ok. A wholesale replacement would be a huge undertaking. Also, a 25% performance boost from only changing code that everything touches sounds like a no-brainer to me, even if it was done badly as a test. I wouldn't change the GhidraLock; that would create a thread-safety nightmare. Just replace locks and synchronization with read-write locks where appropriate.
-
I'd like @ghidra1 to weigh in here. He has done work in this area in the past. We have seen mixed results in terms of performance improvement. It would be nice to get more data on what and where the improvements are.
-
I'd like to argue that this is less of a performance improvement and more of an attempt to stop hampering performance. It's pretty hard to argue against letting everyone read the display at the same time, instead of single file, when it's using a projector. This will be good for Ghidra overall. I think you would always have mixed results because this is heavily dependent on OS scheduling policies, how many other processes are running, and what they are doing. The more other processes you have running that are actually doing CPU-intensive work, the worse the current implementation would be, and the larger the improvement you would see with concurrent reads. I don't think I really need to argue that lock contention should be avoided when it's 100% not necessary. This is what I get just turning on my laptop and signing in, with only the Windows bloat I haven't disabled running in the background. If you have a native thread that enters lock contention and something else needs the CPU, the OS is going to let something else run and you'll need to wait your turn again. Even if nothing else is waiting and ready to run (unlikely), the OS still has to stop you and check. This is just simple maths to me, even though I could very well be wrong. I am not an expert in this subject; I am only an expert idiot.
-
OK, so I finally got a handful of apples-to-apples comparisons and the results are basically not promising. In those cases where I imported/analyzed a medium-sized executable with the existing locking scheme, re-imported/re-analyzed it with the ReentrantReadWriteLock, and then diffed the results to ensure I had actually run the same analyses, I got a 5-10% improvement. In an equal number of cases I got a 25% improvement, but the results were mismatched and incomplete for the ReentrantReadWriteLock. Leaving aside the fact that my implementation is obviously not perfect and would need work, the very slight improvement in speed should be weighed against the following:

1. Per discussion with @ghidra1, switching schemes likely affects the caches, which were protected by the original framework but would now need additional synchronization.
2. Also per @ghidra1, the fact that the ReentrantReadWriteLock depends on the thread id means we would have to completely re-engineer most of our tasks that delegate to the Swing thread, or encapsulate the locking primitives in some way.
3. The new scheme would make future development extremely precarious. The first access to the lock by any given thread basically has to know, at the point of acquisition, whether it needs a read or a write primitive. This sounds straightforward, but I've repeatedly hit writes buried 10 or more levels into the code from methods I thought were ostensibly read-only. Worse, even when we can identify the chain correctly, the entire chain essentially becomes write-locked, negating a substantial portion of the benefit. The literature does not have good solutions to this problem.
4. There are definitely other areas that merit investigation re possible speed-ups, and many of them look more promising than this one.
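For anyone following along, a simplified illustration of point 2 (hypothetical task code, not our actual task framework): the write lock is owned by the thread that acquired it, so a release delegated to the Swing thread blows up instead of unlocking.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import javax.swing.SwingUtilities;

// Hypothetical task: a worker thread takes the write lock and hands the tail of
// the job to the Swing thread, expecting it to finish up and unlock.
public class SwingHandoff {
	private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

	public void updateProgramFromWorker() {
		lock.writeLock().lock();            // acquired on the worker thread
		SwingUtilities.invokeLater(() -> {
			try {
				// repaint the listing, etc.
			}
			finally {
				// Wrong thread: throws IllegalMonitorStateException and the lock stays held.
				lock.writeLock().unlock();
			}
		});
	}
}
```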
-
Is your feature request related to a problem? Please describe.
I'm always frustrated when I get blocked and am forced to give up the remainder of my time slice to another thread, or worse, another process.
Describe the solution you'd like
ReentrantReadWriteLock
ghidra/Ghidra/Framework/DB/src/main/java/db/Table.java, lines 695 to 696 in 2eff37f
ghidra/Ghidra/Framework/DB/src/main/java/db/Table.java, lines 941 to 942 in 2eff37f
ghidra/Ghidra/Framework/DB/src/main/java/db/Table.java, lines 2204 to 2205 in 2eff37f
I hope I don't need to explain more.
Describe alternatives you've considered
Continue standing with everyone else complaining "Ghidra slow" and "Analysis takes a million years".
Additional context
Thread safety != concurrency. The remaining additional context only applies if there is file I/O occurring during database accesses. I would think there is, as I would hope everything isn't being kept in memory, but I struggled to track it down.
I couldn't easily find where the file I/O happens, so I don't know how applicable this is, but you can read from and write to different parts of the same file concurrently (see the sketch after the links below).
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/channels/FileChannel.html#read(java.nio.ByteBuffer,long)
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/channels/FileChannel.html#write(java.nio.ByteBuffer,long)
https://blogs.oracle.com/javamagazine/post/java-virtual-threads
See sections "Virtual threads and I/O" and "Virtual thread guidelines"
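A sketch of what I mean by concurrent positional I/O, assuming a hypothetical fixed-size block layout (nothing to do with Ghidra's actual buffer file format): read(ByteBuffer, long) takes an explicit offset and never touches the channel's position, so multiple threads can read different regions of one open file without any locking.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical reader: each thread reads its own block by absolute offset.
// Positional reads don't use or modify the channel's position, so no lock is
// needed around the read itself.
public class BlockReader {
	private static final int BLOCK_SIZE = 4096;
	private final FileChannel channel;

	public BlockReader(Path file) throws IOException {
		this.channel = FileChannel.open(file, StandardOpenOption.READ);
	}

	public byte[] readBlock(long blockIndex) throws IOException {
		ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
		long offset = blockIndex * BLOCK_SIZE;
		while (buf.hasRemaining()) {
			int n = channel.read(buf, offset + buf.position());  // absolute offset, thread-safe
			if (n < 0) {
				break;  // hit end of file; remaining bytes stay zeroed
			}
		}
		return buf.array();
	}
}
```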
Also, what?
ghidra/Ghidra/Framework/DB/src/main/java/db/Table.java, lines 2297 to 2306 in 2eff37f
I'd like to argue that an Iterator doesn't need to be thread safe. The only reason I can think of to share an Iterator between threads is to intentionally have a bad time.