Too many open files #22
The bug happened again. We should fix it soon. |
I can test the issue by starting FREME on Ubuntu and using this command to count the number of open file handles:
where 3919 is the process ID of FREME. The number of file handles increases with each API call that uses e-Internationalisation. I found and fixed two streams that were not closed properly, which reduced the number of streams left open by each API call from 3 to 1. Next step: find the last stream that is not closed properly. |
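A complementary check can be run from inside the JVM. The sketch below uses `com.sun.management.UnixOperatingSystemMXBean` (available on HotSpot/OpenJDK on Unix-like systems) to measure how many descriptors a repeated call leaves open. `FdLeakCheck` and `leakedPerRun` are hypothetical names, and the `Runnable` stands in for an API call such as an e-Internationalisation request.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public final class FdLeakCheck {

    /** Open file descriptors of this JVM, or -1 where the platform does not report them (e.g. Windows). */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /**
     * Runs the action repeatedly and returns the average number of descriptors
     * still open afterwards per run (0 means no leak was detected).
     */
    static long leakedPerRun(Runnable action, int runs) {
        long before = openFileDescriptors();
        for (int i = 0; i < runs; i++) {
            action.run();
        }
        return (openFileDescriptors() - before) / runs;
    }
}
```

The count covers the whole process, so other threads opening or closing files during the measurement skew it; averaging over many runs (with integer division) smooths out that noise.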
This should be fixed. @jnehring, could you check this on freme-dev? |
I tested it. The problem is fixed. Great! |
The bug happened again today. |
The bug happened again and I could solve it again by restarting the server. I could not find the reason for the problem. I cannot even reproduce it locally. Next step to find the bug: I want to run the broker on my local VM and process a larger dataset (e.g. 1000 documents) with it using the same pipeline that WRIPL uses. |
I analysed the content going to FREME. When putting the following content in
Through this logging I found out that some special characters lead to crashes in e-Link. Processing a pipeline with the following content leaves 10 open file handles. It was executed on a FREME test installation on freme-live.
Next steps:
|
Maybe you can check whether these are connections (sockets), and what they connect to, with |
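To tell whether leaked handles are sockets, pipes or regular files, one option on Linux is to resolve the symbolic links under `/proc/<pid>/fd`, whose targets look like `socket:[inode]`, `pipe:[inode]`, `anon_inode:[...]` or a file path. A minimal sketch, assuming a Linux host and read access to the target process's `/proc` entry; `FdTypes` is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public final class FdTypes {

    /** Counts the descriptors in /proc/<pid>/fd per kind of target (Linux only). */
    static Map<String, Integer> countByType(String pid) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // descriptor was closed while we were listing
                }
                counts.merge(classify(target), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Maps a link target such as "socket:[12345]" to a coarse category. */
    static String classify(String target) {
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("pipe:")) return "pipe";
        if (target.startsWith("anon_inode:")) return "anon_inode";
        return "file";
    }
}
```

`countByType("self")` inspects the running JVM; passing FREME's process ID instead requires running as the same user (or root). The listing includes the descriptor used for the listing itself.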
The problem does not occur any more on freme-live. We have an Apache web server on freme-live that proxies requests to our API. This proxy was not configured properly: the module mod_xml2enc was not installed, so special characters got mangled. These special characters caused the open streams. I installed mod_xml2enc and now the number of open streams is constant. I am leaving this issue open because there is still a problem in FREME that we should find, but hopefully the broker will no longer crash all the time. |
The bug still occurs. |
Which OS is your server running? |
And can you dump the heap of the Java process(es) when there are many open files, like so:
Then, opening these file(s) in VisualVM or jvisualvm (part of the JDK) might show instances of streams, files, sockets, or whatever causes the problem, and where they are initialised. So can you do this, compress the dump(s) (xz works well here) and send them to me or put them temporarily on Google Drive? |
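Besides external tools such as `jmap` (for example `jmap -dump:live,format=b,file=heap.hprof <pid>`), a HotSpot-based JVM can write a heap dump of itself through `HotSpotDiagnosticMXBean.dumpHeap`. A sketch, assuming a HotSpot/OpenJDK runtime (on other JVMs the bean lookup returns `null`); `HeapDumper` is a hypothetical name. The target file must not exist yet, and some JDK versions reject file names without the `.hprof` suffix.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

public final class HeapDumper {

    /** Writes an .hprof heap dump; liveOnly=true forces a GC first and keeps only reachable objects. */
    static void dumpHeap(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
    }
}
```

Writing the dump needs a fresh file descriptor, so it can fail once the process has already hit its descriptor limit; triggering it earlier, while headroom remains, avoids that.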
@Xfran I sent you the data via email. |
@jnehring I answered via email. Please let me know if there are any issues or improvements. |
It seems that there are 98 If this causes the problem (not sure!), then my guess is that somewhere in e-Link a Jena model, query or iterator is not closed properly. |
It's mostly about CONSTRUCT queries. |
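If a query execution is closed only on the happy path, any exception (for example one triggered by unusual input such as the special characters mentioned above) skips the `close()` call and the resource stays open. The sketch below illustrates that failure mode with a hypothetical stand-in class `TrackedQuery` that counts open instances; in Jena the corresponding object would be the `QueryExecution` used for a CONSTRUCT query, which implements `AutoCloseable` in recent Jena versions (worth checking against the version e-Link uses).

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class CloseDemo {

    /** Hypothetical stand-in for a query execution holding a stream or connection. */
    static final class TrackedQuery implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();
        private boolean closed;

        TrackedQuery() {
            OPEN.incrementAndGet();
        }

        String execConstruct(String query) {
            if (query.trim().isEmpty()) {
                throw new IllegalArgumentException("empty query");
            }
            return "result of " + query;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    /** Leaky variant: close() is never reached when execConstruct throws. */
    static String leaky(String query) {
        TrackedQuery q = new TrackedQuery();
        String result = q.execConstruct(query);
        q.close();
        return result;
    }

    /** try-with-resources closes the query on both the normal and the exceptional path. */
    static String safe(String query) {
        try (TrackedQuery q = new TrackedQuery()) {
            return q.execConstruct(query);
        }
    }
}
```

After one failing call, `safe` leaves `OPEN` at 0 while `leaky` leaves it at 1, which is the "one more handle per bad request" pattern described in this thread.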
I pushed some changes to the repository. @jnehring, you have to trigger a Jenkins build because I seem to have credential problems (see the console output). |
Remark: I didn't test it! I hope I didn't close too much or too early :) |
I manually triggered the rebuild and it rebuilt the project. I tested e-Link and it still works. I will do the release, and then we can see whether it fixes the bug. |
I have installed on freme-live the new version of e-Link that @ghsnd created during freme-project/technical-discussion#122. Let's hope that this solves the problem. |
The bug is still there, but it is not as problematic as before. Since Friday FREME has been used a lot, and now there are 2467 open files. I cannot easily determine the exact number of requests, but as of last weekend FREME would have produced more open files under this load and would have crashed. |
It could be that there are still other resources not being closed elsewhere in the code. If you can take another heap dump, I will take a look. Or... all partners could double-check some part of the code base for resources that are not closed ;) |
I restarted freme-live so that it would not crash, so all open files are gone. I will take another heap dump, but I need to wait a couple of days until some open files have accumulated. |
I found one more stream that was not closed properly in the SPARQL converters controller and added a finally block to close it. It is a stream that writes the result set of a SPARQL SELECT query in CSV, XML or JSON format. |
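A sketch of the `finally` pattern for this kind of serializer, assuming the rows arrive as lists of strings (the real controller works on a SPARQL result set). `CsvResultWriter` is a hypothetical name, and values are written without CSV quoting or escaping, so it only illustrates the resource handling.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public final class CsvResultWriter {

    /** Writes a header and rows as comma-separated lines; the output is closed even if a row fails. */
    static void writeCsv(List<String> header, List<List<String>> rows, OutputStream out)
            throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        try {
            writer.write(String.join(",", header));
            writer.write("\n");
            for (List<String> row : rows) {
                writer.write(String.join(",", row)); // values assumed to need no quoting
                writer.write("\n");
            }
        } finally {
            writer.close(); // flushes pending bytes and closes the wrapped OutputStream too
        }
    }
}
```

Closing the `OutputStreamWriter` also closes the stream it wraps; on Java 7 and later, `try (Writer writer = ...)` achieves the same with less code.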
Closed some resources here. |
Thanks @ghsnd. I wanted to create the heap dump today, but the too-many-open-files bug happened again, and when it happens I cannot create a heap dump. I will try again... |
I created a new memory dump while the system had 3506 open files and was about to crash. However, the fixes for the open streams that @ghsnd and I made here are not yet installed on freme-live. So next week, after the Semantics conference, I want to release and deploy the latest state on freme-live, because maybe the bug has already been fixed. |
I added a script to the crontab of freme-live that restarts freme-live every midnight as a temporary solution to the problem. |
I installed new versions of the broker and FREME NER and switched off the cron job that restarts FREME. Let's keep our fingers crossed that this solves the issue. |
About 20 hours after the release, the number of open files was 2300, so this problem is still not fixed. I switched the script that restarts FREME every midnight back on. |
Today I found this, which is relevant because it is the number of pipes that increases... To be checked soon. |
A dump of the threads reveals many threads of this kind (and their number grows over time):
This doesn't say much, but AbstractMultiworkerIOReactor is used in CloseableHttpAsyncClientBase, which is used in the Spring MVC framework and in the Spring HTTP client. To be investigated further... |
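One way to confirm which thread family keeps growing is to take periodic thread censuses and compare them. The sketch groups all live threads of the current JVM by name with the trailing number removed, so a growing family of similarly named threads stands out. `ThreadCensus` is a hypothetical helper and does not know which names the HTTP client's reactor threads use, so its output has to be matched against the thread dump. If each undisposed async client keeps its reactor alive, thread growth would also show up as descriptor growth, since on Linux each NIO selector holds an epoll descriptor and, in JDK 8, a wakeup pipe; this link is a hypothesis to verify.

```java
import java.util.Map;
import java.util.TreeMap;

public final class ThreadCensus {

    /**
     * Counts live threads per name family: a trailing number (and the dash or
     * space before it) is stripped, so "pool-3-thread-7" and "pool-3-thread-8"
     * are both counted under "pool-3-thread".
     */
    static Map<String, Integer> countByNamePrefix() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String family = t.getName().replaceAll("[-\\s]*\\d+$", "");
            counts.merge(family, 1, Integer::sum);
        }
        return counts;
    }
}
```

Logging `countByNamePrefix()` every few minutes and diffing consecutive results shows which family grows over time.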
I read this article about JDBC database connections in high-concurrency scenarios and therefore added this configuration to freme-live. Maybe the open connections are database connections. Now there is a maximum number of open database connections, and no database connection stays alive for more than 10 seconds.
|
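The two limits described for the database configuration, a cap on concurrently open connections and a maximum connection age of 10 seconds, can be illustrated with a small hypothetical pool. This is a sketch of the idea under stated assumptions, not the pool FREME uses; real connection pools (Tomcat JDBC, HikariCP and others) also validate connections and evict idle ones in a background thread, whereas here idle connections are only checked when borrowed or returned.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical pool: at most maxActive leases at once; connections older than maxAge are closed, not reused. */
public final class BoundedAgingPool<C extends AutoCloseable> {

    /** A pooled connection together with its creation time. */
    public static final class Lease<T> {
        public final T connection;
        final long createdNanos;

        Lease(T connection, long createdNanos) {
            this.connection = connection;
            this.createdNanos = createdNanos;
        }
    }

    private final Semaphore permits;
    private final Deque<Lease<C>> idle = new ArrayDeque<>();
    private final Supplier<C> factory;
    private final long maxAgeNanos;

    public BoundedAgingPool(int maxActive, long maxAgeMillis, Supplier<C> factory) {
        this.permits = new Semaphore(maxActive);
        this.maxAgeNanos = TimeUnit.MILLISECONDS.toNanos(maxAgeMillis);
        this.factory = factory;
    }

    /** Waits up to timeoutMillis for a free slot; reuses an idle connection only if it is young enough. */
    public Lease<C> borrow(long timeoutMillis) throws InterruptedException {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("pool exhausted");
        }
        try {
            synchronized (idle) {
                Lease<C> lease;
                while ((lease = idle.pollFirst()) != null) {
                    if (!expired(lease)) return lease;
                    closeQuietly(lease.connection); // too old: close instead of reusing
                }
            }
            return new Lease<>(factory.get(), System.nanoTime());
        } catch (RuntimeException e) {
            permits.release(); // do not lose the slot if creating a connection fails
            throw e;
        }
    }

    /** Returns a lease; an expired connection is closed instead of being kept idle. */
    public void release(Lease<C> lease) {
        try {
            if (expired(lease)) {
                closeQuietly(lease.connection);
            } else {
                synchronized (idle) {
                    idle.addFirst(lease);
                }
            }
        } finally {
            permits.release();
        }
    }

    private boolean expired(Lease<C> lease) {
        return System.nanoTime() - lease.createdNanos > maxAgeNanos;
    }

    private static void closeQuietly(AutoCloseable c) {
        try {
            c.close();
        } catch (Exception ignored) {
            // best effort: a failing close must not break the pool
        }
    }
}
```

With `maxActive` bounded, a burst of requests waits (or fails fast) instead of opening ever more connections, and the age limit guarantees that a connection leaked into the idle set is eventually closed.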
FREME live becomes unstable after running for a while with this error:
It happened two weeks ago and now it has happened again. The problem can be solved by restarting freme-live.
I backed up the log files (around 500 MB of error logs) to ~/logs-open-files on freme-live. I guess the problem is an open file handle or stream that is not closed properly; once this has happened 20,000 times, freme-live fails. It is probably caused by the WRIPL requests, because no one else sends so many requests to freme-live in two weeks.
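Since the failure appears once the per-process descriptor limit is reached, one monitoring idea is to compare the open count with that limit from inside the JVM and raise an alert (or take a heap dump) well before the limit, while a new file can still be opened. `FdHeadroom` is a hypothetical helper built on `UnixOperatingSystemMXBean`, which reports both values on HotSpot/OpenJDK for Unix-like systems; the alert threshold is an assumption to tune.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public final class FdHeadroom {

    /** Fraction of the process's descriptor limit currently in use, or -1 if not reported. */
    static double usage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof UnixOperatingSystemMXBean)) {
            return -1;
        }
        UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
        return (double) unix.getOpenFileDescriptorCount() / unix.getMaxFileDescriptorCount();
    }

    /** True when usage has reached the alert threshold (e.g. 0.8 for 80 %). */
    static boolean shouldAlert(double usage, double threshold) {
        return usage >= threshold;
    }
}
```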