Futures timed out after [24 hours] #6
Comments
does it work with a smaller subset?
--
- Fabio Calefato
Sent from iPhone
Yes, this software works fine for small files. But if the running time exceeds 24 hours, "Futures timed out after [24 hours]" is displayed.
Sorry for the late reply.
We are aware of the issue with large files. This is, however, a limitation of
R itself. So, we need to re-code our script to circumvent the fact that R
by default tries to load an entire file into the memory.
Still, we do not have time to fix this issue immediately -- we are busy
teaching right now -- nor do we have a student working on it at this
very moment.
If you are in a hurry, I suggest you read [1] and [2], which give you
an idea of how to resolve the problem. The easiest option is to use the ff
library if your dataframe contains heterogeneous data; if the data are
homogeneous (e.g., a numeric matrix), the bigmemory library will also do.
The most general solutions, instead, are to use Hadoop and map-reduce to
parallelize your complex task into smaller, faster subtasks [2], or,
alternatively, to leverage a database for storing and then querying the data [3].
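For reference, here is a minimal sketch of chunked reading in base R, which avoids loading the whole file at once. The file name, chunk size, and the `process_chunk` helper are hypothetical placeholders; adapt them to your actual CSV layout and analysis step.

```r
# Process a large CSV in fixed-size chunks instead of loading it all into memory.
process_chunk <- function(df) {
  # ... run the per-row analysis (e.g., sentiment classification) on this chunk ...
}

chunk_size <- 10000
con <- file("big_input.csv", open = "r")            # hypothetical file name

# Read the header row once, so each chunk can reuse the column names.
header <- read.csv(con, nrows = 1, header = FALSE,
                   stringsAsFactors = FALSE)

repeat {
  # Reading from an open connection continues where the last read stopped.
  chunk <- tryCatch(
    read.csv(con, nrows = chunk_size, header = FALSE,
             col.names = unlist(header),
             stringsAsFactors = FALSE),
    error = function(e) NULL                        # end of file reached
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  process_chunk(chunk)
}
close(con)
```

The same looping pattern works with `readLines` for plain text input; for out-of-memory dataframes, the ff and bigmemory libraries mentioned above replace the manual loop entirely.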
Should you decide to update the script yourself, a pull request would be
very much appreciated! ;-)
HTH,
- Fabio
[1] https://rpubs.com/msundar/large_data_analysis
[2] http://www.bytemining.com/2010/08/taking-r-to-the-limit-part-ii-large-datasets-in-r/
[3] https://www.datasciencecentral.com/profiles/blogs/postgresql-monetdb-and-too-big-for-memory-data-in-r-part-ii
See issue #7
IlyasAzeem added a commit to IlyasAzeem/Senti4SD that referenced this issue on Dec 11, 2018
Hello! I have updated the script to work with multiple files. Now one needs to specify a directory which contains all the files and, optionally, an output directory. I hope you like it.
IlyasAzeem added a commit to IlyasAzeem/Senti4SD that referenced this issue on Dec 11, 2018
Large file issue mentioned in Issue collab-uniba#6 resolved
I have a large file with almost 200k lines. When I run Senti4SD, it takes more than 24 hours and then displays the error message "Futures timed out after [24 hours]".
Could you please help me solve this problem?