BUG: Error uploading big files #4069

Open
rs-develop opened this issue Dec 20, 2024 · 3 comments
Labels
bug (Things that should work, but don’t), triage (These issues need to be reviewed by the Aleph team)

Comments

@rs-develop

Describe the bug
I deployed a production instance of Aleph. Specs: Ubuntu Server 22.04, 2 TB of disk space, 128 GB RAM, and an 8-core CPU. I installed the latest release, 4.0.2. I am able to upload files up to 15 GB via the UI and alephclient. Files larger than 15 GB never finish the upload phase. Files larger than 50 GB lead to a never-ending upload loop. No error is shown at all.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Investigation
  2. Click on Upload Document
  3. Upload a 25 GB file
  4. See error

Expected behavior
Show a reason why the upload went wrong. Show a hint before uploading, such as "file is too big" or something like that. Show a hint about what an admin can do to enable the instance to handle such files.

Aleph version
4.0.2

Additional context
I checked the documentation but found nothing helpful. Is there an environment variable or a setting I can change to enable my Aleph instance to handle such big files?

@rs-develop added the labels bug (Things that should work, but don’t) and triage (These issues need to be reviewed by the Aleph team) on Dec 20, 2024
@simonwoerpel
Contributor

What does your deployment look like? Is there a reverse proxy in front of it, e.g. one with a timeout? This looks more like a "bug" in the specific deployment setup, not necessarily in the app itself.

Another question would be: what kind of file is this? If it is a compressed archive, I'd suggest extracting it before uploading it anyway.

@rs-develop
Author

No, it is an offline deployment on a local network. There is no reverse proxy, firewall, or other system in between. The files are uncompressed CSV. Even when the upload succeeds and Aleph processes the file, no files or data show up after processing finishes. If I upload e.g. only 500 lines of one of the big CSV files, the data does show up in the UI.

Can I somehow debug this?
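
One way to narrow this down, given that a 500-line excerpt works while the full file does not, is to split the CSV into parts of increasing size and upload them one at a time until one fails. The sketch below is not part of Aleph or alephclient; the function name `split_csv`, the `part_00001.csv` naming scheme, and the UTF-8 encoding are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_part=500, encoding="utf-8"):
    """Split `source` into numbered CSV parts, repeating the header in each.

    Hypothetical helper for debugging; the encoding is an assumption and
    should be changed to match the real file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(source, newline="", encoding=encoding) as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return parts  # empty file, nothing to split
        out = writer = None
        for index, row in enumerate(reader):
            # Start a new part every `rows_per_part` data rows.
            if index % rows_per_part == 0:
                if out is not None:
                    out.close()
                path = out_dir / f"part_{len(parts) + 1:05d}.csv"
                out = open(path, "w", newline="", encoding=encoding)
                writer = csv.writer(out)
                writer.writerow(header)
                parts.append(path)
            writer.writerow(row)
        if out is not None:
            out.close()
    return parts
```

Calling `split_csv("big.csv", "parts", rows_per_part=100_000)` streams the file row by row, so memory use stays flat regardless of the input size. If every part uploads and shows up on its own, the problem is more likely a size or time limit than the file's content.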

@jigsawsecurity

jigsawsecurity commented Jan 19, 2025

I am having a similar issue and had to flush the queue to get anything to process. It was like things were stuck. I'm wondering if there is a limit somewhere. I also cannot generate entities from large CSV files. I'm experimenting to see if I can get a smaller file of the same format to actually show up and allow entity generation. We know large files can be supported: the OCCRP instance has some pretty large CSV files that work just fine. What we don't have is any information on how to tune these instances to process larger files more efficiently. I have increased the instance size many times with no change, and resources are not pegged, so there must be limits somewhere else in the platform or its components that could be tweaked for performance.

Tired of seeing this screen where it never populates, especially since the value of this platform lies in the entity extraction. The columns never load, and it's not really all that big of a file.

Image
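
Before looking for platform limits, it may be worth ruling out the file itself. The following is a minimal sketch, assuming a UTF-8, comma-separated file, that streams the CSV once and reports how many rows have a different number of fields than the header. Whether such rows are what stops the columns from loading here is only a guess; the function name `scan_csv` and the keys of the returned dictionary are hypothetical.

```python
import csv
from collections import Counter


def scan_csv(path, encoding="utf-8"):
    """Stream a CSV once; count rows and rows whose width differs from the header."""
    widths = Counter()
    first_ragged_line = None
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        for row in reader:
            widths[len(row)] += 1
            if len(row) != len(header) and first_ragged_line is None:
                # line_num is the physical line the reader has reached
                first_ragged_line = reader.line_num
    rows = sum(widths.values())
    return {
        "columns": len(header),
        "rows": rows,
        "ragged_rows": rows - widths[len(header)],
        "first_ragged_line": first_ragged_line,
    }
```

A result with `ragged_rows` equal to 0 suggests the structure is at least consistent, which points back toward processing limits or timeouts rather than the data.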

3 participants