Queries with large number of returned rows throws Gateway Timeout #89

Open
Curt-Whitmire-NOAA opened this issue Jun 2, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@Curt-Whitmire-NOAA

Email from @John-R-Wallace-NOAA,

Hi Data Team,

My downloads of groundfish trawl bio data from the Triennial Shelf Survey
on the Data Warehouse seem to be limited by the number of rows returned.

Here, yelloweye rockfish (Sebastes ruberrimus), with 116 rows of data,
does work (as do other species with a limited number of rows):

https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/trawl.individual_fact/selection.json?filters=project=Groundfish%20Triennial%20Shelf%20Survey,station_invalid=0,operation_dim$is_assessment_acceptable=True,operation_dim$legacy_performance_code!=8,field_identified_taxonomy_dim$scientific_name=Sebastes%20ruberrimus&variables=project,trawl_id,station_code,common_name,scientific_name,year,vessel,pass,leg,tow,datetime_utc_iso,sampling_start_hhmmss,sampling_end_hhmmss,performance,target_station_design_dim$stn_invalid_for_trawl_date_whid,depth_m,weight_kg,length_cm,width_cm,sex,age_years,otosag_id,latitude_dd,longitude_dd

However, splitnose rockfish (Sebastes diploproa), with 6,242 rows of
data, has never worked for me (I only know the number of rows because I
downloaded the data by year):

https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/trawl.individual_fact/selection.json?filters=project=Groundfish%20Triennial%20Shelf%20Survey,station_invalid=0,operation_dim$is_assessment_acceptable=True,operation_dim$legacy_performance_code!=8,field_identified_taxonomy_dim$scientific_name=Sebastes%20diploproa&variables=project,trawl_id,station_code,common_name,scientific_name,year,vessel,pass,leg,tow,datetime_utc_iso,sampling_start_hhmmss,sampling_end_hhmmss,performance,target_station_design_dim$stn_invalid_for_trawl_date_whid,depth_m,weight_kg,length_cm,width_cm,sex,age_years,otosag_id,latitude_dd,longitude_dd
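
The by-year workaround mentioned above could be scripted roughly as follows. This is only a sketch: the base URL and the filter strings are copied from the query above, but the `year=` filter syntax is an assumption (the email does not say how the per-year downloads were filtered), and `build_url` and `urls_by_year` are hypothetical helper names.

```python
from urllib.parse import quote

BASE = ("https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/"
        "trawl.individual_fact/selection.json")


def build_url(scientific_name, year, variables):
    """Build one selection URL restricted to a single survey year."""
    filters = [
        "project=" + quote("Groundfish Triennial Shelf Survey"),
        "station_invalid=0",
        "operation_dim$is_assessment_acceptable=True",
        "operation_dim$legacy_performance_code!=8",
        "field_identified_taxonomy_dim$scientific_name=" + quote(scientific_name),
        f"year={year}",  # ASSUMPTION: 'year' is accepted as a filter in this form
    ]
    return f"{BASE}?filters={','.join(filters)}&variables={','.join(variables)}"


def urls_by_year(scientific_name, years, variables):
    """Return one URL per year so each request returns fewer rows."""
    return [build_url(scientific_name, y, variables) for y in years]


# Illustrative call; the years and the shortened variable list are examples only.
example_urls = urls_by_year(
    "Sebastes diploproa", [1980, 1983], ["project", "year", "length_cm"]
)
```

Each URL can then be fetched separately and the results concatenated, which keeps every individual response small.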

Canary rockfish, with 4,789 rows of data, was successful early yesterday,
but lately has not worked:

https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/trawl.individual_fact/selection.json?filters=project=Groundfish%20Triennial%20Shelf%20Survey,station_invalid=0,operation_dim$is_assessment_acceptable=True,operation_dim$legacy_performance_code!=8,field_identified_taxonomy_dim$scientific_name=Sebastes%20pinniger&variables=project,trawl_id,station_code,common_name,scientific_name,year,vessel,pass,leg,tow,datetime_utc_iso,sampling_start_hhmmss,sampling_end_hhmmss,performance,target_station_design_dim$stn_invalid_for_trawl_date_whid,depth_m,weight_kg,length_cm,width_cm,sex,age_years,otosag_id,latitude_dd,longitude_dd

Changing to different browsers didn't help and I see the same issues in
R where I do most of my work. I see the issue with JSON, which I mostly
use, and with CSV when I have tested it.

At one point I needed to restart R for anything to work again, but after
a full system reboot the problems listed above persist. Using VPN or not
also does not appear to make a difference.

I have gotten both of these error messages:


Gateway Timeout
The proxy server did not receive a timely response from the upstream server.

Reference #1.4ff12417.1620780130.e663d01


This XML file does not appear to have any style information associated
with it. The document tree is shown below.

<title>500</title> Please try again

and others I didn't record.

In the past, I didn't realize I needed to have my R functions
distinguish between no data being available to download (for a given
species and survey) and the server timing out. That is not good at all,
since more data means a greater chance of timing out, which then looks
as if there is no data at all. Upon inspection, the long wait before the
server timed out with no data retrieved was a clue that something was amiss.

I have a high speed connection with Comcast, without other issues to the
service of late, so that should not be the problem.

Thanks,

-John

Curt-Whitmire-NOAA added the bug label Jun 2, 2021
@Curt-Whitmire-NOAA
Author

@montsaroffNoaa and @KMSkeltonNOAA, narrowed down the issue to the filters:
For some reason, the filter elements operation_dim$is_assessment_acceptable=True and operation_dim$legacy_performance_code!=8 cause the code to run very slowly.

@Curt-Whitmire-NOAA found:
The filter element operation_dim$is_assessment_acceptable=True, in particular, causes the code to run very slowly. I've tried every variant of the value (true, True, TRUE, 1), but to no avail.
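
For anyone who wants to repeat this kind of narrowing-down, a leave-one-out pass over the filter list is one possible approach. The sketch below only builds the filter strings and times an arbitrary callable; the filter elements are copied from the failing splitnose query above, while `leave_one_out` and `timed` are hypothetical helpers and no request is actually sent.

```python
import time

# Filter elements copied from the splitnose rockfish query in this issue.
FILTERS = [
    "project=Groundfish%20Triennial%20Shelf%20Survey",
    "station_invalid=0",
    "operation_dim$is_assessment_acceptable=True",
    "operation_dim$legacy_performance_code!=8",
    "field_identified_taxonomy_dim$scientific_name=Sebastes%20diploproa",
]


def leave_one_out(filters):
    """Yield (dropped_filter, remaining_filter_string) for each filter in turn."""
    for i, dropped in enumerate(filters):
        kept = filters[:i] + filters[i + 1:]
        yield dropped, ",".join(kept)


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


variants = list(leave_one_out(FILTERS))
```

Passing each remaining filter string to a fetch function through `timed` shows which dropped element makes the request fast again. Dropping a filter also changes how many rows come back, so the timings are only a rough guide.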
