Unload doesn't create files while concurrency scaling is enabled #231

maxim-lisovsky-gismart · 2023-09-07T09:49:00Z

When we enabled concurrency scaling in Redshift, `locopy.redshift.unload_and_copy` stopped creating files on S3 during critical load and logged the following message:

No files generated from unload

We tried both the Parquet and CSV formats.

fdosani · 2023-09-07T11:20:54Z

Could you provide more details, such as a minimal code example and the stack trace? It's hard to debug based only on what you've provided. TIA!

fdosani · 2023-09-07T13:31:41Z

Also, do any of these limitations apply to your setup? https://docs.aws.amazon.com/redshift/latest/dg/concurrency-scaling.html

maxim-lisovsky-gismart · 2023-09-07T13:58:05Z

I do something like this:

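```python
query = "select * from foo.bar"
# connection.get_redshift is a project-specific helper (not part of locopy),
# assumed to return a locopy Redshift connection usable as a context manager.
# secret_id, database, host, bucket, bucket_dir and data_dir are defined elsewhere.
with connection.get_redshift(secret_id, database=database, host=host) as redshift_locopy:
    redshift_locopy.unload_and_copy(
        query=query,
        s3_bucket=bucket,
        s3_folder=bucket_dir,
        export_path=False,         # no single concatenated local export file
        raw_unload_path=data_dir,  # local folder for the raw unloaded files
        delete_s3_after=False,     # keep the unloaded files on S3
        parallel_off=False,
        unload_options=["PARQUET", "PARALLEL ON", "CLEANPATH"],
    )
```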

I haven't found any of these restrictions to be relevant to my setup.

fdosani · 2023-09-07T14:02:56Z

I'm not sure this is a locopy issue, though. It sounds like concurrency scaling is changing Redshift's behaviour so that the data being queried isn't available to download. I suspect one of those limitations applies to your setup. It works with concurrency scaling disabled, correct?

maxim-lisovsky-gismart · 2023-09-07T14:13:34Z

Yes, it works without concurrency scaling. I also suspect it might be caused by this query:

```sql
SELECT path FROM stl_unload_log WHERE query = pg_last_query_id() ORDER BY path;
```

Perhaps `pg_last_query_id()` is not reliable when concurrency scaling is involved.
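If the lookup through `stl_unload_log` is the weak point, one way to check whether the UNLOAD actually produced files is to list the S3 prefix directly. A minimal sketch, assuming a boto3-style S3 client (anything exposing the `list_objects_v2` paginator) and that each unload writes to its own prefix; `list_unloaded_keys` is a hypothetical helper, not part of locopy.

```python
from typing import Any, List


def list_unloaded_keys(s3_client: Any, bucket: str, prefix: str) -> List[str]:
    """Return the S3 keys found under `prefix`, sorted by name.

    Hypothetical helper: `s3_client` is assumed to behave like
    boto3.client("s3"); only its list_objects_v2 paginator is used.
    """
    keys: List[str] = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return sorted(keys)
```

Usage would look like `list_unloaded_keys(boto3.client("s3"), bucket, prefix)` with the prefix the UNLOAD wrote to. This sidesteps system tables entirely, but it only reflects a single unload if nothing else writes under that prefix (CLEANPATH clears it beforehand).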

fdosani · 2023-09-07T14:15:32Z

The docs say this:

> It doesn't support queries that access system tables, PostgreSQL catalog tables, or no-backup tables.

That would make sense.
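Because that limitation covers system tables, another way to find the unloaded files without touching `stl_unload_log` is Redshift's `MANIFEST` option: UNLOAD then writes a JSON file named `<prefix>manifest` alongside the data files, listing each file under `entries[].url`. A minimal sketch of reading it, assuming the manifest text has already been fetched from S3 (for example with boto3's `get_object`); `parse_unload_manifest` and `split_s3_url` are hypothetical helpers, and whether passing `"MANIFEST"` in locopy's `unload_options` plays well with `unload_and_copy` is an assumption.

```python
import json
from typing import List, Tuple


def parse_unload_manifest(manifest_text: str) -> List[str]:
    """Return the data-file URLs listed in an UNLOAD ... MANIFEST file."""
    manifest = json.loads(manifest_text)
    # Each entry looks like {"url": "s3://bucket/prefix0000_part_00", ...};
    # MANIFEST VERBOSE adds extra keys that are ignored here.
    return [entry["url"] for entry in manifest.get("entries", [])]


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split "s3://bucket/key" into ("bucket", "key")."""
    if not url.startswith("s3://"):
        raise ValueError(f"not an S3 URL: {url!r}")
    bucket, _, key = url[len("s3://"):].partition("/")
    return bucket, key
```

The resulting (bucket, key) pairs can then be downloaded with any S3 client.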

maxim-lisovsky-gismart · 2023-09-07T14:43:00Z

But that's locopy's code, from the `unload_and_copy` method.

fdosani · 2023-09-07T14:45:14Z

> But it's locopy's code, unload_and_copy method

If you don't use locopy and have concurrency scaling enabled, does your query work? Keep in mind I don't have your setup or your actual code, so I have very little to go on in understanding what is actually happening.

I'd suggest removing locopy from the test and seeing whether you can manually query and download the data. That would help me track down where the issue is, if it is in locopy.
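To take locopy out of the picture, the UNLOAD can be issued directly over a plain DB-API connection and the S3 prefix inspected afterwards. A rough sketch, assuming IAM-role authorization (the role ARN and S3 prefix in the test values are placeholders) and any DB-API 2.0 driver such as `redshift_connector` or `psycopg2`; `build_unload_sql` and `run_unload` are hypothetical helpers.

```python
from typing import Any, Sequence


def build_unload_sql(query: str, s3_prefix: str, iam_role_arn: str,
                     options: Sequence[str] = ()) -> str:
    """Build a standalone UNLOAD statement (no locopy involved)."""
    # Single quotes inside the quoted inner query must be doubled.
    escaped = query.replace("'", "''")
    parts = [
        f"UNLOAD ('{escaped}')",
        f"TO '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        *options,  # e.g. "PARQUET", "PARALLEL ON", "CLEANPATH"
    ]
    return "\n".join(parts)


def run_unload(conn: Any, sql: str) -> None:
    """Execute the UNLOAD on a DB-API 2.0 connection, then commit."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
    finally:
        cur.close()
    conn.commit()
```

If files appear under the prefix with concurrency scaling enabled, the UNLOAD itself is working and the file lookup afterwards is the more likely culprit.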
