Unload don't create files while concurrency scaling enabled #231

Open
maxim-lisovsky-gismart opened this issue Sep 7, 2023 · 8 comments


@maxim-lisovsky-gismart

When we enabled concurrency scaling in Redshift, locopy.redshift.unload_and_copy stopped creating files on S3 during critical load and logged the following message:

No files generated from unload

I tried both the Parquet and CSV formats.

@fdosani
Member

fdosani commented Sep 7, 2023

Could you provide more details, such as a minimal example of the code and the stack trace? It's hard to debug based only on what you've provided. TIA!

@fdosani
Member

fdosani commented Sep 7, 2023

Also, do any of these limitations apply to your setup? https://docs.aws.amazon.com/redshift/latest/dg/concurrency-scaling.html

@maxim-lisovsky-gismart
Author

I do something like this:

query = "select * from foo.bar"
with connection.get_redshift(secret_id, database=database, host=host) as redshift_locopy:
    redshift_locopy.unload_and_copy(
        query=query,
        s3_bucket=bucket,
        s3_folder=bucket_dir,
        export_path=False,
        raw_unload_path=data_dir,
        delete_s3_after=False,
        parallel_off=False,
        unload_options=["PARQUET", "PARALLEL ON", "CLEANPATH"],
    )

I haven't found any of those limitations to be relevant to my setup.

@fdosani
Member

fdosani commented Sep 7, 2023

I'm not sure this is a locopy issue, though. It sounds like concurrency scaling is changing Redshift's behaviour so that the data being queried isn't available to download. I think one of those limitations applies to your setup. It works with concurrency scaling disabled, correct?

@maxim-lisovsky-gismart
Author

Yes, it works without concurrency scaling. I also suspect it could be caused by this query:

SELECT path FROM stl_unload_log WHERE query = pg_last_query_id() ORDER BY path;

pg_last_query_id() is probably not reliable when concurrency scaling is involved.
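
One way to take pg_last_query_id() out of the picture is to list the objects under the unload prefix directly in S3 instead of asking stl_unload_log. A minimal sketch, assuming a boto3 S3 client is passed in; the helper name list_unloaded_files and its parameters are hypothetical and not part of locopy:

```python
from typing import Any, List


def list_unloaded_files(s3_client: Any, bucket: str, prefix: str) -> List[str]:
    """List s3:// paths of every object under an UNLOAD prefix.

    Hypothetical helper, not part of locopy. `s3_client` is assumed to be a
    boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    paths: List[str] = []
    # Paginate because list_objects_v2 returns at most 1000 keys per call.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            paths.append(f"s3://{bucket}/{obj['Key']}")
    return sorted(paths)
```

An empty list would suggest the UNLOAD itself wrote nothing. A non-empty list next to an empty stl_unload_log result would point at the lookup rather than the unload.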

@fdosani
Member

fdosani commented Sep 7, 2023

It says this in the docs:

It doesn't support queries that access system tables, PostgreSQL catalog tables, or no-backup tables.

That would make sense.
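
To check whether that limitation is involved, one could look, on the same session right after the UNLOAD, at what pg_last_query_id() returns and what stl_query and stl_unload_log record for that id. A rough sketch, assuming a DB-API cursor whose driver uses the %s parameter style; the helper name is hypothetical, and the concurrency_scaling_status column of stl_query is taken from the Redshift docs, not verified against this cluster:

```python
from typing import Any, Dict, Optional


def inspect_last_query(cursor: Any) -> Dict[str, Optional[int]]:
    """Report the last query id, where it ran, and how many files it logged.

    Hypothetical diagnostic helper, not part of locopy.
    """
    # Capture the id first; later statements may change what it points to.
    cursor.execute("SELECT pg_last_query_id()")
    query_id = cursor.fetchone()[0]

    # Per the Redshift docs, 1 means the query ran on a concurrency scaling cluster.
    cursor.execute(
        "SELECT concurrency_scaling_status FROM stl_query WHERE query = %s",
        (query_id,),
    )
    row = cursor.fetchone()
    status = row[0] if row else None

    # Number of files the UNLOAD with this id is recorded as having written.
    cursor.execute(
        "SELECT COUNT(*) FROM stl_unload_log WHERE query = %s",
        (query_id,),
    )
    unload_rows = cursor.fetchone()[0]

    return {
        "query_id": query_id,
        "concurrency_scaling_status": status,
        "unload_log_rows": unload_rows,
    }
```

If the id comes back as expected but unload_log_rows is 0 while files exist on S3, the file lookup is the likely culprit rather than the UNLOAD.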

@maxim-lisovsky-gismart
Author

maxim-lisovsky-gismart commented Sep 7, 2023

But it's locopy's code, unload_and_copy method

@fdosani
Member

fdosani commented Sep 7, 2023

> But it's locopy's code, unload_and_copy method

If you don't use locopy and have concurrency scaling enabled, does your query work? Keep in mind I don't have your setup or your actual code, so I'm going off very little here to understand what is actually happening.

I'd suggest removing locopy from the testing and seeing whether you can manually query and download the data. That would help me track down where the issue is, if it is in locopy.
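
A test without locopy could look roughly like the sketch below: build the UNLOAD statement by hand and run it on a plain DB-API cursor. The function names are hypothetical, and the IAM role ARN and S3 path in the usage comment are placeholders; the option list mirrors the one in the snippet earlier in the thread.

```python
from typing import Any, Sequence


def build_unload_sql(
    query: str, s3_path: str, iam_role: str, options: Sequence[str] = ()
) -> str:
    """Build a Redshift UNLOAD statement (hypothetical helper, not locopy's)."""
    # UNLOAD takes the SELECT as a quoted string, so inner single quotes are doubled.
    escaped = query.replace("'", "''")
    parts = [f"UNLOAD ('{escaped}')", f"TO '{s3_path}'", f"IAM_ROLE '{iam_role}'"]
    parts.extend(options)
    return " ".join(parts)


def run_manual_unload(
    cursor: Any, query: str, s3_path: str, iam_role: str, options: Sequence[str] = ()
) -> str:
    """Run the UNLOAD on a DB-API cursor and return the SQL that was sent."""
    sql = build_unload_sql(query, s3_path, iam_role, options)
    cursor.execute(sql)
    return sql


# Example call (placeholders, adjust to the real cluster):
# run_manual_unload(
#     cursor,
#     "select * from foo.bar",
#     "s3://my-bucket/my-dir/",
#     "arn:aws:iam::123456789012:role/my-unload-role",
#     ["PARQUET", "PARALLEL ON", "CLEANPATH"],
# )
```

Running this with concurrency scaling enabled and then checking the S3 prefix would show whether files go missing even when locopy is not in the path.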
