Redshift batch inserts using COPY FROM operation #25866

brendanstennett · 2025-05-26T16:02:28Z

Description

This PR aims to allow the use of COPY FROM statements when sinking data into Redshift.

The Redshift connector inherits BaseJdbcConnector which uses batched INSERT statements to execute sink operations. Even when using non transactional mode, this can only push about 1000 rows per second. This change stages the rows to a parquet file first, then issues a COPY FROM statement to load the table. We are noticing 250K rows per second or more using this method.

This has been running in production for 2+ months on our own branch.

This functionality needs to be enabled by specifying the following config option:

redshift.batched-inserts-copy-location=s3://my-bucket/my-prefix

The following options are also required when specifying the above:

redshift.batched-inserts-copy-iam-role=arn:aws:iam::123456789000:role/redshift_iam_role
s3.region=region
s3.aws-access-key=KEY
s3.aws-secret-key=SECRET

A suggested IAM Policy to for this role and user:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Action": [
				"s3:ListBucket"
			],
			"Resource": "arn:aws:s3:::my-bucket"
		},
		{
			"Effect": "Allow",
			"Action": [
				"s3:GetObject",
				"s3:PutObject",
				"s3:DeleteObject"
			],
			"Resource": "arn:aws:s3:::my-bucket/*"
		}
	]
}

Additional context and related issues

This is designed to work alongside and compliments the new Redshift UNLOAD feature Add support for fetching Redshift query results using Redshift unload command #24117
When the above configuration is not supplied, it will default to base JDBC behaviour.
There needs to be a new GHA var added to the repo REDSHIFT_S3_COPY_ROOT similar to REDSHIFT_S3_UNLOAD_ROOT

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Redshift
* Add support for Redshift COPY FROM statements for batch insert operations ({issue}`24546`)

brendanstennett · 2025-06-02T13:19:44Z

@ebyhr Added requested documentation. We need the ENV var set on the repo for the tests to pass.

REDSHIFT_S3_COPY_ROOT which would be a similar location to the ENV var already set of REDSHIFT_S3_UNLOAD_ROOT

github-actions · 2025-06-23T17:03:44Z

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

chenjian2664 · 2025-06-24T01:01:19Z

...rino-redshift/src/main/java/io/trino/plugin/redshift/RedshiftBatchedInsertsCopyPageSink.java

+            ConnectorPageSinkId pageSinkId,
+            TrinoFileSystemFactory fileSystemFactory,
+            TypeOperators typeOperators,
+            Location copyLocationWithPrefix,


How about rename to copyLocation

chenjian2664 · 2025-06-24T01:03:03Z

plugin/trino-redshift/src/main/java/io/trino/plugin/redshift/RedshiftConfig.java

+        return Optional.ofNullable(batchedInsertsCopyIamRole);
+    }
+
+    @Config("redshift.batched-inserts-copy-iam-role")


Do we need a toggle to control that we enable the COPY FROM or not, otherwise we only control it on session level?

chenjian2664 · 2025-06-24T01:04:23Z

docs/src/main/sphinx/connector/redshift.md

+
+:::
+
+Use the `batched_inserts_copy_enabled` [catalog session property](/sql/set-session) to


We should mention current implementation set the value to true by default

cla-bot bot added the cla-signed label May 26, 2025

github-actions bot added the redshift Redshift connector label May 26, 2025

brendanstennett force-pushed the redshift-copy branch 2 times, most recently from a2827db to 9187510 Compare May 26, 2025 18:31

ebyhr added the needs-docs This pull request requires changes to the documentation label May 26, 2025

brendanstennett changed the title ~~[WIP] Redshift batch inserts using COPY FROM operation~~ Redshift batch inserts using COPY FROM operation May 30, 2025

brendanstennett force-pushed the redshift-copy branch from 9187510 to 0fca059 Compare May 30, 2025 18:37

github-actions bot added the docs label May 30, 2025

Redshift batch inserts using COPY FROM operation

0d4c2da

brendanstennett force-pushed the redshift-copy branch from 0fca059 to 0d4c2da Compare June 2, 2025 13:17

brendanstennett marked this pull request as ready for review June 2, 2025 13:19

github-actions bot added the stale label Jun 23, 2025

chenjian2664 reviewed Jun 24, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Redshift batch inserts using COPY FROM operation #25866

Redshift batch inserts using COPY FROM operation #25866

Uh oh!

brendanstennett commented May 26, 2025 •

edited

Loading

Uh oh!

brendanstennett commented Jun 2, 2025

Uh oh!

github-actions bot commented Jun 23, 2025

Uh oh!

chenjian2664 Jun 24, 2025

Uh oh!

chenjian2664 Jun 24, 2025

Uh oh!

chenjian2664 Jun 24, 2025

Uh oh!

Uh oh!


		:::

		Use the `batched_inserts_copy_enabled` [catalog session property](/sql/set-session) to

Redshift batch inserts using COPY FROM operation #25866

Are you sure you want to change the base?

Redshift batch inserts using COPY FROM operation #25866

Uh oh!

Conversation

brendanstennett commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Additional context and related issues

Release notes

Uh oh!

brendanstennett commented Jun 2, 2025

Uh oh!

github-actions bot commented Jun 23, 2025

Uh oh!

chenjian2664 Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

chenjian2664 Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

chenjian2664 Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

brendanstennett commented May 26, 2025 •

edited

Loading