[Bug] [Connector-V2] PostgreSQL JDBC Sink Fails to Handle 0x00 Null Byte in Source Data #8393

Open
3 tasks done
MyeoungDev opened this issue Dec 27, 2024 · 0 comments

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

When the source data contains a 0x00 null byte, an error occurs during INSERT in the PostgreSQL JDBC Sink.
However, the same data is successfully inserted in other DBMS JDBC Sinks (Oracle, MySQL, MS-SQL).

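This happens because PostgreSQL does not allow the null character (0x00) in character string values, so the server rejects the INSERT with an "invalid byte sequence" error.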

As a workaround, I currently replace the 0x00 null byte with an empty string ("") in AbstractJdbcRowConverter, and this works well in my environment.
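
A minimal sketch of this kind of replacement, for illustration only. The class name NullByteSanitizer and the method stripNullChars are hypothetical and are not part of SeaTunnel; how and where such a helper would be called from AbstractJdbcRowConverter (for example, only for string fields and only for the PostgreSQL dialect) is an assumption left open here.

```java
// Hypothetical helper, not SeaTunnel code: removes U+0000 characters from a
// string value before it is bound to a PreparedStatement parameter.
final class NullByteSanitizer {

    private NullByteSanitizer() {
        // utility class, no instances
    }

    /**
     * Returns the input with every U+0000 character removed.
     * A null input is returned unchanged so that SQL NULL values are preserved.
     */
    static String stripNullChars(String value) {
        // Fast path: most values contain no null character, so skip the copy.
        if (value == null || value.indexOf('\u0000') < 0) {
            return value;
        }
        return value.replace("\u0000", "");
    }
}
```

With this sketch, a value such as 'xxx^@' from the failing batch entry would be written as 'xxx'. Note that this silently changes the data, so it may be better suited to an opt-in option than to default behavior.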

Given SeaTunnel's goal of supporting compatibility across various DBMS, I believe this case should also be handled for PostgreSQL. I would like to hear your thoughts.

Reference
https://www.postgresql.org/docs/9.1/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE
https://www.postgresql.org/message-id/1171970019.3101.328.camel%40coppola.muc.ecircle.de

SeaTunnel Version

2.3.8

SeaTunnel Config

seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: file:///tmp/ # Ensure that the directory has written permission
    telemetry:
      metric:
        enabled: true

Running Command

./bin/seatunnel-cluster.sh -d -r master

Error Exception

Caused by: java.sql.BatchUpdateException: Batch entry 427,213 INSERT INTO "seatunnel"."public"."xxx" ("xxx1", "xxx2", "xxx3", "xxx4", "xxx5", "xxx6", "xxx6", "xxx7", "xxx8", "xxx9", "xxx10", "xxx11", "xxx12", "xxx13", "xxx14", "xxx15", "xxx16", "xxx17", "xxx18", "xxx19") VALUES ('1', 'AY100     ', '20000701', 'xxx', 'xxx', '', 'xxx^@', '                    ', '1 ', '  ', '0', '20140604', '200701261757', 'MASTER  ', 'N', '         ', '   ', ' '
, NULL, NULL) was aborted: ERROR: invalid byte sequence for encoding "UTF8": 0x00  Call getNextException to see other errors in the batch.
        at org.postgresql.jdbc.BatchResultHandler.handleCompletion(BatchResultHandler.java:186) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:571) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:893) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:916) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1684) ~[postgresql-42.5.4.jar:42.5.4]
        at org.apache.seatunnel.shade.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127) ~[?:?]
        at org.apache.seatunnel.shade.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.executor.FieldNamedPreparedStatement.executeBatch(FieldNamedPreparedStatement.java:534) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.executor.SimpleBatchStatementExecutor.executeBatch(SimpleBatchStatementExecutor.java:51) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.executor.BufferedBatchStatementExecutor.executeBatch(BufferedBatchStatementExecutor.java:53) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.JdbcOutputFormat.attemptFlush(JdbcOutputFormat.java:172) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.JdbcOutputFormat.flush(JdbcOutputFormat.java:136) ~[?:?]
        ... 32 more
Caused by: org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2099) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1456) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1481) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:546) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:893) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:916) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1684) ~[postgresql-42.5.4.jar:42.5.4]
        at org.apache.seatunnel.shade.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127) ~[?:?]
        at org.apache.seatunnel.shade.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.executor.FieldNamedPreparedStatement.executeBatch(FieldNamedPreparedStatement.java:534) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.executor.SimpleBatchStatementExecutor.executeBatch(SimpleBatchStatementExecutor.java:51) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.executor.BufferedBatchStatementExecutor.executeBatch(BufferedBatchStatementExecutor.java:53) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.JdbcOutputFormat.attemptFlush(JdbcOutputFormat.java:172) ~[?:?]
        at org.apache.seatunnel.connectors.seatunnel.jdbc.internal.JdbcOutputFormat.flush(JdbcOutputFormat.java:136) ~[?:?]
        ... 32 more

Zeta or Flink or Spark Version

Zeta

Java or Scala Version

1.8

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@MyeoungDev MyeoungDev added the bug label Dec 27, 2024