[fivetran] Revamp docs#5777

Open
koletzilla wants to merge 11 commits into main from clickhouse/improve-fivetran-docs

@koletzilla
Contributor

@koletzilla koletzilla commented Mar 18, 2026

Revamp Fivetran docs:

  • The index covers more details about the current version and related files.
  • The reference covers all the docs we have on the Fivetran side: configurations, table engine details...
    • I have not moved the full setup guide, only the configuration details. The full guide should be kept complete on the Fivetran side, as it's really nice to have it when editing the connector.
  • Troubleshooting contains common errors with possible solutions, good practices, and some examples to help with debugging.


@koletzilla koletzilla force-pushed the clickhouse/improve-fivetran-docs branch from 74bf704 to 659f3d2 Compare March 20, 2026 18:41
@koletzilla koletzilla requested a review from BentsiLeviav March 20, 2026 18:45
@koletzilla koletzilla marked this pull request as ready for review March 20, 2026 18:45
@koletzilla koletzilla requested review from a team as code owners March 20, 2026 18:45
Collaborator

@dhtclk dhtclk left a comment

Just a few minor nits on voice and grammar, as well as the use of the note admonition.

koletzilla and others added 4 commits March 24, 2026 10:47
Co-authored-by: Dominic Tran <dominic.tran@clickhouse.com>
Co-authored-by: Dominic Tran <dominic.tran@clickhouse.com>
Co-authored-by: Dominic Tran <dominic.tran@clickhouse.com>
…oting.md

Co-authored-by: Dominic Tran <dominic.tran@clickhouse.com>
@koletzilla
Contributor Author

Thanks for all these suggested changes @dhtclk 🙇

Comment on lines +94 to +95
| LOCALDATE | [Date](/sql-reference/data-types/date) |
| LOCALDATETIME | [DateTime](/sql-reference/data-types/datetime) |
Contributor

Contributor Author

Good point! Clarifying it now.

| STRING | [String](/sql-reference/data-types/string) |
| BINARY | [String](/sql-reference/data-types/string) \* |
| XML | [String](/sql-reference/data-types/string) \* |
| JSON | [String](/sql-reference/data-types/string) \* |
Contributor

Contributor Author

Adding it to the docs. Looks like we already have an issue to cover its implementation: ClickHouse/clickhouse-fivetran-destination#15

Contributor Author

I have extended the issue description with more details


The Fivetran ClickHouse destination maps [Fivetran data types](https://fivetran.com/docs/destinations#datatypes) to ClickHouse types as follows:

| Fivetran type | ClickHouse type |
Contributor

For all the date-related types, could you please verify that the range is aligned with Fivetran and the Go client? If it is not, I believe the range is silently replaced with the nearest boundary value
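
To make the concern concrete, here is a minimal Go sketch of what "replaced with the nearest boundary value" would look like for a column stored as ClickHouse `Date`, whose documented range is 1970-01-01 to 2149-06-06. The helper names `clampDate` and `clampISODate` are hypothetical; this is not the connector's or the Go client's actual code, only an illustration of the behavior worth verifying.

```go
package main

import (
	"fmt"
	"time"
)

// Documented ClickHouse Date range (assumption: the column is a plain Date).
var (
	minDate = time.Date(1970, 1, 1, 0, 0, 0, 0, time.UTC)
	maxDate = time.Date(2149, 6, 6, 0, 0, 0, 0, time.UTC)
)

// clampDate limits t to [minDate, maxDate] and reports whether it changed.
// Hypothetical helper, for illustration only.
func clampDate(t time.Time) (time.Time, bool) {
	switch {
	case t.Before(minDate):
		return minDate, true
	case t.After(maxDate):
		return maxDate, true
	default:
		return t, false
	}
}

// clampISODate parses a YYYY-MM-DD string, clamps it, and formats it back.
func clampISODate(s string) (string, bool, error) {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		return "", false, err
	}
	clamped, changed := clampDate(t)
	return clamped.Format("2006-01-02"), changed, nil
}

func main() {
	for _, s := range []string{"1900-01-01", "2024-03-18", "9999-12-31"} {
		out, changed, err := clampISODate(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> %s (clamped: %v)\n", s, out, changed)
	}
}
```

If the destination behaves like this, a source value outside the range would land in ClickHouse as a valid but different date, which is why the docs should list the ranges explicitly.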

Contributor Author

Oops, yes, you're right: I see some differences. Adding it to the docs.

| `write_batch_size` | integer | `100000` | 5,000 – 100,000 | Number of rows per batch for insert, update, and replace operations. |
| `select_batch_size` | integer | `1500` | 200 – 1,500 | Number of rows per batch for SELECT queries used during updates. |
| `mutation_batch_size` | integer | `1500` | 200 – 1,500 | Number of rows per batch for ALTER TABLE UPDATE mutations in history mode. Lower it if you are experiencing large SQL statements. |
| `hard_delete_batch_size` | integer | `1500` | 200 – 1,500 | Number of rows per batch for hard delete operations in history mode. Lower it if you are experiencing large SQL statements. |
Contributor

I see hard_delete_batch_size controls batch size for hard deletes in both standard syncs and history mode.

Call chain: `WriteBatch` calls `processDeleteFiles`, which calls `conn.HardDelete`, which calls `reader.ReadBatch(*flags.HardDeleteBatchSize)`.
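
As a rough illustration of why this setting matters in either mode, here is a small Go sketch of splitting a list of keys into batches of a given size. `splitBatches` is a hypothetical helper, not the connector's `ReadBatch`; it only shows how the batch size determines the number of statements and how large each one gets.

```go
package main

import "fmt"

// splitBatches splits keys into consecutive slices of at most batchSize items.
// Hypothetical helper for illustration; it returns nil for a non-positive size.
func splitBatches(keys []int, batchSize int) [][]int {
	if batchSize <= 0 {
		return nil
	}
	var batches [][]int
	for start := 0; start < len(keys); start += batchSize {
		end := start + batchSize
		if end > len(keys) {
			end = len(keys)
		}
		batches = append(batches, keys[start:end])
	}
	return batches
}

func main() {
	// 4,000 rows to delete, tried with the documented maximum and minimum sizes.
	keys := make([]int, 4000)
	for i := range keys {
		keys[i] = i
	}
	for _, size := range []int{1500, 200} {
		fmt.Printf("batch size %d -> %d delete statements\n", size, len(splitBatches(keys, size)))
	}
}
```

A smaller batch size means more statements, each with fewer keys in it, which matches the "lower it if you are experiencing large SQL statements" advice in the table.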

Contributor

@BentsiLeviav BentsiLeviav left a comment

Left a few comments/questions.
In addition, can we go over the code and track errors we raise, and document them with an explanation in the troubleshooting guide? (if they are not here already)


`SharedReplacingMergeTree` performs background data deduplication
[only during merges at an unknown time](/engines/table-engines/mergetree-family/replacingmergetree).
However, selecting the latest version of the data without duplicates ad-hoc is possible with the `FINAL` keyword and
Contributor

Should we warn that `FINAL` can be significantly slower on large tables?
In addition, do you think we could avoid duplicates using a materialized view? I'm not 100% sure, but something like:

  -- MV that feeds it
  CREATE MATERIALIZED VIEW analytics.orders_mv
  TO analytics.orders_latest
  AS SELECT *
  FROM fivetran_schema.orders
  WHERE _fivetran_deleted = false;


### Ensure cluster health during syncs {#cluster-health}

The Fivetran destination checks that all replicas are active before performing operations. If any replica is offline, operations fail after retrying for up to 600 seconds.
Contributor

IIRC that was (partially) changed.

WaitAllNodesAvailable runs before mutations and is a monitoring/warning step. If replicas aren't all up, it logs a warning but does not block the operation. https://github.com/ClickHouse/clickhouse-fivetran-destination/blob/904aef0da7f0815d75032224f43f24a6ae80b6f0/destination/db/clickhouse.go#L946

WaitAllMutationsCompleted is the one that actually retries for up to 600s, but only when code 341 is received after the mutation itself.
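
For the docs, the behavior described above could be pictured with a sketch like the following. It is a simplified Go model, not the code in `clickhouse.go`: `codedError`, `runMutation`, and the polling callback are hypothetical names, and only the "retry on code 341, up to a timeout" logic is taken from the description.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// codedError is a hypothetical stand-in for a server exception with an error code.
type codedError struct{ Code int }

func (e *codedError) Error() string {
	return fmt.Sprintf("clickhouse error code %d", e.Code)
}

// Code that triggers waiting for the mutation, per the comment above.
const unfinishedCode = 341

// runMutation runs mutate once. Only if it fails with code 341 does it poll
// done until the mutation is reported complete or the timeout elapses.
// Any other error is returned immediately without retrying.
func runMutation(mutate func() error, done func() (bool, error), timeout, interval time.Duration) error {
	err := mutate()
	if err == nil {
		return nil
	}
	var ce *codedError
	if !errors.As(err, &ce) || ce.Code != unfinishedCode {
		return err
	}
	deadline := time.Now().Add(timeout)
	for {
		finished, pollErr := done()
		if pollErr != nil {
			return pollErr
		}
		if finished {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("mutation not completed after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	polls := 0
	err := runMutation(
		func() error { return &codedError{Code: unfinishedCode} },
		func() (bool, error) { polls++; return polls >= 3, nil },
		600*time.Second, 10*time.Millisecond,
	)
	fmt.Println("error:", err, "polls:", polls)
}
```

The point for readers is that the 600-second window applies to waiting for an unfinished mutation, not to a pre-flight replica check.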
