Conversation
dhtclk
left a comment
Just a few minor nits on voice and grammar, as well as the use of the note admonition.
docs/integrations/data-ingestion/etl-tools/fivetran/reference.md
docs/integrations/data-ingestion/etl-tools/fivetran/troubleshooting.md
Co-authored-by: Dominic Tran <dominic.tran@clickhouse.com>
…oting.md Co-authored-by: Dominic Tran <dominic.tran@clickhouse.com>
Thanks for all these suggested changes @dhtclk 🙇
| LOCALDATE | [Date](/sql-reference/data-types/date) |
| LOCALDATETIME | [DateTime](/sql-reference/data-types/datetime) |
I see in the code that these two actually target `Date32` and `DateTime64(0, 'UTC')`, which cover wider value ranges:
https://github.com/ClickHouse/clickhouse-fivetran-destination/blob/6d60da9c7fad3e230871323ba371178565aa9a16/destination/common/data_types/data_types.go#L38
referring to https://github.com/ClickHouse/clickhouse-fivetran-destination/blob/6d60da9c7fad3e230871323ba371178565aa9a16/destination/common/constants/constants.go#L22
Good point! Clarifying it now.
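The corrected date-type mapping discussed above could be sketched as a small Go lookup table. This is illustrative only; the map name and structure are assumptions, not the destination's actual code (see the linked `data_types.go` for the real thing):

```go
package main

import "fmt"

// Illustrative sketch of the Fivetran → ClickHouse mapping for date types
// after the fix discussed above. Names here are assumptions, not the
// destination's actual code.
var fivetranToClickHouse = map[string]string{
	"LOCALDATE":     "Date32",               // wider range than Date
	"LOCALDATETIME": "DateTime64(0, 'UTC')", // wider range than DateTime
}

func main() {
	fmt.Println(fivetranToClickHouse["LOCALDATE"], fivetranToClickHouse["LOCALDATETIME"])
}
```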
| STRING | [String](/sql-reference/data-types/string) |
| BINARY | [String](/sql-reference/data-types/string) \* |
| XML | [String](/sql-reference/data-types/string) \* |
| JSON | [String](/sql-reference/data-types/string) \* |
I see Time type is missing (https://github.com/ClickHouse/clickhouse-fivetran-destination/blob/6d60da9c7fad3e230871323ba371178565aa9a16/destination/common/data_types/data_types.go#L54) - can you add it + create a follow-up issue in the repo to support it https://clickhouse.com/docs/sql-reference/data-types/time
Adding it to the docs. Looks like we already have an issue to cover its implementation: ClickHouse/clickhouse-fivetran-destination#15
I have extended the issue description with more details.
The Fivetran ClickHouse destination maps [Fivetran data types](https://fivetran.com/docs/destinations#datatypes) to ClickHouse types as follows:
| Fivetran type | ClickHouse type |
For all the date-related types, could you please verify that the range is aligned with Fivetran and the Go client? If it is not, I believe out-of-range values are silently replaced with the nearest boundary value.
Oops, yes, it is. I see some differences. Adding it to the docs.
| `write_batch_size` | integer | `100000` | 5,000 – 100,000 | Number of rows per batch for insert, update, and replace operations. |
| `select_batch_size` | integer | `1500` | 200 – 1,500 | Number of rows per batch for SELECT queries used during updates. |
| `mutation_batch_size` | integer | `1500` | 200 – 1,500 | Number of rows per batch for ALTER TABLE UPDATE mutations in history mode. Lower it if you are experiencing large SQL statements. |
| `hard_delete_batch_size` | integer | `1500` | 200 – 1,500 | Number of rows per batch for hard delete operations in history mode. Lower it if you are experiencing large SQL statements. |
I see `hard_delete_batch_size` controls the batch size for hard deletes in both standard syncs and history mode:
`WriteBatch` → `processDeleteFiles` → `conn.HardDelete` → `reader.ReadBatch(*flags.HardDeleteBatchSize)`
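The batching along that call chain could be sketched as follows. This is an illustrative helper, not the destination's actual code; it just shows how keys from a delete file would be split into chunks of `hard_delete_batch_size`:

```go
package main

import "fmt"

// Illustrative sketch: primary keys read from a delete file are processed
// in chunks of hardDeleteBatchSize, in both standard syncs and history
// mode. Names here are assumptions.
func batches(keys []string, size int) [][]string {
	var out [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		out = append(out, keys[start:end])
	}
	return out
}

func main() {
	keys := make([]string, 3200)
	// With the default hard_delete_batch_size of 1500:
	// 3200 keys → 3 batches (1500 + 1500 + 200).
	fmt.Println(len(batches(keys, 1500)))
}
```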
BentsiLeviav
left a comment
Left a few comments/questions.
In addition, can we go over the code, track the errors we raise, and document them with explanations in the troubleshooting guide (if they are not there already)?
`SharedReplacingMergeTree` performs background data deduplication
[only during merges at an unknown time](/engines/table-engines/mergetree-family/replacingmergetree).
However, selecting the latest version of the data without duplicates ad-hoc is possible with the `FINAL` keyword and
Should we warn that `FINAL` can be significantly slower on large tables?
In addition, do you think we might be able to overcome duplicates using an MV? Not 100% sure, but something like:

```sql
-- MV that feeds it
CREATE MATERIALIZED VIEW analytics.orders_mv
TO analytics.orders_latest
AS SELECT *
FROM fivetran_schema.orders
WHERE _fivetran_deleted = false;
```
### Ensure cluster health during syncs {#cluster-health}
The Fivetran destination checks that all replicas are active before performing operations. If any replica is offline, operations fail after retrying for up to 600 seconds.
IIRC that was (partially) changed:
`WaitAllNodesAvailable` runs before mutations and is a monitoring/warning step. If replicas aren't all up, it logs a warning but does not block the operation. https://github.com/ClickHouse/clickhouse-fivetran-destination/blob/904aef0da7f0815d75032224f43f24a6ae80b6f0/destination/db/clickhouse.go#L946
`WaitAllMutationsCompleted` is the one that actually retries for up to 600s, but only when code 341 is received after the mutation itself.
Revamp Fivetran docs:
- Index: covers more details about the current version and related files.
- Reference: covers all the docs we have on the Fivetran side: configurations, table engine details...
- setup-guide: only the configuration details. The full guide should be kept complete on the Fivetran side, as it's really nice to have it when editing the connector.
- Troubleshooting: contains usual errors with possible solutions, good practices, and some examples to help debugging.