
Commit 33a16ad

Merge pull request #915 from Altinity/add_feature_matrix_doc
Add feature matrix doc
2 parents c0980f3 + bf93dbc commit 33a16ad

4 files changed, +127 -0 lines changed

README.md

Lines changed: 5 additions & 0 deletions
@@ -15,6 +15,7 @@ transactional database tables in MySQL and PostgreSQL to ClickHouse
 for analysis.

 ## Features
+Refer to the [Feature Matrix](doc/feature_matrix.md) for the detailed feature list.

 * [Initial data dump and load(MySQL)](sink-connector/python/README.md)
 * Change data capture of new transactions using [Debezium](https://debezium.io/)
@@ -61,6 +62,7 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
 * [Adding new tables(Incremental Snapshot)](doc/incremental_snapshot.md)
 * [Configuration](doc/configuration.md)
 * [State Storage](doc/state_storage.md)
+* [Data Type Mapping](doc/data_types.md)

 ### Operations

@@ -72,6 +74,9 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
 * [Development](doc/development.md)
 * [Testing](doc/TESTING.md)

+## Comparison with other technologies
+- [Comparison](doc/comparison.md)
+
 ## Roadmap

 [2024 Roadmap](https://github.com/Altinity/clickhouse-sink-connector/issues/401)

doc/comparison.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
| Feature                         | Altinity Sink Connector (Lightweight, Single Binary) | Airbyte                        | ClickHouse `mysql` Table Engine        | Custom Python Script with ClickHouse Connect  |
|---------------------------------|------------------------------------------------------|--------------------------------|----------------------------------------|-----------------------------------------------|
| **Replication Type**            | Real-time CDC                                        | Batch (Scheduled)              | Direct Query                           | Batch or Scheduled                            |
| **Data Freshness**              | Near real-time                                       | Configurable (e.g., hourly)    | Near real-time (with latency)          | Configurable                                  |
| **Schema Change Handling**      | Full support (MySQL), partial (PostgreSQL)           | Manual schema refresh required | No automatic schema sync               | Manual intervention needed                    |
| **Complexity**                  | Low to Medium (single binary setup)                  | Moderate                       | Low                                    | High (requires coding and scheduling)         |
| **Ease of Setup**               | Easy (standalone binary, no Kafka needed)            | Easy                           | Very easy                              | Complex (custom coding)                       |
| **Maintenance**                 | Low to Moderate (single binary process)              | Low                            | Low                                    | High                                          |
| **Initial Sync Support**        | Yes                                                  | Yes                            | Not applicable (direct query)          | Yes                                           |
| **Transformation Capabilities** | Limited                                              | Basic (Airbyte transformations) | No                                    | Full control (custom code)                    |
| **Cost**                        | Free or license-based                                | Free (Open-source)             | Free (built-in to ClickHouse)          | Free (but may require custom infrastructure)  |
| **Suitability for High Volume** | High                                                 | Medium                         | Medium                                 | Medium to Low                                 |
| **Additional Infrastructure**   | None                                                 | None                           | None                                   | Optional (scheduling tools like Airflow)      |
| **Data Accuracy**               | High (real-time CDC)                                 | Medium (depends on sync frequency) | Medium                             | High                                          |
| **Ideal Use Case**              | Low-latency, real-time replication without Kafka     | Batch syncs, easy setup        | Simple queries without replication     | Custom, flexible ETL                          |


| Feature                         | Altinity Sink Connector (Lightweight, Single Binary) | Airbyte                        |
|---------------------------------|------------------------------------------------------|--------------------------------|
|

doc/data_types.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
## MySQL Data Types
Refer to the [Debezium documentation](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) for details on the supported data types.

| MySQL              | Debezium                                             | ClickHouse                      |
|--------------------|------------------------------------------------------|---------------------------------|
| Bigint             | INT64\_SCHEMA                                        | Int64                           |
| Bigint Unsigned    | INT64\_SCHEMA                                        | UInt64                          |
| Blob               |                                                      | String + hex                    |
| Char               | String                                               | String / LowCardinality(String) |
| Date               | Schema: INT64<br>Name:<br>debezium.Date              | Date(6)                         |
| DateTime(0/1/2/3)  | Schema: INT64<br>Name: debezium.Timestamp            | DateTime64(0/1/2/3)             |
| DateTime(4/5/6)    | Schema: INT64<br>Name: debezium.MicroTimestamp       | DateTime64(4/5/6)               |
| Decimal(30,12)     | Schema: Bytes<br>Name:<br>kafka.connect.data.Decimal | Decimal(30,12)                  |
| Double             |                                                      | Float64                         |
| Int                | INT32                                                | Int32                           |
| Int Unsigned       | INT64                                                | UInt32                          |
| Longblob           |                                                      | String + hex                    |
| Mediumblob         |                                                      | String + hex                    |
| Mediumint          | INT32                                                | Int32                           |
| Mediumint Unsigned | INT32                                                | UInt32                          |
| Smallint           | INT16                                                | Int16                           |
| Smallint Unsigned  | INT32                                                | UInt16                          |
| Text               | String                                               | String                          |
| Time               |                                                      | String                          |
| Time(6)            |                                                      | String                          |
| Timestamp          |                                                      | DateTime64                      |
| Tinyint            | INT16                                                | Int8                            |
| Tinyint Unsigned   | INT16                                                | UInt8                           |
| varbinary(\*)      |                                                      | String + hex                    |
| varchar(\*)        |                                                      | String                          |
| JSON               |                                                      | String                          |
| BYTES              | BYTES, io.debezium.bits                              | String                          |
| YEAR               | INT32                                                | Int32                           |
| GEOMETRY           | Binary of WKB                                        | String                          |
| SET                |                                                      | Array(String)                   |
| ENUM               |                                                      | Array(String)                   |

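
The temporal rows in the MySQL table carry integers rather than formatted strings. Per the Debezium documentation, its `Date` type is a day count since the Unix epoch, `Timestamp` is milliseconds since the epoch, and `MicroTimestamp` is microseconds since the epoch, all without time zone information. The following is a minimal Python sketch of that decoding, for illustration only: the helper `decode_debezium_temporal` is hypothetical and not part of the connector, and the short semantic names simply follow the table.

```python
from datetime import date, datetime, timedelta

# Reference points for Debezium's epoch-based temporal encodings.
EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)


def decode_debezium_temporal(semantic_name: str, value: int):
    """Hypothetical helper: turn a Debezium temporal integer into a Python value.

    The semantic names are the short forms used in the mapping table,
    not necessarily the exact schema names emitted by Debezium.
    """
    if semantic_name == "debezium.Date":
        # Number of days since 1970-01-01.
        return EPOCH_DATE + timedelta(days=value)
    if semantic_name == "debezium.Timestamp":
        # Milliseconds since the epoch (fits DateTime64 precision 0-3).
        return EPOCH_DATETIME + timedelta(milliseconds=value)
    if semantic_name == "debezium.MicroTimestamp":
        # Microseconds since the epoch (fits DateTime64 precision 4-6).
        return EPOCH_DATETIME + timedelta(microseconds=value)
    raise ValueError(f"unhandled semantic type: {semantic_name}")


if __name__ == "__main__":
    print(decode_debezium_temporal("debezium.Date", 19723))
    print(decode_debezium_temporal("debezium.Timestamp", 1704067200123))
    print(decode_debezium_temporal("debezium.MicroTimestamp", 1704067200000456))
```

The split between milliseconds and microseconds mirrors the two `DateTime` rows: precisions up to 3 fit in milliseconds, while precisions 4 to 6 need microseconds.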
## PostgreSQL Data Types

| PostgreSQL Type                   | Notes                                                 |
|-----------------------------------|-------------------------------------------------------|
| `SMALLINT`                        |                                                       |
| `INTEGER`                         | Supported                                             |
| `BIGINT`                          | Supported                                             |
| `NUMERIC`                         | Supported                                             |
| `REAL`                            | Supported                                             |
| `DOUBLE PRECISION`                | Supported                                             |
| `BOOLEAN`                         | Supported                                             |
| `CHAR(n)`                         | Supported                                             |
| `VARCHAR(n)`                      | Supported                                             |
| `TEXT`                            | Supported                                             |
| `BYTEA`                           | Supported                                             |
| `DATE`                            | Supported                                             |
| `TIME [ WITHOUT TIME ZONE ]`      | Supported                                             |
| `TIME WITH TIME ZONE`             | Supported                                             |
| `TIMESTAMP [ WITHOUT TIME ZONE ]` | Supported                                             |
| `TIMESTAMP WITH TIME ZONE`        | Supported                                             |
| `INTERVAL`                        | Supported                                             |
| `UUID`                            | Supported                                             |
| `INET`                            | Supported                                             |
| `MACADDR`                         | Supported                                             |
| `JSON`                            | Supported                                             |
| `JSONB`                           | Supported                                             |
| `HSTORE`                          | Supported                                             |
| `ENUM`                            | Supported                                             |
| `ARRAY`                           | Supported, except arrays of unsupported element types |
| `GEOMETRY` (PostGIS)              | Not supported                                         |
| `GEOGRAPHY` (PostGIS)             | Not supported                                         |
| `CITEXT`                          | Supported                                             |
| `BIT`                             | Not supported                                         |
| `BIT VARYING`                     | Not supported                                         |
| `MONEY`                           | Not supported                                         |
| `XML`                             | Not supported                                         |
| `OID`                             | Not supported                                         |
| Any other type                    | Types other than those listed are not supported       |

doc/feature_matrix.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
## Features

| Feature | Description |
| ------- | --------- |
| Single Binary | No additional dependencies or infrastructure required |
| Exactly Once Processing | Offsets are committed to ClickHouse after the messages are written to ClickHouse |
| Supported Databases | MySQL, MariaDB, PostgreSQL, MongoDB (Experimental) |
| Supported ClickHouse Versions | 24.8 and above |
| ClickHouse Table Types | ReplacingMergeTree, MergeTree, ReplicatedReplacingMergeTree |
| Replication Start Positioning | Use sink-connector-client to start replication from a specific offset or LSN (MySQL binlog position, PostgreSQL LSN) |
| Supported Datatypes | Refer to [Datatypes](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) |
| Initial Data Load | Scripts to perform the initial data load (MySQL) |
| Fault Tolerance | Sink Connector Client can continue replication from the last committed offset/LSN in case of a failure |
| Update, Delete | Supported with ReplacingMergeTree |
| Monitoring | Prometheus metrics, Grafana dashboard |
| Schema Evolution | DDL support for MySQL |
| Deployment Models | Docker Compose, Java JAR file, Kubernetes |
| Start, Stop, Pause, Resume Replication | Supported using sink-connector-client |
| Filter source databases, tables, columns | Supported using Debezium configuration |
| Map source databases to different ClickHouse databases | Database name overrides supported |
| Column name overrides | Planned |
| MySQL extensive DDL support | [Full list of DDL](sink-connector-lightweight/docs/mysql-ddl-support.md) |
| Replication Lag Monitoring | Grafana dashboard and view to monitor lag |
| Batch inserts to ClickHouse | Configurable batch size/thread pool size to achieve high throughput/low latency |
| MySQL Generated/Alias/Materialized Columns | Supported |
| Auto create tables | Tables are automatically created in ClickHouse based on the source table structure |
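
The "Update, Delete" row relies on how ReplacingMergeTree behaves: among rows that share a sorting key, ClickHouse eventually keeps only the row with the highest version, so an update or delete can be written as a newer row instead of an in-place change. The sketch below is a rough Python model of that collapse, not the connector's implementation. The column names `id`, `_version`, and `is_deleted` are assumptions chosen for illustration and are not confirmed by this document.

```python
from typing import Dict, Iterable, List


def collapse_versions(rows: Iterable[dict]) -> List[dict]:
    """Illustrative model of ReplacingMergeTree deduplication.

    Keeps the highest-version row per key, then drops rows flagged as
    deleted. Column names are hypothetical.
    """
    latest: Dict[object, dict] = {}
    for row in rows:
        key = row["id"]
        current = latest.get(key)
        # ">=" lets the later-inserted row win when versions tie.
        if current is None or row["_version"] >= current["_version"]:
            latest[key] = row
    # A delete is modelled as a newer row with is_deleted set.
    return [row for row in latest.values() if not row["is_deleted"]]


if __name__ == "__main__":
    change_events = [
        {"id": 1, "name": "a", "_version": 1, "is_deleted": 0},  # insert
        {"id": 1, "name": "b", "_version": 2, "is_deleted": 0},  # update
        {"id": 2, "name": "c", "_version": 1, "is_deleted": 0},  # insert
        {"id": 2, "name": "c", "_version": 2, "is_deleted": 1},  # delete
    ]
    print(collapse_versions(change_events))
```

In ClickHouse itself this collapse happens during background merges or when a query uses `FINAL`, so duplicate versions of a row can remain visible until then.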
