Fixed multiple typos here and there #101

Merged
1 commit merged on Aug 13, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@

Welcome to the Altinity Knowledgebase Repository! This Knowledgebase was established for Altinity Engineers and ClickHouse community members to work together to find common solutions.

Submissions and merges to this repository are distrubuted at https://kb.altinity.com .
Submissions and merges to this repository are distributed at https://kb.altinity.com .

This knowledgebase is licensed under Apache 2.0. Contributors who submit to the Altinity Knowledgebase agree to the Altinity Contribution License Agreement.

@@ -8,7 +8,7 @@ description: >-

## quantileTDigestState

quantileTDigestState is stored in two parts: a count of centroids in LEB128 format + list of centroids without a delimeter. Each centroid is represented as two Float32 values: Mean & Count.
quantileTDigestState is stored in two parts: a count of centroids in LEB128 format + list of centroids without a delimiter. Each centroid is represented as two Float32 values: Mean & Count.

```sql
SELECT
2 changes: 1 addition & 1 deletion content/en/altinity-kb-integrations/Spark.md
@@ -16,7 +16,7 @@ The trivial & natural way to talk to ClickHouse from Spark is using jdbc. There

ClickHouse-Native-JDBC has some hints about integration with Spark even in the main README file.

'Official' driver does support some conversion of complex data types (Roarring bitmaps) for Spark-ClickHouse integration: https://github.com/ClickHouse/clickhouse-jdbc/pull/596
'Official' driver does support some conversion of complex data types (Roaring bitmaps) for Spark-ClickHouse integration: https://github.com/ClickHouse/clickhouse-jdbc/pull/596

But proper partitioning of the data (to spark partitions) may be tricky with jdbc.

@@ -157,7 +157,7 @@ https://github.com/ClickHouse/ClickHouse/blob/da4856a2be035260708fe2ba3ffb9e437d

So it load the main config first, after that it load (with overwrites) the configs for all topics, **listed in `kafka_topic_list` of the table**.

Also since v21.12 it's possible to use more straght-forward way using named_collections:
Also since v21.12 it's possible to use more straightforward way using named_collections:
https://github.com/ClickHouse/ClickHouse/pull/31691

So you can say something like
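
(The original example is collapsed in this diff view. Below is a hypothetical sketch of the named_collections approach; the collection, broker, topic, and table names are illustrative assumptions, not taken from the page. On 21.12 the collection would be declared in config.xml under `<named_collections>`; newer releases also accept the DDL form shown here.)

```sql
-- Hypothetical sketch: a named collection holding the Kafka settings...
CREATE NAMED COLLECTION kafka_clickstream AS
    kafka_broker_list = 'kafka1:9092',
    kafka_topic_list  = 'clickstream',
    kafka_group_name  = 'ch_consumer',
    kafka_format      = 'JSONEachRow';

-- ...and a Kafka table that simply references the collection.
CREATE TABLE kafka_queue
(
    ts      DateTime,
    user_id UInt64,
    url     String
)
ENGINE = Kafka(kafka_clickstream);
```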
@@ -34,5 +34,5 @@ You may want to adjust those depending on your scenario:

## Disable at-least-once delivery

`kafka_commit_every_batch` = 1 will change the loop logic mentioned above. Consumed batch commited to the Kafka and the block of rows send to Materialized Views only after that. It could be resembled as at-most-once delivery mode as prevent duplicate creation but allow loss of data in case of failures.
`kafka_commit_every_batch` = 1 will change the loop logic mentioned above. Consumed batch committed to the Kafka and the block of rows send to Materialized Views only after that. It could be resembled as at-most-once delivery mode as prevent duplicate creation but allow loss of data in case of failures.
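
For context, a minimal sketch of where this setting is applied (broker, topic, and table names below are illustrative, not from the page):

```sql
-- Illustrative only: per-batch commits trade duplicate protection for possible data loss.
CREATE TABLE kafka_events_queue
(
    ts      DateTime,
    payload String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'ch_events',
         kafka_format = 'JSONEachRow',
         kafka_commit_every_batch = 1;  -- commit the batch to Kafka before the block reaches the MVs
```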

@@ -24,7 +24,7 @@ See also these configuration settings:
```
### About Offset Consuming

When a consumer joins the consumer group, the broker will check if it has a commited offset. If that is the case, then it will start from the latest offset. Both ClickHouse and librdKafka documentation state that the default value for `auto_offset_reset` is largest (or `latest` in new Kafka versions) but it is not, if the consumer is new:
When a consumer joins the consumer group, the broker will check if it has a committed offset. If that is the case, then it will start from the latest offset. Both ClickHouse and librdKafka documentation state that the default value for `auto_offset_reset` is largest (or `latest` in new Kafka versions) but it is not, if the consumer is new:

https://github.com/ClickHouse/ClickHouse/blob/f171ad93bcb903e636c9f38812b6aaf0ab045b04/src/Storages/Kafka/StorageKafka.cpp#L506

6 changes: 3 additions & 3 deletions content/en/altinity-kb-integrations/clickhouse-odbc.md
@@ -19,7 +19,7 @@ Licensed under the [Apache 2.0](https://github.com/ClickHouse/clickhouse-odbc?ta
3. Configure ClickHouse DSN.

Note: that install driver linked against MDAC (which is default for Windows), some non-windows native
applications (cygwin / msys64 based) may require driver linked agains unixodbc. Build section below.
applications (cygwin / msys64 based) may require driver linked against unixodbc. Build section below.

### MacOS

@@ -69,7 +69,7 @@ The list of DSN parameters recognized by the driver is as follows:

## Troubleshooting & bug reporting

If some software doesn't work properly with that driver, but works good with other drivers - we will be appritiate if you will be able to collect debug info.
If some software doesn't work properly with that driver, but works good with other drivers - we will be appropriate if you will be able to collect debug info.

To debug issues with the driver, first things that need to be done are:
- enabling driver manager tracing. Links may contain some irrelevant vendor-specific details.
@@ -140,7 +140,7 @@ brew install git cmake make poco openssl libiodbc # You may use unixodbc INSTEAD

**Note:** usually on Linux you use unixODBC driver manager, and on Mac - iODBC.
In some (rare) cases you may need use other driver manager, please do it only
if you clearly understand the differencies. Driver should be used with the driver
if you clearly understand the differences. Driver should be used with the driver
manager it was linked to.

Clone the repo with submodules:
@@ -13,7 +13,7 @@ description: >
* Since 22.8 - final doesn't read excessive data, see [https://github.com/ClickHouse/ClickHouse/pull/47801](https://github.com/ClickHouse/ClickHouse/pull/47801)
* Since 23.5 - final use less memory, see [https://github.com/ClickHouse/ClickHouse/pull/50429](https://github.com/ClickHouse/ClickHouse/pull/50429)
* Since 23.9 - final doesn't read PK columns if unneeded ie only one part in partition, see [https://github.com/ClickHouse/ClickHouse/pull/53919](https://github.com/ClickHouse/ClickHouse/pull/53919)
* Since 23.12 - final applied only for interesecting ranges of parts, see [https://github.com/ClickHouse/ClickHouse/pull/58120](https://github.com/ClickHouse/ClickHouse/pull/58120)
* Since 23.12 - final applied only for intersecting ranges of parts, see [https://github.com/ClickHouse/ClickHouse/pull/58120](https://github.com/ClickHouse/ClickHouse/pull/58120)
* Since 24.1 - final doesn't compare rows from the same part with level > 0, see [https://github.com/ClickHouse/ClickHouse/pull/58142](https://github.com/ClickHouse/ClickHouse/pull/58142)
* Since 24.1 - final use vertical algorithm, (more cache friendly), see [https://github.com/ClickHouse/ClickHouse/pull/54366](https://github.com/ClickHouse/ClickHouse/pull/54366)
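
As a quick reminder of what these optimizations apply to, a minimal hedged example (the table name and the extra setting are illustrative, not from the page):

```sql
-- FINAL merges rows with the same sorting key at SELECT time;
-- do_not_merge_across_partitions_select_final can help when partitions never overlap.
SELECT *
FROM replacing_table FINAL
WHERE key = 42
SETTINGS do_not_merge_across_partitions_select_final = 1;
```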

@@ -8,7 +8,7 @@ description: >
## ClickHouse® version 23.1+

(23.1.6.42, 23.2.5.46, 23.3.1.2823)
Have inbuild support for [parametrized views](https://clickhouse.com/docs/en/sql-reference/statements/create/view#parameterized-view):
Have inbuilt support for [parametrized views](https://clickhouse.com/docs/en/sql-reference/statements/create/view#parameterized-view):

```sql
CREATE VIEW my_new_view AS
@@ -24,4 +24,4 @@ If that exception happens often in your use-case:
- use recent clickhouse versions
- ensure you use Atomic engine for the database (not Ordinary) (can be checked in system.databases)

Sometime you can try to workaround issue by finding the queries which uses that table concurenly (especially to system.tables / system.parts and other system tables) and try killing them (or avoiding them).
Sometime you can try to workaround issue by finding the queries which uses that table concurently (especially to system.tables / system.parts and other system tables) and try killing them (or avoiding them).
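
A rough sketch of that workaround (the LIKE patterns and the idea of killing by query_id are assumptions about how you might do it, not a prescription):

```sql
-- Find long-running queries that touch system.tables / system.parts concurrently...
SELECT query_id, user, elapsed, query
FROM system.processes
WHERE query ILIKE '%system.parts%' OR query ILIKE '%system.tables%'
ORDER BY elapsed DESC;

-- ...and, if appropriate, kill the offender (substitute a real query_id).
KILL QUERY WHERE query_id = 'some-query-id';
```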
@@ -6,7 +6,7 @@ description: >-
Using array functions to mimic window-functions alike behavior.
---

There are some usecases when you may want to mimic window functions using Arrays - as an optimization step, or to contol the memory better / use on-disk spiling, or just if you have old ClickHouse® version.
There are some usecases when you may want to mimic window functions using Arrays - as an optimization step, or to control the memory better / use on-disk spiling, or just if you have old ClickHouse® version.
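
For illustration, a minimal sketch of the idea (a cumulative sum per id without window functions; it assumes the `test_running_difference(id, val)` table used further down this page):

```sql
-- Collapse each id into an array, compute the prefix sums, then expand back to rows.
SELECT id, val, cum
FROM
(
    SELECT
        id,
        groupArray(val)              AS vals,
        arrayCumSum(groupArray(val)) AS cums
    FROM (SELECT id, val FROM test_running_difference ORDER BY id)
    GROUP BY id
)
ARRAY JOIN vals AS val, cums AS cum;
```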

## Running difference sample

@@ -251,7 +251,7 @@ select id, val, runningDifference(val) from (select * from test_running_differen
### 1. Group & Collect the data into array


you can collect several column by builing array of tuples:
you can collect several column by building array of tuples:
```
SELECT
id,
2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/async-inserts.md
@@ -63,7 +63,7 @@ This has been improved in **ClickHouse 23.7** Flush queries for async inserts (t
## Versions

- **23.8** is a good version to start using async inserts because of the improvements and bugfixes.
- **24.3** the new adaptative timeout mechanism has been added so ClickHouse will throttle the inserts based on the server load.[#58486](https://github.com/ClickHouse/ClickHouse/pull/58486)
- **24.3** the new adaptive timeout mechanism has been added so ClickHouse will throttle the inserts based on the server load.[#58486](https://github.com/ClickHouse/ClickHouse/pull/58486)

## Metrics

2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/atomic-insert.md
@@ -25,7 +25,7 @@ https://github.com/ClickHouse/ClickHouse/issues/5148#issuecomment-487757235

### Generate test data in Native and TSV format ( 100 millions rows )

Text formats and Native format require different set of settings, here I want to find / demonstrate mandatory minumum of settings for any case.
Text formats and Native format require different set of settings, here I want to find / demonstrate mandatory minimum of settings for any case.

```bash
clickhouse-client -q \
2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/datetime64.md
@@ -6,7 +6,7 @@ description: >-
DateTime64 data type
---

## Substract fractional seconds
## Subtract fractional seconds

```sql
WITH toDateTime64('2021-09-07 13:41:50.926', 3) AS time
@@ -119,7 +119,7 @@ MemoryTracker: Peak memory usage (for query): 3.77 MiB.
```

* Multi threaded
* Will return result only after competion of aggregation
* Will return result only after completion of aggregation

## LIMIT BY

@@ -47,7 +47,7 @@ The ClickHouse® SimpleAggregateFunction can be used for those aggregations when
<tr>
<td style="text-align:left">reading raw value per row</td>
<td style="text-align:left">you can access it directly</td>
<td style="text-align:left">you need to use <code>finalizeAgggregation</code> function</td>
<td style="text-align:left">you need to use <code>finalizeAggregation</code> function</td>
</tr>
<tr>
<td style="text-align:left">using aggregated value</td>
2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/trace_log.md
@@ -17,7 +17,7 @@ By default it collects information only about queries when runs longer than 1 se
You can adjust that per query using settings `query_profiler_real_time_period_ns` & `query_profiler_cpu_time_period_ns`.

Both works very similar (with desired interval dump the stacktraces of all the threads which execute the query).
real timer - allows to 'see' the situtions when cpu was not working much, but time was spend for example on IO.
real timer - allows to 'see' the situations when cpu was not working much, but time was spend for example on IO.
cpu timer - allows to see the 'hot' points in calculations more accurately (skip the io time).

Trying to collect stacktraces with a frequency higher than few KHz is usually not possible.
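
For illustration, a hedged sketch of collecting and reading such samples (the 10 ms period, the test query, and the introspection-based symbolization are assumptions to adapt to your case):

```sql
-- Sample the query every 10 ms on both timers; introspection functions are needed for symbol names.
SET query_profiler_real_time_period_ns = 10000000,
    query_profiler_cpu_time_period_ns  = 10000000,
    allow_introspection_functions      = 1;

SELECT sum(number) FROM numbers(100000000) FORMAT Null;

SYSTEM FLUSH LOGS;

-- Ideally also filter by the query_id of the query above.
SELECT
    trace_type,
    count() AS samples,
    arrayStringConcat(arrayMap(a -> demangle(addressToSymbol(a)), trace), '\n') AS stack
FROM system.trace_log
WHERE event_date = today()
GROUP BY trace_type, trace
ORDER BY samples DESC
LIMIT 3;
```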
@@ -68,7 +68,7 @@ Daily partitioning by toYYYYMMDD(timestamp) -> 20220602
TTL timestamp + INTERVAL 30 DAY MOVE TO DISK s3 -> TTL timestamp + INTERVAL 60 DAY MOVE TO DISK s3

* Idea: ClickHouse need to move data from s3 to local disk BACK
* Actual: There is no rule that data eariler than 60 DAY **should be** on local disk
* Actual: There is no rule that data earlier than 60 DAY **should be** on local disk
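
In other words, relaxing the TTL only changes the rule going forward; parts already on s3 stay there unless moved by hand, roughly like this (the table name and partition are illustrative, following the daily toYYYYMMDD partitioning above):

```sql
-- New rule: move to s3 after 60 days (no automatic back-migration of older parts).
ALTER TABLE hits
    MODIFY TTL timestamp + INTERVAL 60 DAY TO DISK 's3';

-- If a partition really has to be on local disk again, move it explicitly.
ALTER TABLE hits
    MOVE PARTITION ID '20220602' TO DISK 'default';
```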

Table parts:

@@ -24,7 +24,7 @@ TTL event_time + toIntervalMonth(1) RECOMPRESS CODEC(ZSTD(1)),
event_time + toIntervalMonth(6) RECOMPRESS CODEC(ZSTD(6);
```

Default comression is LZ4. See [the ClickHouse® documentation](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#server-settings-compression) for more information.
Default compression is LZ4. See [the ClickHouse® documentation](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#server-settings-compression) for more information.

These TTL rules recompress data after 1 and 6 months.

@@ -50,4 +50,4 @@ ALTER TABLE hits
event_time + toIntervalMonth(6) RECOMPRESS CODEC(ZSTD(6));
```

All columns have implicite default compression from server config, except `event_time`, that's why need to change to compression to `Default` for this column otherwise it won't be recompressed.
All columns have implicit default compression from server config, except `event_time`, that's why need to change to compression to `Default` for this column otherwise it won't be recompressed.
@@ -95,7 +95,7 @@ Pros and cons:

### 2e Several 'baskets' of arrays

i.e.: timestamp, sourceid, metric_names_basket1, metric_values_basker1, ..., metric_names_basketN, metric_values_basketN
i.e.: timestamp, sourceid, metric_names_basket1, metric_values_basket1, ..., metric_names_basketN, metric_values_basketN
The same as 2b, but there are several key-value arrays ('basket'), and metric go to one particular basket depending on metric name (and optionally by metric type)
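
A hypothetical sketch of such a table (column types, engine, and the two-basket split are assumptions for illustration):

```sql
CREATE TABLE metrics_baskets
(
    timestamp             DateTime,
    sourceid              UInt64,
    metric_names_basket1  Array(LowCardinality(String)),
    metric_values_basket1 Array(Float64),
    metric_names_basket2  Array(LowCardinality(String)),
    metric_values_basket2 Array(Float64)
)
ENGINE = MergeTree
ORDER BY (sourceid, timestamp);
```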

Pros and cons:
28 changes: 14 additions & 14 deletions content/en/altinity-kb-schema-design/codecs/_index.md
@@ -5,21 +5,21 @@ description: >
Codecs
---

| Codec Name | Recommended Data Types | Performance Notes |
|------------------|--------------------------------------|-------------------|
| LZ4 | Any | Used by default. Extremely fast; good compression; balanced speed and efficiency |
| ZSTD(level) | Any | Good compression; pretty fast; best for high compression needs. Don't use levels highter than 3. |
| LZ4HC(level) | Any | LZ4 High Compression algorithm with configurable level; slower but better compression than LZ4, but decmpression is still fast. |
| Delta | Integer Types, Time Series Data, Timestamps | Preprocessor (should be followed by some compression codec). Stores difference between neighboring values; good for monotonically increasing data. |
| DoubleDelta | Integer Types, Time Series Data | Stores difference between neighboring delta values; suitable for time series data |
| Gorilla | Floating Point Types | Calculates XOR between current and previous value; suitable for slowly changing numbers |
| T64 | Integer, Time Series Data, Timestamps | Preprocessor (should be followed by some compression codec). Crops unused high bits; puts them into a 64x64 bit matrix; optimized for 64-bit data types |
| GCD | Integer Numbers | Preprocessor (should be followed by some compression codec). Greatest common divisor compression; divides values by a common divisor; effective for divisible integer sequences |
| Codec Name | Recommended Data Types | Performance Notes |
|------------------|--------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| LZ4 | Any | Used by default. Extremely fast; good compression; balanced speed and efficiency |
| ZSTD(level) | Any | Good compression; pretty fast; best for high compression needs. Don't use levels higher than 3. |
| LZ4HC(level) | Any | LZ4 High Compression algorithm with configurable level; slower but better compression than LZ4, but decompression is still fast. |
| Delta | Integer Types, Time Series Data, Timestamps | Preprocessor (should be followed by some compression codec). Stores difference between neighboring values; good for monotonically increasing data. |
| DoubleDelta | Integer Types, Time Series Data | Stores difference between neighboring delta values; suitable for time series data |
| Gorilla | Floating Point Types | Calculates XOR between current and previous value; suitable for slowly changing numbers |
| T64 | Integer, Time Series Data, Timestamps | Preprocessor (should be followed by some compression codec). Crops unused high bits; puts them into a 64x64 bit matrix; optimized for 64-bit data types |
| GCD | Integer Numbers | Preprocessor (should be followed by some compression codec). Greatest common divisor compression; divides values by a common divisor; effective for divisible integer sequences |
| FPC | Floating Point Numbers | Designed for Float64; Algorithm detailed in [FPC paper](https://userweb.cs.txstate.edu/~burtscher/papers/dcc07a.pdf), [ClickHouse® PR #37553](https://github.com/ClickHouse/ClickHouse/pull/37553) |
| ZSTD_QAT | Any | Requires hardware support for QuickAssist Technology (QAT) hardware; provides accelerated compression tasks |
| DEFLATE_QPL | Any | Requires hardware support for Intel’s QuickAssist Technology for DEFLATE compression; enhanced performance for specific hardware |
| LowCardinality | String | It's not a codec, but a datatype modifier. Reduces representation size; effective for columns with low cardinality |
| NONE | Non-compressable data with very high entropy, like some random string, or some AggregateFunction states | No compression at all. Can be used on the columns that can not be compressed anyway. |
| ZSTD_QAT | Any | Requires hardware support for QuickAssist Technology (QAT) hardware; provides accelerated compression tasks |
| DEFLATE_QPL | Any | Requires hardware support for Intel’s QuickAssist Technology for DEFLATE compression; enhanced performance for specific hardware |
| LowCardinality | String | It's not a codec, but a datatype modifier. Reduces representation size; effective for columns with low cardinality |
| NONE | Non-compressable data with very high entropy, like some random string, or some AggregateFunction states | No compression at all. Can be used on the columns that can not be compressed anyway. |
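
To tie the table together, a hedged sketch of how such codecs are attached to columns (the schema itself is illustrative):

```sql
CREATE TABLE codec_demo
(
    ts    DateTime               CODEC(Delta, ZSTD(1)),   -- preprocessor followed by a compressor
    id    UInt64                 CODEC(T64, LZ4),
    value Float64                CODEC(Gorilla, ZSTD(1)),
    label LowCardinality(String) CODEC(ZSTD(1)),
    blob  String                 CODEC(NONE)              -- high-entropy data, skip compression
)
ENGINE = MergeTree
ORDER BY (id, ts);
```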


