Commit 972970e

Merge pull request #109 from Altinity/ashwini-ahire7-patch-6
Update delete-via-tombstone-column.md
2 parents b643d28 + 3446165 commit 972970e

1 file changed

+61
-17
lines changed

content/en/altinity-kb-queries-and-syntax/delete-via-tombstone-column.md

Lines changed: 61 additions & 17 deletions
@@ -4,6 +4,12 @@ linkTitle: "DELETE via tombstone column"
44
description: >
55
DELETE via tombstone column
66
---
7+
8+
This article provides an overview of the different methods to handle row deletion in ClickHouse, using tombstone columns and ALTER UPDATE or DELETE. The goal is to highlight the performance impacts of different techniques and storage settings, including a scenario using S3 for remote storage.
9+
10+
1. Creating a Test Table
11+
We will start by creating a simple MergeTree table with a tombstone column (is_active) to track active rows:
12+
713
```sql
814
CREATE TABLE test_delete
915
(
@@ -16,7 +22,10 @@ CREATE TABLE test_delete
1622
)
1723
ENGINE = MergeTree
1824
ORDER BY key;
19-
25+
```
26+
2. Inserting Data
27+
Insert sample data into the table:
28+
```sql
2029
INSERT INTO test_delete (key, ts, value_a, value_b, value_c) SELECT
2130
number,
2231
1,
@@ -25,8 +34,12 @@ INSERT INTO test_delete (key, ts, value_a, value_b, value_c) SELECT
2534
concat('string', toString(number))
2635
FROM numbers(10000000);
2736

28-
INSERT INTO test_delete (key, ts, value_a, value_b, value_c) VALUES (400000, 2, 'totally different string', 'another totally different string', 'last string');
2937

38+
INSERT INTO test_delete (key, ts, value_a, value_b, value_c) VALUES (400000, 2, 'totally different string', 'another totally different string', 'last string');
39+
```
40+
3. Querying the Data
41+
To verify the inserted data:
42+
```sql
3043
SELECT *
3144
FROM test_delete
3245
WHERE key = 400000;
@@ -37,31 +50,49 @@ WHERE key = 400000;
3750
┌────key─┬─ts─┬─value_a──────────────────┬─value_b────────────────┬─value_c──────┬─is_active─┐
3851
│ 400000 │  1 │ some_looong_string400000 │ another_long_str400000 │ string400000 │         1 │
3952
└────────┴────┴──────────────────────────┴────────────────────────┴──────────────┴───────────┘
53+
```
54+
This should return two rows with different ts values.
55+
56+
4. Soft Deletion Using ALTER UPDATE
57+
Instead of deleting a row, you can mark it as inactive by setting is_active to 0:
58+
```sql
4059

4160
SET mutations_sync = 2;
4261

4362
ALTER TABLE test_delete
4463
UPDATE is_active = 0 WHERE (key = 400000) AND (ts = 1);
45-
4664
Ok.
4765

4866
0 rows in set. Elapsed: 0.058 sec.
49-
67+
```
68+
After the update, you can confirm that the row is now marked inactive:
69+
```sql
5070
SELECT *
5171
FROM test_delete
52-
WHERE (key = 400000) AND is_active;
53-
54-
┌────key─┬─ts─┬─value_a──────────────────┬─value_b──────────────────────────┬─value_c─────┬─is_active─┐
55-
│ 400000 │  2 │ totally different string │ another totally different string │ last string │         1 │
56-
└────────┴────┴──────────────────────────┴──────────────────────────────────┴─────────────┴───────────┘
72+
WHERE (key = 400000) AND is_active=0;
5773

74+
┌────key─┬─ts─┬─value_a──────────────────┬─value_b────────────────┬─value_c──────┬─is_active─┐
75+
│ 400000 │  1 │ some_looong_string400000 │ another_long_str400000 │ string400000 │         0 │
76+
└────────┴────┴──────────────────────────┴────────────────────────┴──────────────┴───────────┘
77+
```
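To hide soft-deleted rows from readers rather than inspect them, one option is to filter on `is_active = 1`, for instance through a view. The following is a minimal sketch; the view name `test_delete_active` is hypothetical and not part of the original article.

```sql
-- Hypothetical view name; exposes only rows that have not been soft-deleted.
CREATE VIEW test_delete_active AS
SELECT *
FROM test_delete
WHERE is_active = 1;

-- Readers query the view instead of the base table.
SELECT *
FROM test_delete_active
WHERE key = 400000;
```

A view keeps the filter in one place, so individual queries cannot forget it.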
78+
5. Hard Deletion Using ALTER DELETE
79+
If you need to completely remove a row from the table, you can use ALTER DELETE:
80+
```sql
5881
ALTER TABLE test_delete
5982
DELETE WHERE (key = 400000) AND (ts = 1);
6083

6184
Ok.
6285

6386
0 rows in set. Elapsed: 1.101 sec. -- 20 times slower!!!
87+
```
88+
However, this operation is significantly slower compared to the ALTER UPDATE approach. For example:
89+
90+
ALTER DELETE: Takes around 1.1 seconds
91+
ALTER UPDATE: About 0.06 seconds
6492

93+
The reason for this difference is that ALTER DELETE rewrites every column of each affected data part, while ALTER UPDATE rewrites only the updated column (here is_active).
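Both statements run as mutations, so their progress can be checked in the `system.mutations` table. The query below is a sketch that assumes the table lives in the current database; the `LIMIT` is arbitrary.

```sql
-- List the most recent mutations on the test table and whether they have finished.
SELECT
    mutation_id,
    command,
    create_time,
    parts_to_do,   -- data parts still waiting to be rewritten
    is_done
FROM system.mutations
WHERE database = currentDatabase()
  AND table = 'test_delete'
ORDER BY create_time DESC
LIMIT 5;
```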
94+
95+
```sql
6596
SELECT *
6697
FROM test_delete
6798
WHERE key = 400000;
@@ -70,7 +101,7 @@ WHERE key = 400000;
70101
│ 400000 │  2 │ totally different string │ another totally different string │ last string │         1 │
71102
└────────┴────┴──────────────────────────┴──────────────────────────────────┴─────────────┴───────────┘
72103

73-
-- For ReplacingMergeTree
104+
-- For ReplacingMergeTree -> https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree
74105

75106
OPTIMIZE TABLE test_delete FINAL;
76107

@@ -87,7 +118,11 @@ WHERE key = 400000
87118
└────────┴────┴──────────────────────────┴──────────────────────────────────┴─────────────┴───────────┘
88119
```
89120

90-
## DELETE & S3
121+
Soft Deletion (via ALTER UPDATE): A quicker approach that does not involve physical data deletion but rather updates the tombstone column.
122+
Hard Deletion (via ALTER DELETE): Can take significantly longer, especially with large datasets stored in remote storage like S3.
123+
124+
6. Optimizing for Faster Deletion with S3 Storage
125+
If using S3 for storage, the DELETE operation becomes even slower due to the overhead of handling remote data. Here’s an example with a table using S3-backed storage:
91126

92127
```sql
93128
CREATE TABLE test_delete
@@ -120,28 +155,32 @@ SELECT count() FROM test_delete;
120155
1 row in set. Elapsed: 0.002 sec.
121156
```
122157

123-
### DELETE USING `ALTER UPDATE` & `ROW POLICY`
158+
7. DELETE Using ALTER UPDATE and Row Policy
159+
You can also control visibility at the query level using row policies, for example by exposing only rows where is_active = 1.
160+
161+
The block below creates such a policy and then soft-deletes a row using ALTER UPDATE:
124162

125163
```sql
126-
CREATE ROW POLICY pol1 ON test_delete USING is_deleted=0 TO all;
164+
CREATE ROW POLICY pol1 ON test_delete USING is_active=1 TO all;
127165

128166
SELECT count() FROM test_delete; -- select count() became much slower, it reads data now, not metadata
129167
┌──count()─┐
130168
│ 10000000 │
131169
└──────────┘
132170
1 row in set. Elapsed: 0.314 sec. Processed 10.00 million rows, 10.00 MB (31.84 million rows/s., 31.84 MB/s.)
133171

134-
ALTER TABLE test_delete UPDATE is_deleted = 1 WHERE (key = 400000) settings mutations_sync = 2;
172+
ALTER TABLE test_delete UPDATE is_active = 0 WHERE (key = 400000) settings mutations_sync = 2;
135173
0 rows in set. Elapsed: 1.256 sec.
136174

137-
138175
SELECT count() FROM test_delete;
139176
┌─count()─┐
140177
│ 9999999 │
141178
└─────────┘
142179
```
180+
The row policy slows down queries such as SELECT count(): ClickHouse can no longer answer from metadata and must read data to apply the is_active filter.
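If the slower `count()` is not acceptable, the policy can be inspected and removed again, leaving readers to filter on `is_active` explicitly. This is a sketch of that cleanup, reusing the policy name `pol1` from the example above.

```sql
-- Show which row policies are attached to the table.
SHOW ROW POLICIES ON test_delete;

-- Remove the policy; soft-deleted rows become visible to unfiltered queries again.
DROP ROW POLICY IF EXISTS pol1 ON test_delete;
```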
143181

144-
### DELETE USING `ALTER DELETE`
182+
8. DELETE Using ALTER DELETE - https://clickhouse.com/docs/en/sql-reference/statements/alter/delete
183+
To delete a row using ALTER DELETE:
145184

146185
```sql
147186
ALTER TABLE test_delete DELETE WHERE (key = 400001) settings mutations_sync = 2;
@@ -152,8 +191,10 @@ SELECT count() FROM test_delete;
152191
│ 9999998 │
153192
└─────────┘
154193
```
194+
This operation can take significantly longer than a soft delete (around 955 seconds in this example for large datasets).
155195

156-
### DELETE USING `DELETE`
196+
9. DELETE Using DELETE Statement - https://clickhouse.com/docs/en/sql-reference/statements/delete
197+
The DELETE statement can also be used to remove data from a table:
157198

158199
```sql
159200
DELETE FROM test_delete WHERE (key = 400002);
@@ -164,3 +205,6 @@ SELECT count() FROM test_delete;
164205
│ 9999997 │
165206
└─────────┘
166207
```
208+
This operation is faster, with an elapsed time of around 1.28 seconds in this case.
209+
210+
The choice between ALTER UPDATE and ALTER DELETE depends on your use case. For soft deletes, updating a tombstone column is significantly faster and easier to manage. However, if you need to physically remove rows, be mindful of the performance costs, especially with remote storage like S3.
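The two approaches can also be combined: mark rows inactive as deletions arrive, then physically remove them in one batch when the cost of a full rewrite is acceptable. The statement below is a sketch of that idea, not something the original article measures; how often to run it is left open.

```sql
-- Batch cleanup: physically remove every row that was previously soft-deleted.
-- mutations_sync = 2 waits for the mutation to finish, as in the examples above.
ALTER TABLE test_delete
    DELETE WHERE is_active = 0
    SETTINGS mutations_sync = 2;
```

Running a single mutation over many tombstoned rows spreads the rewrite cost across all of them, instead of paying it once per deleted row.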
