You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: content/en/altinity-kb-queries-and-syntax/delete-via-tombstone-column.md
+61-17Lines changed: 61 additions & 17 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -4,6 +4,12 @@ linkTitle: "DELETE via tombstone column"
4
4
description: >
5
5
DELETE via tombstone column
6
6
---
7
+
8
+
This article provides an overview of the different methods to handle row deletion in ClickHouse, using tombstone columns and ALTER UPDATE or DELETE. The goal is to highlight the performance impacts of different techniques and storage settings, including a scenario using S3 for remote storage.
9
+
10
+
1. Creating a Test Table
11
+
We will start by creating a simple MergeTree table with a tombstone column (is_active) to track active rows:
12
+
7
13
```sql
8
14
CREATETABLEtest_delete
9
15
(
@@ -16,7 +22,10 @@ CREATE TABLE test_delete
16
22
)
17
23
ENGINE = MergeTree
18
24
ORDER BY key;
19
-
25
+
```
26
+
2. Inserting Data
27
+
Insert sample data into the table:
28
+
```sql
20
29
INSERT INTO test_delete (key, ts, value_a, value_b, value_c) SELECT
Soft Deletion (via ALTER UPDATE): A quicker approach that does not involve physical data deletion but rather updates the tombstone column.
122
+
Hard Deletion (via ALTER DELETE): Can take significantly longer, especially with large datasets stored in remote storage like S3.
123
+
124
+
6. Optimizing for Faster Deletion with S3 Storage
125
+
If using S3 for storage, the DELETE operation becomes even slower due to the overhead of handling remote data. Here’s an example with a table using S3-backed storage:
91
126
92
127
```sql
93
128
CREATETABLEtest_delete
@@ -120,28 +155,32 @@ SELECT count() FROM test_delete;
120
155
1 row inset. Elapsed: 0.002 sec.
121
156
```
122
157
123
-
### DELETE USING `ALTER UPDATE` & `ROW POLICY`
158
+
7. DELETE Using ALTER UPDATE and Row Policy
159
+
You can also control visibility at the query level using row policies. For example, to only show rows where is_active = 1:
160
+
161
+
To delete a row using ALTER UPDATE:
124
162
125
163
```sql
126
-
CREATE ROW POLICY pol1 ON test_delete USING is_deleted=0 TO all;
164
+
CREATE ROW POLICY pol1 ON test_delete USING is_active=1 TO all;
127
165
128
166
SELECTcount() FROM test_delete; -- select count() became much slower, it reads data now, not metadata
129
167
┌──count()─┐
130
168
│ 10000000 │
131
169
└──────────┘
132
170
1 row inset. Elapsed: 0.314 sec. Processed 10.00 million rows, 10.00 MB (31.84 million rows/s., 31.84 MB/s.)
@@ -152,8 +191,10 @@ SELECT count() FROM test_delete;
152
191
│ 9999998 │
153
192
└─────────┘
154
193
```
194
+
This operation may take significantly longer compared to soft deletions (around 955 seconds in this example for large datasets):
155
195
156
-
### DELETE USING `DELETE`
196
+
9. DELETE Using DELETE Statement - https://clickhouse.com/docs/en/sql-reference/statements/delete
197
+
The DELETE statement can also be used to remove data from a table:
157
198
158
199
```sql
159
200
DELETEFROM test_delete WHERE (key =400002);
@@ -164,3 +205,6 @@ SELECT count() FROM test_delete;
164
205
│ 9999997 │
165
206
└─────────┘
166
207
```
208
+
This operation is faster, with an elapsed time of around 1.28 seconds in this case:
209
+
210
+
The choice between ALTER UPDATE and ALTER DELETE depends on your use case. For soft deletes, updating a tombstone column is significantly faster and easier to manage. However, if you need to physically remove rows, be mindful of the performance costs, especially with remote storage like S3.
0 commit comments