Skip to content

Commit

Permalink
Add an example of a WAL decoder (#1845)
Browse files Browse the repository at this point in the history
This example shows how to build a basic Change Data Capture (CDC)
mechanism using the Postgres Logical Decoding capabilities. The changes
occurring on the database are serialized into JSON and pushed to a queue
(a "logical replication slot") where they can be consumed by remote
clients.

A Postgres CDC extension can be a used in various purposes:

* Ad hoc replication systems (e.g. Postgres => SQL Server )
* External commit-log for a distributed system (such as Kafka)
* Advanced Monitoring ( e.g. Prometheus/Loki )

This example tries to find the right tradeoff between simplicity and
usefulness. Currently it has strong limitations but they should be easy
to overcome.

Rust really shines in this example compared to similar implementations
in C (see the wal2json extension). The Serde crate provides JSON
serialization out of the box, whereas all the C implementations are
forced to write their own JSON formatter.

Co-authored-by: damien <[email protected]>
  • Loading branch information
daamien and damien authored Sep 12, 2024
1 parent b810e98 commit 729e70f
Show file tree
Hide file tree
Showing 7 changed files with 490 additions and 1 deletion.
3 changes: 2 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
#LICENSE All rights reserved.
#LICENSE
#LICENSE Use of this source code is governed by the MIT license that can be found in the LICENSE file.

[workspace]
resolver = "2"
members = [
Expand All @@ -32,6 +32,7 @@ exclude = [
"pgrx-examples/custom_types",
"pgrx-examples/custom_sql",
"pgrx-examples/datetime",
"pgrx-examples/wal_decoder",
"pgrx-examples/errors",
"pgrx-examples/nostd",
"pgrx-examples/numeric",
Expand Down
6 changes: 6 additions & 0 deletions pgrx-examples/wal_decoder/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
.DS_Store
.idea/
/target
*.iml
**/*.rs.bk
Cargo.lock
38 changes: 38 additions & 0 deletions pgrx-examples/wal_decoder/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
[package]
name = "wal_decoder"
version = "0.0.0"
edition = "2021"

[lib]
crate-type = ["cdylib", "lib"]

[[bin]]
name = "pgrx_embed_wal_decoder"
path = "./src/bin/pgrx_embed.rs"

[features]
default = ["pg13"]
pg12 = ["pgrx/pg12", "pgrx-tests/pg12" ]
pg13 = ["pgrx/pg13", "pgrx-tests/pg13" ]
pg14 = ["pgrx/pg14", "pgrx-tests/pg14" ]
pg15 = ["pgrx/pg15", "pgrx-tests/pg15" ]
pg16 = ["pgrx/pg16", "pgrx-tests/pg16" ]
pg17 = ["pgrx/pg17", "pgrx-tests/pg17" ]
pg_test = []

[dependencies]
pgrx = { path = "../../pgrx", default-features = false }
serde = "1.0.209"
serde_json = "1.0.128"

[dev-dependencies]
pgrx-tests = { path = "../../pgrx-tests" }

[profile.dev]
panic = "unwind"

[profile.release]
panic = "unwind"
opt-level = 3
lto = "fat"
codegen-units = 1
118 changes: 118 additions & 0 deletions pgrx-examples/wal_decoder/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@

# A simple Change Data Capture ([CDC]) extension.

This extension will extract the [DML] changes from Postgres WAL
using [Logical Decoding] and export them in JSON format using [serde].

## Principles

* Postgres triggers various callbacks at the different stages of a transaction
* The decoder defines some of these callbacks: begin, change, commit, etc.
* The callbacks extract the changes made during the transaction
* They build Rust structs (Action, Tuple) to represent those changes
* The structs are then serialized into JSON
* The JSON output is sent into a logical replication slot (i.e. a queue)
* The output can be consumed in various ways by a remote client

## Requirements

In order to use this extension with a cargo-pgrx managed instance, you'll
need to add the configuration below in "$PGRX_HOME/data-$PGVER/postgresql.conf".

``` ini
shared_preload_libraries = 'wal_decoder'
wal_level = logical
```

## Example

1- Create a table and publish it

``` sql
CREATE TABLE person (name TEXT, age INT);
ALTER TABLE person REPLICA IDENTITY FULL;
CREATE PUBLICATION gotham_pub FOR TABLE person;
```

2- Create a replication slot fed by the decoder

``` sql
SELECT pg_create_logical_replication_slot('gotham_slot', 'wal_decoder');
```

3- Consume the changes from the replication slot

``` sql
INSERT INTO person
VALUES ('Bruce Wayne',42),('Clark Kent',33);
```

``` sql
SELECT * FROM pg_logical_slot_get_changes('gotham_slot', NULL, NULL);

lsn | xid | data
-----------+-----+------------------------------------------------------------------------------
0/16A87C8 | 581 | {"typ":"BEGIN"}
0/16A87C8 | 581 | {"typ":"INSERT","rel":"public.person","new":{"name":"Bruce Wayne","age":42}}
0/16A8810 | 581 | {"typ":"INSERT","rel":"public.person","new":{"name":"Clark Kent","age":33}}
0/16A8888 | 581 | {"typ":"COMMIT","committed":779145498360779,"change_count":2}
```

``` sql
UPDATE person SET name = 'Batman' WHERE name= 'Bruce Wayne';
```

``` sql
SELECT xid, jsonb_pretty(data::JSONB)
FROM pg_logical_slot_get_changes('gotham_slot', NULL, NULL);

xid | jsonb_pretty
-----+-----------------------------------
587 | { +
| "typ": "BEGIN" +
| }
587 | { +
| "new": { +
| "age": 42, +
| "name": "Batman" +
| }, +
| "old": { +
| "age": 42, +
| "name": "Bruce Wayne" +
| }, +
| "rel": "public.person", +
| "typ": "UPDATE" +
| }
587 | { +
| "typ": "COMMIT", +
| "committed": 779179731927669,+
| "change_count": 1 +
| }
```

## Limitations

This decoder is designed as a basic example and it has the following limitations:

* Only the REPLICA IDENTITY FULL mode is fully supported. Supporting REPLICA IDENTITY DEFAULT
would require additional work.

* Only TEXT and INT values are serialized. Supporting other types should be trivial.


## Other WAL decoders

Here are some other implementations in C that can be useful:

* <https://github.com/dalibo/hackingpg/blob/main/journee5/audit/plugin_audit.c>
* <https://github.com/leptonix/decoding-json/blob/master/decoding_json.c>
* <https://github.com/michaelpq/pg_plugins/blob/main/decoder_raw/decoder_raw.c>
* <https://github.com/eulerto/wal2json/blob/master/wal2json.c>

<!-- Links -->

[CDC]: https://en.wikipedia.org/wiki/Change_data_capture
[DML]: https://en.wikipedia.org/wiki/Data_manipulation_language
[Logical Decoding]: https://www.postgresql.org/docs/current/logicaldecoding-explanation.html
[serde]: https://serde.rs

1 change: 1 addition & 0 deletions pgrx-examples/wal_decoder/src/bin/pgrx_embed.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
::pgrx::pgrx_embed!();
Loading

0 comments on commit 729e70f

Please sign in to comment.