| 1 | + |
| 2 | +# A simple Change Data Capture ([CDC]) extension. |
| 3 | + |
| 4 | +This extension extracts the [DML] changes from the Postgres WAL |
| 5 | +using [Logical Decoding] and exports them as JSON using [serde]. |
| 6 | + |
| 7 | +## Principles |
| 8 | + |
| 9 | +* During logical decoding, Postgres invokes callbacks at the different stages of a transaction |
| 10 | +* The decoder defines some of these callbacks: begin, change, commit, etc. |
| 11 | +* The callbacks extract the changes made during the transaction |
| 12 | +* They build Rust structs (Action, Tuple) to represent those changes |
| 13 | +* The structs are then serialized into JSON |
| 14 | +* The JSON output is sent into a logical replication slot (i.e. a queue) |
| 15 | +* The output can be consumed in various ways by a remote client |
| 16 | + |
| 17 | +## Requirements |
| 18 | + |
| 19 | +To use this extension with a cargo-pgrx managed instance, add the configuration |
| 20 | +below to "$PGRX_HOME/data-$PGVER/postgresql.conf", then restart the instance. |
| 21 | + |
| 22 | +``` ini |
| 23 | +shared_preload_libraries = 'wal_decoder' |
| 24 | +wal_level = logical |
| 25 | +``` |
| 26 | + |
| 27 | +## Example |
| 28 | + |
| 29 | +1- Create a table and publish it |
| 30 | + |
| 31 | +``` sql |
| 32 | +CREATE TABLE person (name TEXT, age INT); |
| 33 | +ALTER TABLE person REPLICA IDENTITY FULL; |
| 34 | +CREATE PUBLICATION gotham_pub FOR TABLE person; |
| 35 | +``` |
| 36 | + |
| 37 | +2- Create a replication slot fed by the decoder |
| 38 | + |
| 39 | +``` sql |
| 40 | +SELECT pg_create_logical_replication_slot('gotham_slot', 'wal_decoder'); |
| 41 | +``` |
| 42 | + |
| 43 | +3- Consume the changes from the replication slot |
| 44 | + |
| 45 | +``` sql |
| 46 | +INSERT INTO person |
| 47 | +VALUES ('Bruce Wayne', 42), ('Clark Kent', 33); |
| 48 | +``` |
| 49 | + |
| 50 | +``` sql |
| 51 | +SELECT * FROM pg_logical_slot_get_changes('gotham_slot', NULL, NULL); |
| 52 | + |
| 53 | + lsn | xid | data |
| 54 | +-----------+-----+------------------------------------------------------------------------------ |
| 55 | + 0/16A87C8 | 581 | {"typ":"BEGIN"} |
| 56 | + 0/16A87C8 | 581 | {"typ":"INSERT","rel":"public.person","new":{"name":"Bruce Wayne","age":42}} |
| 57 | + 0/16A8810 | 581 | {"typ":"INSERT","rel":"public.person","new":{"name":"Clark Kent","age":33}} |
| 58 | + 0/16A8888 | 581 | {"typ":"COMMIT","committed":779145498360779,"change_count":2} |
| 59 | +``` |
| 60 | + |
| 61 | +``` sql |
| 62 | +UPDATE person SET name = 'Batman' WHERE name = 'Bruce Wayne'; |
| 63 | +``` |
| 64 | + |
| 65 | +``` sql |
| 66 | +SELECT xid, jsonb_pretty(data::JSONB) |
| 67 | +FROM pg_logical_slot_get_changes('gotham_slot', NULL, NULL); |
| 68 | + |
| 69 | + xid | jsonb_pretty |
| 70 | +-----+----------------------------------- |
| 71 | + 587 | { + |
| 72 | + | "typ": "BEGIN" + |
| 73 | + | } |
| 74 | + 587 | { + |
| 75 | + | "new": { + |
| 76 | + | "age": 42, + |
| 77 | + | "name": "Batman" + |
| 78 | + | }, + |
| 79 | + | "old": { + |
| 80 | + | "age": 42, + |
| 81 | + | "name": "Bruce Wayne" + |
| 82 | + | }, + |
| 83 | + | "rel": "public.person", + |
| 84 | + | "typ": "UPDATE" + |
| 85 | + | } |
| 86 | + 587 | { + |
| 87 | + | "typ": "COMMIT", + |
| 88 | + | "committed": 779179731927669,+ |
| 89 | + | "change_count": 1 + |
| 90 | + | } |
| 91 | +``` |
| 92 | + |
| 93 | +## Limitations |
| 94 | + |
| 95 | +This decoder is a basic example and has the following limitations: |
| 96 | + |
| 97 | +* Only the REPLICA IDENTITY FULL mode is fully supported. Supporting REPLICA IDENTITY DEFAULT |
| 98 | + would require additional work. |
| 99 | + |
| 100 | +* Only TEXT and INT values are serialized. Supporting other types should be trivial. |
| 101 | + |
| 102 | + |
| 103 | +## Other WAL decoders |
| 104 | + |
| 105 | +Other implementations, written in C, that may be useful references: |
| 106 | + |
| 107 | +* <https://github.com/dalibo/hackingpg/blob/main/journee5/audit/plugin_audit.c> |
| 108 | +* <https://github.com/leptonix/decoding-json/blob/master/decoding_json.c> |
| 109 | +* <https://github.com/michaelpq/pg_plugins/blob/main/decoder_raw/decoder_raw.c> |
| 110 | +* <https://github.com/eulerto/wal2json/blob/master/wal2json.c> |
| 111 | + |
| 112 | +<!-- Links --> |
| 113 | + |
| 114 | +[CDC]: https://en.wikipedia.org/wiki/Change_data_capture |
| 115 | +[DML]: https://en.wikipedia.org/wiki/Data_manipulation_language |
| 116 | +[Logical Decoding]: https://www.postgresql.org/docs/current/logicaldecoding-explanation.html |
| 117 | +[serde]: https://serde.rs |
| 118 | + |
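The pipeline described under Principles (callbacks build `Action` and `Tuple` values, which are then serialized to JSON) can be illustrated with a standalone sketch. This is not the extension's code: the field layout, the `Value` enum, and the `escape` and `to_json` helpers are assumptions made for illustration. JSON is formatted by hand with the standard library so the example runs without dependencies, whereas the extension itself relies on serde. Only the event types that appear in the example output are modeled, and the key order for UPDATE is a guess, since `jsonb_pretty` reorders keys.

```rust
use std::fmt::Write;

/// Column values, limited to the two types the README says are serialized.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Int(i32),
}

/// One row image: ordered (column name, value) pairs. Hypothetical layout.
#[derive(Debug, Clone, PartialEq)]
struct Tuple(Vec<(String, Value)>);

/// One decoded event. Hypothetical layout mirroring the JSON in the example.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Begin,
    Insert { rel: String, new: Tuple },
    Update { rel: String, old: Tuple, new: Tuple },
    Commit { committed: i64, change_count: u64 },
}

/// Quote a string as a JSON string literal, escaping special characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl Tuple {
    /// Serialize the row as a JSON object, keeping column order.
    fn to_json(&self) -> String {
        let fields: Vec<String> = self
            .0
            .iter()
            .map(|(name, value)| {
                let v = match value {
                    Value::Text(s) => escape(s),
                    Value::Int(i) => i.to_string(),
                };
                format!("{}:{}", escape(name), v)
            })
            .collect();
        format!("{{{}}}", fields.join(","))
    }
}

impl Action {
    /// Serialize the event as a single-line JSON object.
    fn to_json(&self) -> String {
        match self {
            Action::Begin => r#"{"typ":"BEGIN"}"#.to_string(),
            Action::Insert { rel, new } => format!(
                r#"{{"typ":"INSERT","rel":{},"new":{}}}"#,
                escape(rel),
                new.to_json()
            ),
            Action::Update { rel, old, new } => format!(
                r#"{{"typ":"UPDATE","rel":{},"old":{},"new":{}}}"#,
                escape(rel),
                old.to_json(),
                new.to_json()
            ),
            Action::Commit { committed, change_count } => format!(
                r#"{{"typ":"COMMIT","committed":{},"change_count":{}}}"#,
                committed, change_count
            ),
        }
    }
}

/// Build the row image for one `person` row.
fn person(name: &str, age: i32) -> Tuple {
    Tuple(vec![
        ("name".to_string(), Value::Text(name.to_string())),
        ("age".to_string(), Value::Int(age)),
    ])
}

fn main() {
    // The same transaction as the INSERT in the example section.
    let actions = vec![
        Action::Begin,
        Action::Insert { rel: "public.person".to_string(), new: person("Bruce Wayne", 42) },
        Action::Insert { rel: "public.person".to_string(), new: person("Clark Kent", 33) },
        Action::Commit { committed: 779145498360779, change_count: 2 },
    ];
    for action in &actions {
        println!("{}", action.to_json());
    }
}
```

Modeling each event as an enum variant keeps the begin, change, and commit cases separate, so a callback only has to construct the variant that matches the stage it handles and leave formatting to one place.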