Commit 6f9640f

durable execution post
1 parent 680a601 commit 6f9640f

1 file changed: 193 additions and 0 deletions

---
author: 'pws'
title: 'Durable Executions with Coffee Grinders'
description: 'Durable Executions in Clojure'
category: 'clojure'
layout: '../../layouts/BlogPost.astro'
publishedDate: '2023-12-01'
heroImage: 'roots.jpg'
---

### Durable Executions

Tools like [Temporal](https://temporal.io/) and
[Azure Durable Functions](https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-orchestrations?tabs=csharp-inproc)
offer guarantees for distributed processes similar to those that database transactions offer for your data, plus a better dev
experience, i.e. no YAML 😉.

These tools work by saving the execution history and replaying it when a process is restarted.
This requires the workflow to be deterministic between runs. Various methods are provided to manage versioning as
requirements evolve. Bounded history sizes add another concern to think about.

To me, they provide clear advantages when implementing reliable sagas or schedules, but they come with associated costs
such as infrastructure tie-in and code evolution complexities.

What could this concept look like with a little Clojure philosophy sprinkled on it?

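To make the replay idea concrete, here is a toy sketch of history-based replay. It is not how Temporal or Azure Durable Functions are implemented; `activity`, `workflow`, `replay-history` and `calls` are hypothetical names invented for illustration. Each side-effecting call is recorded the first time it runs, and a restarted workflow reads the recorded result instead of running the call again.

```clojure
;; Toy replay engine. All names are hypothetical, not part of any real SDK.
(defn activity
  "Runs the side-effecting thunk `f` once. On replay, returns the recorded result."
  [history position f]
  (let [i (dec (swap! position inc))]  ; 0-based index of this call in the workflow
    (if (< i (count @history))
      (nth @history i)                 ; replay: reuse the recorded result
      (let [result (f)]                ; first run: execute and record
        (swap! history conj result)
        result))))

(defn workflow [history calls]
  (let [position (atom 0)
        a (activity history position #(do (swap! calls inc) 10))
        b (activity history position #(do (swap! calls inc) (* a 2)))]
    (+ a b)))

(def replay-history (atom [])) ; stands in for a durable store
(def calls (atom 0))           ; counts real executions

(workflow replay-history calls) ;;=> 30, both activities execute
(workflow replay-history calls) ;;=> 30, replayed from history
@calls                          ;;=> 2
```

The sketch also shows why determinism matters: results are matched to calls by position, so a workflow that reorders its activities between runs would read the wrong entries.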
### Coffee Grinders

[Coffee grinders](https://lambdaisland.com/blog/2020-03-29-coffee-grinders-2) are a generalization of
[interceptors](https://lambdaisland.com/episodes/interceptors-concepts) where the process definition
and process state are co-located.

With "code as data", the definition and the execution do not need to be separate.

Because both live in one value, they are saved in a single atomic write, which gives you the transactional outbox pattern for free.

It also makes it possible to pause and resume the process later, for example after a bug fix.

Here is the basic coffee grinder from the blog post:

```clojure
(defn inc-counter [ctx] (update ctx :counter inc))

;; Take the first step off the queue, apply it to the context, repeat.
(defn run [{:keys [queue] :as ctx}]
  (if-let [[step & next-queue] (seq queue)]
    (recur (step (assoc ctx :queue next-queue)))
    ctx))

(run {:counter 0
      :queue [inc-counter inc-counter inc-counter]})

;;=> {:counter 3 :queue nil}
```

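One consequence of keeping the queue inside the context is that a step can rewrite the remaining work. The sketch below reuses `inc-counter` and `run` from above; `inc-until` is a hypothetical step invented for illustration, which keeps re-enqueueing itself until the counter reaches a limit.

```clojure
(defn inc-counter [ctx] (update ctx :counter inc))

(defn run [{:keys [queue] :as ctx}]
  (if-let [[step & next-queue] (seq queue)]
    (recur (step (assoc ctx :queue next-queue)))
    ctx))

;; Hypothetical step: while the counter is below `limit`, put one
;; inc-counter followed by this same step at the front of the queue.
(defn inc-until [limit]
  (fn step [ctx]
    (if (< (:counter ctx) limit)
      (update ctx :queue #(concat [inc-counter step] %))
      ctx)))

(run {:counter 0
      :queue [(inc-until 3)]})

;;=> {:counter 3, :queue nil}
```

The process definition started with one step and grew as it ran, which a fixed workflow definition kept apart from its state could not do as easily.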
### Making it durable

We can make the process durable by saving the context after each step.

Here I'm using an atom for simplicity, but play along and imagine it is some database.

I also switch to using Clojure's persistent queue because I like to pop things and I think they look cool when pretty printed 🤓.

```clojure
(require 'clojure.pprint)
(import 'clojure.lang.PersistentQueue)
;; `?` and `fdat/register` come from Fdat (required under the alias `fdat`).
;; `pp-fns` is a small printing helper that is not shown in this post.

(def history (atom []))

(? (defn inc-counter [ctx] (update ctx :counter inc)))

(fdat/register [inc-counter])

(defn run [{:keys [queue] :as ctx}]
  (if-let [step (peek queue)]
    (let [result (-> ctx step (update :queue pop))]
      (swap! history conj result) ; save the context after every step
      (recur result))
    ctx))

(run {:counter 0
      :queue (conj PersistentQueue/EMPTY inc-counter inc-counter inc-counter)})

(->> @history
     (map pp-fns) ; - just making the functions more readable in the output
     clojure.pprint/pprint) ; - makes queues look cool
;; =>
;; ({:counter 1, :queue <-("inc-counter" "inc-counter")-<}
;;  {:counter 2, :queue <-("inc-counter")-<}
;;  {:counter 3, :queue <-()-<})
```

If you're wondering what the [Fdat](https://github.com/helins/fdat.cljc) `?` macro is about, it adds a key to the metadata on `inc-counter`.
I'm using this for now to make the queue more readable, but it will come in handy later when we get to serialization.

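The `run` loop above depends on how `peek` and `pop` behave on a persistent queue, which differs from vectors. A short sketch using only core Clojure (the name `q` is just for illustration):

```clojure
(import 'clojure.lang.PersistentQueue)

(def q (conj PersistentQueue/EMPTY :a :b :c))

;; Queues are first-in, first-out: peek and pop work at the front.
(peek q)       ;;=> :a
(seq (pop q))  ;;=> (:b :c)

;; Vectors work at the end, so they would run the steps in reverse.
(peek [:a :b :c]) ;;=> :c
(pop [:a :b :c])  ;;=> [:a :b]

;; Peeking an empty queue gives nil, which is what ends the loop in `run`.
(peek PersistentQueue/EMPTY) ;;=> nil
```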
### Resuming after a problem

If a step function returns an error, we won't update the persisted version, which opens up the possibility of resuming the process after a fix is deployed.

```clojure
(def latest-version (atom nil))

(? (defn bang [ctx] (assoc ctx :error true)))

(fdat/register [bang])

(defn run [{:keys [queue] :as ctx}]
  (if-let [step (peek queue)]
    (let [{:keys [error] :as result} (-> ctx step (update :queue pop))]
      (if error
        (println "Alert!") ; <- a cry for help
        (do
          (reset! latest-version result)
          (recur result))))
    (dissoc ctx :queue)))

(run {:counter 0
      :queue (conj PersistentQueue/EMPTY inc-counter inc-counter bang inc-counter)})

(->> @latest-version
     pp-fns
     clojure.pprint/pprint)
;; => Alert!
;; => {:counter 2, :queue <-("bang" "inc-counter")-<}
```

We can fix `bang` and restart the process:

```clojure
(defn bang [ctx] (assoc ctx :error false))

(run @latest-version)
;; => Alert!
;; => {:counter 2, :queue <-("bang" "inc-counter")-<} NOPE! still broken
```

This retry doesn't work yet because the queue still holds the old function object. Redefining `bang` creates a new function but does not change the one already on the queue.

However, if the context is round-tripped through serialization (using transit here), it does work.

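The stale-function behaviour can be reproduced without any queue machinery. This is a minimal sketch with hypothetical names (`step`, `by-value`, `by-var`). It contrasts holding a function object with holding a var, which is looked up again on every call.

```clojure
(defn step [ctx] (assoc ctx :error true))

(def by-value [step])   ; holds the function object that exists right now
(def by-var   [#'step]) ; holds the var, which is resolved at call time

(defn step [ctx] (assoc ctx :error false)) ; the "bugfix"

((first by-value) {}) ;;=> {:error true}   still the old function
((first by-var) {})   ;;=> {:error false}  picks up the new definition
```

A var only helps inside one running process. A context that has to survive a restart still needs some name-based placeholder that can be written to storage.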
### Serialization

> Code is data, this axiom holds true until one has to serialize some functions.

Fdat isn't a requirement; another option would be to represent steps as vectors of events and their args (as re-frame does).

But this is my post and I prefer to stick to good old functions.

The metadata added by `?` is used as a placeholder during serialization to represent the function.
When deserialized, the new function definition is picked up, which is why the following example now works.

```clojure
;; `serialize` and `deserialize` are transit-based helpers that are not shown in this post.
(defn run [{:keys [queue] :as ctx}]
  (if-let [step (peek queue)]
    (let [{:keys [error] :as result} (-> ctx step (update :queue pop))]
      (if error
        (println "Alert!")
        (do
          (reset! latest-version (serialize result)) ; <- change
          (recur result))))
    (dissoc ctx :queue)))

(? (defn bang [ctx] (assoc ctx :error false))) ; <- bugfix here
(fdat/register [bang])

(run (deserialize @latest-version))
;; => {:counter 3, :error false} Yay!
```

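For comparison, here is a rough sketch of the event-vector alternative mentioned earlier. It does not use Fdat or transit: steps are plain data dispatched through a multimethod, so the context round-trips through EDN using only `clojure.edn`. The names `handle`, `run-events`, `:counter/add` and `saved` are assumptions made for this example, and the queue is a plain vector because a `PersistentQueue` does not print as readable EDN.

```clojure
(require '[clojure.edn :as edn])

;; Dispatch on the first element of the event vector.
(defmulti handle (fn [_ctx [event-id & _args]] event-id))

(defmethod handle :counter/add [ctx [_ n]]
  (update ctx :counter + n))

(defmethod handle :bang [ctx _]
  (assoc ctx :error true)) ; the buggy version

(defn run-events [{:keys [queue] :as ctx}]
  (if-let [event (first queue)]
    (let [result (-> (handle ctx event)
                     (update :queue (comp vec rest)))]
      (if (:error result)
        ctx               ; stop and return the last good context
        (recur result)))
    (dissoc ctx :queue)))

;; The first run stops at [:bang]; the last good context is stored as a string.
(def saved
  (pr-str (run-events {:counter 0
                       :queue [[:counter/add 1] [:bang] [:counter/add 2]]})))

;; Deploy the fix, then resume from the stored string.
(defmethod handle :bang [ctx _]
  (assoc ctx :error false))

(run-events (edn/read-string saved))
;;=> {:counter 3, :error false}
```

The trade-off is the one the post hints at: events serialize trivially, but every step needs an entry in the dispatch table instead of being a good old function.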
There are still some limits to code evolution.
A step function can't be deleted or have its signature changed without the risk of breaking saved processes.
I think this fits in with Clojure philosophy. See Rich Hickey's view on [versioning](https://youtu.be/oyLBGkS5ICk?t=1335).

But as we have access to the steps in the context, we can also change the workflow as needed. For example, skipping the broken step entirely:

```clojure
(-> @latest-version
    deserialize
    (update :queue pop)
    run)
;; => {:counter 3}
```

### Conclusion

Using Coffee Grinders for durable execution has removed the need for a history log and determinism in the workflow definition.
There are probably other concerns for durable executions that I have not addressed here. Send me a message if you have any thoughts.

Further Possibilities:

- Throw in a scheduler to get durable timers for long running or repeating processes.
- Add a worker queue to distribute the processing.
- Use the interceptor pattern to apply saga compensations when a process fails.
- Use different versions of step functions in various contexts, e.g. in the browser or in tests.

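As a rough illustration of the saga item in the list above, one possible shape is to make each step a map with an `:enter` function and a compensation. This is only a sketch under assumptions: the `:undo` key, `run-saga`, `reserve` and `charge` are invented here and are not taken from the post, the gist, or any interceptor library.

```clojure
;; Each step is a map: :enter does the work, :undo compensates for it.
(def reserve {:enter #(update % :log conj :reserved)
              :undo  #(update % :log conj :released)})

(def charge  {:enter #(assoc % :error :card-declined) ; always fails in this demo
              :undo  #(update % :log conj :refunded)})

(defn run-saga [{:keys [queue stack] :as ctx}]
  (if-let [{:keys [enter] :as step} (first queue)]
    (let [result (enter (assoc ctx :queue (rest queue)))]
      (if (:error result)
        ;; Unwind: undo the completed steps, most recent first.
        (reduce (fn [c {:keys [undo]}] (undo c))
                (-> ctx
                    (dissoc :queue :stack)
                    (assoc :error (:error result)))
                stack)
        ;; Remember the completed step; conj on nil or a list prepends.
        (recur (update result :stack conj step))))
    ctx))

(run-saga {:log [] :queue [reserve charge]})
;;=> {:log [:reserved :released], :error :card-declined}
```

The failed step itself is not compensated here, only the steps that completed before it. Whether that is the right rule depends on the saga.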
[gist](https://gist.github.com/peter-wilkins/98a2fe47aa5024b85cda7dbe5ace07f0)
