This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 6a18b34

committed Sep 18, 2023
add type-checking section
1 parent 5bd6102 commit 6a18b34

1 file changed (+151 / -62 lines changed)

notebooks/2_intro_functiontask.md

Lines changed: 151 additions & 62 deletions
Original file line number | Diff line number | Diff line change
@@ -4,16 +4,16 @@ jupytext:
44
    extension: .md
55
    format_name: myst
66
    format_version: 0.13
7-
    jupytext_version: 1.14.0
7+
    jupytext_version: 1.15.0
88
kernelspec:
9-
  display_name: Python 3
9+
  display_name: Python 3 (ipykernel)
1010
  language: python
1111
  name: python3
1212
---
1313

1414
# FunctionTask
1515

16-
```{code-cell}
16+
```{code-cell} ipython3
1717
---
1818
jupyter:
1919
  outputs_hidden: false
@@ -29,112 +29,203 @@ nest_asyncio.apply()
2929

3030
A `FunctionTask` is a `Task` that can be created from any *Python* function by using the *pydra* decorator `pydra.mark.task`:
3131

32-
```{code-cell}
32+
```{code-cell} ipython3
3333
import pydra
3434
35-
3635
@pydra.mark.task
3736
def add_var(a, b):
3837
    return a + b
3938
```
4039

4140
Once we decorate the function, we can create a pydra `Task` and specify the input:
4241

43-
```{code-cell}
44-
task1 = add_var(a=4, b=5)
42+
```{code-cell} ipython3
43+
task0 = add_var(a=4, b=5)
4544
```
4645

47-
We can check the type of `task1`:
46+
We can check the type of `task0`:
4847

49-
```{code-cell}
50-
type(task1)
48+
```{code-cell} ipython3
49+
type(task0)
5150
```
5251

5352
We can also check that the task has the correct values of `a` and `b`; they are stored in the task's `inputs`:
5453

55-
```{code-cell}
56-
print(f'a = {task1.inputs.a}')
57-
print(f'b = {task1.inputs.b}')
54+
```{code-cell} ipython3
55+
print(f'a = {task0.inputs.a}')
56+
print(f'b = {task0.inputs.b}')
5857
```
5958

6059
We can also check the content of the entire `inputs`:
6160

62-
```{code-cell}
63-
task1.inputs
61+
```{code-cell} ipython3
62+
task0.inputs
6463
```
6564

6665
As you can see, `task.inputs` also contains information about the function, which is an inseparable part of the `FunctionTask`.
6766

6867
Once the task's inputs are set, we can run it. Since a `Task` is a callable object, we can use the syntax:
6968

70-
```{code-cell}
71-
task1()
69+
```{code-cell} ipython3
70+
task0()
7271
```
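The "callable object" idea can be illustrated without *pydra*. The sketch below is a minimal, hypothetical model: `mini_task` turns a function into a task factory, and each `MiniTask` stores its inputs, runs the function when called (through `__call__`), and keeps the result so that `result()` can return it later. None of these names belong to *pydra*'s API.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


class MiniTask:
    """Hypothetical task: stores inputs and runs the wrapped function on call."""

    def __init__(self, func: Callable[..., Any], **inputs: Any) -> None:
        self.func = func
        self.inputs = SimpleNamespace(**inputs)
        self._result: Optional[SimpleNamespace] = None

    def __call__(self) -> SimpleNamespace:
        # Defining __call__ is what makes `task()` valid syntax
        out = self.func(**vars(self.inputs))
        # Mimic the `result.output.out` access pattern used in this notebook
        self._result = SimpleNamespace(output=SimpleNamespace(out=out))
        return self._result

    def result(self) -> Optional[SimpleNamespace]:
        # The stored result can be read again without recomputing it
        return self._result


def mini_task(func: Callable[..., Any]) -> Callable[..., MiniTask]:
    """Hypothetical decorator: replace the function with a task factory."""
    def factory(**inputs: Any) -> MiniTask:
        return MiniTask(func, **inputs)
    return factory


@mini_task
def add_plain(a, b):
    return a + b


task = add_plain(a=4, b=5)
print(task.inputs.a, task.inputs.b)  # 4 5
print(task().output.out)             # 9
print(task.result().output.out)      # 9
```

Keeping the result on the object is what lets `result()` be called after the task has already run.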
7372

7473
As you can see, the result was returned right away, but we can also access it later:
7574

76-
```{code-cell}
77-
task1.result()
75+
```{code-cell} ipython3
76+
task0.result()
7877
```
7978

8079
A `Result` contains more than just the output, so to get the task output we can type:
8180

82-
```{code-cell}
83-
result = task1.result()
81+
```{code-cell} ipython3
82+
result = task0.result()
8483
result.output.out
8584
```
8685

8786
If we also want to see the inputs that were used in the task, we can set the optional argument `return_inputs` to `True`:
8887

89-
```{code-cell}
90-
task1.result(return_inputs=True)
88+
```{code-cell} ipython3
89+
task0.result(return_inputs=True)
90+
```
91+
92+
## Type-checking
93+
94+
+++
95+
96+
### What is Type-checking?
97+
98+
Type-checking is the process of verifying the type of a value at compile time or at run time. It ensures that operations on values and assignments to variables are semantically meaningful and can be executed without type errors, which makes code more reliable and easier to maintain.
99+
100+
+++
101+
102+
### Why Use Type-checking?
103+
104+
1. **Error Prevention**: Type-checking helps catch type mismatches early, preventing potential runtime errors.
105+
2. **Improved Readability**: Type annotations make it easier to understand what types of values a function expects and returns.
106+
3. **Better Documentation**: Explicitly stating expected types acts as inline documentation, simplifying code collaboration and review.
107+
4. **Optimized Performance**: In compiled languages, explicit types allow the compiler to optimize code. Standard CPython ignores annotations at runtime, although ahead-of-time compilers such as mypyc can use them.
108+
109+
+++
110+
111+
### How is Type-checking Implemented in Pydra?
112+
113+
+++
114+
115+
#### Static Type-Checking
116+
Static type-checking relies on Python's type annotations: you annotate the types of the function arguments and of the return value, and then use a tool such as `mypy` to check, without running the code, that the function is called consistently with those annotations.
117+
118+
```{code-cell} ipython3
119+
@pydra.mark.task
120+
def add(a: int, b: int) -> int:
121+
    return a + b
122+
```
123+
124+
```{code-cell} ipython3
125+
# This usage is correct according to static type hints:
126+
task1a = add(a=5, b=3)
127+
task1a()
128+
```
129+
130+
```{code-cell} ipython3
:tags: [raises-exception]

131+
# This usage is incorrect according to static type hints:
132+
task1b = add(a="hello", b="world")
133+
task1b()
134+
```
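Annotations by themselves are not enforced by the Python interpreter; they are metadata that tools such as `mypy` read. The plain-Python sketch below (no *pydra* involved) shows that an annotated function still accepts strings at runtime, and that the hints can be inspected with `typing.get_type_hints`. Whether *pydra* itself validates task inputs against these annotations is not assumed here.

```python
import typing


def add(a: int, b: int) -> int:
    return a + b


# The interpreter does not check annotations: str + str simply concatenates
print(add("hello", "world"))  # helloworld

# The annotations are stored as metadata on the function object
hints = typing.get_type_hints(add)
print(hints)  # {'a': <class 'int'>, 'b': <class 'int'>, 'return': <class 'int'>}
```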
135+
136+
#### Dynamic Type-Checking
137+
138+
Dynamic type-checking happens at runtime. Add explicit type checks inside the function if you want types to be enforced when the function is executed.
139+
140+
```{code-cell} ipython3
141+
@pydra.mark.task
142+
def add(a, b):
143+
    if not (isinstance(a, int) and isinstance(b, int)):
144+
        raise TypeError("Both inputs should be integers.")
145+
    return a + b
146+
```
147+
148+
```{code-cell} ipython3
149+
# This usage is correct and will not raise a runtime error:
150+
task1c = add(a=5, b=3)
151+
task1c()
152+
```
153+
154+
```{code-cell} ipython3
:tags: [raises-exception]

155+
# This usage is incorrect and will raise a runtime TypeError:
156+
task1d = add(a="hello", b="world")
157+
task1d()
158+
```
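Writing `isinstance` checks by hand in every function is repetitive. A common alternative is a decorator that reads the annotations and checks the arguments at call time. The sketch below is a hypothetical helper, `enforce_types`, written in plain Python (it is not part of *pydra*); it only checks plain classes such as `int` or `str` and skips generics such as `List[int]`.

```python
import functools
import inspect
import typing


def enforce_types(func):
    """Hypothetical decorator: raise TypeError when an argument does not
    match a plain-class annotation such as int or str."""
    sig = inspect.signature(func)
    hints = typing.get_type_hints(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; generics such as List[int] are skipped
            if (
                isinstance(expected, type)
                and typing.get_origin(expected) is None
                and not isinstance(value, expected)
            ):
                raise TypeError(
                    f"{name}={value!r} is {type(value).__name__}, "
                    f"expected {expected.__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def add(a: int, b: int) -> int:
    return a + b


print(add(a=5, b=3))  # 8
try:
    add(a="hello", b="world")
except TypeError as err:
    print(err)  # a='hello' is str, expected int
```

With this design, each function only needs annotations instead of its own `isinstance` calls. Note that `True` passes an `int` check, because `bool` is a subclass of `int`.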
159+
160+
#### Checking Complex Types
161+
162+
For more complex types like lists, dictionaries, or custom objects, we can use type hints combined with dynamic checks.
163+
164+
```{code-cell} ipython3
165+
from typing import List, Tuple
166+
167+
@pydra.mark.task
168+
def sum_of_pairs(pairs: List[Tuple[int, int]]) -> List[int]:
169+
    if not all(isinstance(p, tuple) and len(p) == 2 and all(isinstance(x, int) for x in p) for p in pairs):
170+
        raise ValueError("Input should be a list of pairs (tuples with 2 integers each).")
171+
    return [sum(pair) for pair in pairs]
172+
```
173+
174+
```{code-cell} ipython3
175+
# Correct usage
176+
task1e = sum_of_pairs(pairs=[(1, 2), (3, 4)])
177+
task1e()
178+
```
179+
180+
```{code-cell} ipython3
:tags: [raises-exception]

181+
# Incorrect usage: "4" is not an int, so this raises an error
182+
task1f = sum_of_pairs(pairs=[(1, 2), (3, "4")])
183+
task1f()
91184
```
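For nested generics such as `List[Tuple[int, int]]`, a checker has to walk the structure of the type; `typing.get_origin` and `typing.get_args` expose that structure. The hypothetical `matches` function below is a minimal plain-Python sketch, not *pydra*'s own type checker, and it only handles plain classes, `List[X]`, and fixed-length `Tuple[X, Y, ...]`.

```python
from typing import Any, List, Tuple, get_args, get_origin


def matches(value: Any, tp: Any) -> bool:
    """Hypothetical recursive check for plain classes, List[X] and Tuple[X, Y, ...]."""
    origin = get_origin(tp)
    if origin is None:
        # A plain class such as int or str
        return isinstance(value, tp)
    args = get_args(tp)
    if origin is list:
        # Every element must match the single element type
        return isinstance(value, list) and all(matches(v, args[0]) for v in value)
    if origin is tuple:
        # Fixed-length tuple: one type per position
        return (
            isinstance(value, tuple)
            and len(value) == len(args)
            and all(matches(v, a) for v, a in zip(value, args))
        )
    raise NotImplementedError(f"unsupported type: {tp!r}")


pairs_type = List[Tuple[int, int]]
print(matches([(1, 2), (3, 4)], pairs_type))    # True
print(matches([(1, 2), (3, "4")], pairs_type))  # False
print(matches([(1, 2, 3)], pairs_type))         # False
```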
92185

93186
## Customizing output names
94187
Note that `out` is the default name for the task output, but we can always customize it. There are two ways to do it: using a *Python* function annotation or using another *pydra* decorator.
95188

96189
Let's start with the function annotation:
97190

98-
```{code-cell}
191+
```{code-cell} ipython3
99192
import typing as ty
100193
101-
102194
@pydra.mark.task
103-
def add_var_an(a, b) -> {'sum_a_b': int}:
195+
def add_var_an(a: int, b: int) -> {'sum_a_b': int}:
104196
    return a + b
105197
106198
107-
task1a = add_var_an(a=4, b=5)
108-
task1a()
199+
task2a = add_var_an(a=4, b=5)
200+
task2a()
109201
```
110202

111203
The annotation is especially useful for naming the outputs when the function returns multiple values:
112204

113-
```{code-cell}
205+
```{code-cell} ipython3
114206
@pydra.mark.task
115-
def modf_an(a) -> {'fractional': ty.Any, 'integer': ty.Any}:
207+
def modf_an(a: float) -> {'fractional': ty.Any, 'integer': ty.Any}:
116208
    import math
117209
118210
    return math.modf(a)
119211
120212
121-
task2 = modf_an(a=3.5)
122-
task2()
213+
task2b = modf_an(a=3.5)
214+
task2b()
123215
```
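Conceptually, a return annotation such as `{'fractional': ty.Any, 'integer': ty.Any}` pairs each output name, in order, with one element of the returned tuple. The plain-Python sketch below illustrates that pairing with `zip`; the helper `name_outputs` is hypothetical and is not how *pydra* is implemented internally.

```python
import math
import typing as ty


def modf_an(a: float) -> {'fractional': ty.Any, 'integer': ty.Any}:
    return math.modf(a)


def name_outputs(func, *args, **kwargs) -> dict:
    """Hypothetical helper: pair the annotated output names with the returned values."""
    names = list(func.__annotations__["return"])  # dict keys keep their order
    values = func(*args, **kwargs)
    if len(names) == 1:
        values = (values,)  # a single output is not returned as a tuple
    return dict(zip(names, values))


print(name_outputs(modf_an, 3.5))  # {'fractional': 0.5, 'integer': 3.0}
```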
124216

125217
The second way of customizing the output requires another decorator, `pydra.mark.annotate`:
126218

127-
```{code-cell}
219+
```{code-cell} ipython3
128220
@pydra.mark.task
129221
@pydra.mark.annotate({'return': {'fractional': ty.Any, 'integer': ty.Any}})
130-
def modf(a):
222+
def modf(a: float):
131223
    import math
132224
133225
    return math.modf(a)
134226
135-
136-
task2a = modf(a=3.5)
137-
task2a()
227+
task2c = modf(a=3.5)
228+
task2c()
138229
```
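Python applies stacked decorators bottom-up: the one closest to `def` runs first. One plausible reading of why the order matters is that `pydra.mark.annotate` has to attach the return annotation before `pydra.mark.task` reads it. The sketch below uses hypothetical plain-Python stand-ins, `annotate` and `task`, to show how the wrong order leaves the output names stale.

```python
def annotate(annotations: dict):
    """Hypothetical stand-in: merge entries into the function's __annotations__."""
    def decorator(func):
        func.__annotations__.update(annotations)
        return func
    return decorator


def task(func):
    """Hypothetical stand-in: record the output names when the function is wrapped."""
    func.output_names = list(func.__annotations__.get("return", {"out": None}))
    return func


@task
@annotate({"return": {"fractional": float, "integer": float}})
def modf_right(a):
    ...


@annotate({"return": {"fractional": float, "integer": float}})
@task
def modf_wrong(a):
    ...


print(modf_right.output_names)  # ['fractional', 'integer']
print(modf_wrong.output_names)  # ['out']
```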
139230

140231
**Note that the order of the pydra decorators is important!**
@@ -145,7 +236,7 @@ task2a()
145236

146237
We don't have to provide the inputs when we create a task; we can always set them later:
147238

148-
```{code-cell}
239+
```{code-cell} ipython3
149240
task3 = add_var()
150241
task3.inputs.a = 4
151242
task3.inputs.b = 5
@@ -154,7 +245,7 @@ task3()
154245

155246
If we don't specify an input, `attr.NOTHING` is used as its default value:
156247

157-
```{code-cell}
248+
```{code-cell} ipython3
158249
task3a = add_var()
159250
task3a.inputs.a = 4
160251
@@ -166,7 +257,7 @@ task3a.inputs.b == attr.NOTHING
166257

167258
And if we try to run the task, an error will be raised:
168259

169-
```{code-cell}
260+
```{code-cell} ipython3
170261
:tags: [raises-exception]
171262
172263
task3a()
@@ -176,62 +267,61 @@ task3a()
176267

177268
After running the task, we can check where the output directory with the results was created:
178269

179-
```{code-cell}
270+
```{code-cell} ipython3
180271
task3.output_dir
181272
```
182273

183274
Within the directory you can find the file with the results: `_result.pklz`.
184275

185-
```{code-cell}
276+
```{code-cell} ipython3
186277
import os
187278
```
188279

189-
```{code-cell}
280+
```{code-cell} ipython3
190281
os.listdir(task3.output_dir)
191282
```
192283

193284
We can also choose where the results are stored. If a path is provided for the cache directory, *pydra* uses the cached results of a node instead of recomputing them. Let's create a temporary directory and a specific subdirectory, "task4":
194285

195-
```{code-cell}
286+
```{code-cell} ipython3
196287
from tempfile import mkdtemp
197288
from pathlib import Path
198289
```
199290

200-
```{code-cell}
291+
```{code-cell} ipython3
201292
cache_dir_tmp = Path(mkdtemp()) / 'task4'
202293
print(cache_dir_tmp)
203294
```
204295

205296
Now we can pass this path to the `cache_dir` argument of `FunctionTask`. To make the execution time noticeable, we define a function that sleeps for 5 s:
206297

207-
```{code-cell}
298+
```{code-cell} ipython3
208299
@pydra.mark.task
209-
def add_var_wait(a, b):
300+
def add_var_wait(a: int, b: int):
210301
    import time
211302
212303
    time.sleep(5)
213304
    return a + b
214305
215-
216306
task4 = add_var_wait(a=4, b=6, cache_dir=cache_dir_tmp)
217307
```
218308

219309
If you're running the cell for the first time, it should take around 5 s.
220310

221-
```{code-cell}
311+
```{code-cell} ipython3
222312
task4()
223313
task4.result()
224314
```
225315

226316
We can check the `output_dir` of our task: it should start with the path of `cache_dir_tmp`, and its last part contains the name of the task class, `FunctionTask`, and the task checksum:
227317

228-
```{code-cell}
318+
```{code-cell} ipython3
229319
task4.output_dir
230320
```
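The principle behind a checksum-named output directory can be sketched in plain Python: hash the function together with its inputs, so that identical tasks map to the same name. The helper `task_checksum` and the `FunctionTask_<digest>` name below are hypothetical; *pydra* computes its own checksum, and this only illustrates the idea.

```python
import hashlib
import json


def add_var_wait(a, b):
    return a + b


def task_checksum(func, **inputs) -> str:
    """Hypothetical checksum: SHA-256 over the function's name, its bytecode
    and its inputs serialized with sorted keys."""
    h = hashlib.sha256()
    h.update(func.__qualname__.encode())
    h.update(func.__code__.co_code)
    h.update(json.dumps(inputs, sort_keys=True).encode())
    return h.hexdigest()


first = task_checksum(add_var_wait, a=4, b=6)
same = task_checksum(add_var_wait, b=6, a=4)     # same inputs, different order
changed = task_checksum(add_var_wait, a=1, b=6)  # one input changed

print(first == same)     # True
print(first == changed)  # False
print(f"FunctionTask_{first[:12]}")  # hypothetical directory name
```

Because the directory name is derived from the checksum, an identical task resolves to a directory that already holds a result, while changing an input (or the function) produces a new name.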
231321

232322
Let's see what happens when we define an identical task again with the same `cache_dir`:
233323

234-
```{code-cell}
324+
```{code-cell} ipython3
235325
task4a = add_var_wait(a=4, b=6, cache_dir=cache_dir_tmp)
236326
task4a()
237327
```
@@ -240,7 +330,7 @@ This time the result should be ready right away! *pydra* uses available results
240330

241331
*pydra* checks for results not only in `cache_dir`: you can also provide a list of other locations to check. Let's create another directory to use as `cache_dir`, and pass the previous cache directory in `cache_locations`.
242332

243-
```{code-cell}
333+
```{code-cell} ipython3
244334
cache_dir_tmp_new = Path(mkdtemp()) / 'task4b'
245335
246336
task4b = add_var_wait(
@@ -251,13 +341,13 @@ task4b()
251341

252342
This time the results should also be returned quickly, and we can check that `task4b.output_dir` was not created:
253343

254-
```{code-cell}
344+
```{code-cell} ipython3
255345
task4b.output_dir.exists()
256346
```
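The lookup described above (reuse a stored result from `cache_dir` or from one of the `cache_locations`, otherwise compute and store it under `cache_dir`, and always recompute when `rerun=True`) can be modelled with a short plain-Python sketch. Everything here is hypothetical: `run_cached` is not a *pydra* function, it assumes `cache_dir` is searched before `cache_locations`, it stores a `result.json` file rather than *pydra*'s `_result.pklz`, and it uses a readable key in place of a checksum.

```python
import json
import tempfile
from pathlib import Path


def run_cached(func, inputs: dict, cache_dir: Path, cache_locations=(), rerun=False):
    """Hypothetical cached runner: return (result, directory holding the result)."""
    key = f"{func.__name__}_" + "_".join(f"{k}-{v}" for k, v in sorted(inputs.items()))
    if not rerun:
        # Look in cache_dir first, then in every extra location (assumed order)
        for location in (cache_dir, *cache_locations):
            result_file = Path(location) / key / "result.json"
            if result_file.exists():
                return json.loads(result_file.read_text()), result_file.parent
    # Nothing reusable, or rerun requested: compute and store under cache_dir
    out_dir = Path(cache_dir) / key
    out_dir.mkdir(parents=True, exist_ok=True)
    result = func(**inputs)
    (out_dir / "result.json").write_text(json.dumps(result))
    return result, out_dir


def add(a, b):
    return a + b


old_cache = Path(tempfile.mkdtemp())
new_cache = Path(tempfile.mkdtemp())

run_cached(add, {"a": 4, "b": 6}, old_cache)  # computed and stored
result, source = run_cached(add, {"a": 4, "b": 6}, new_cache, [old_cache])
print(result, source.parent == old_cache)     # 10 True
print((new_cache / "add_a-4_b-6").exists())   # False
```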
257347

258348
If you want to rerun the task even though the results already exist, you can set `rerun` to `True`. The task will take several seconds and a new `output_dir` will be created:
259349

260-
```{code-cell}
350+
```{code-cell} ipython3
261351
cache_dir_tmp_new = Path(mkdtemp()) / 'task4c'
262352
263353
task4c = add_var_wait(
@@ -270,15 +360,15 @@ task4c.output_dir.exists()
270360

271361
If we update the input of the task and run it again, a new directory will be created and the task will be recomputed:
272362

273-
```{code-cell}
363+
```{code-cell} ipython3
274364
task4b.inputs.a = 1
275365
print(task4b())
276366
print(task4b.output_dir.exists())
277367
```
278368

279369
And when we check the `output_dir`, we can see that it differs from last time:
280370

281-
```{code-cell}
371+
```{code-cell} ipython3
282372
task4b.output_dir
283373
```
284374

@@ -289,23 +379,22 @@ This is because, the checksum changes when we change either input or function.
289379
### Exercise 1
290380
Create a task that takes a list of numbers as input and returns two fields: `mean` with the mean value and `std` with the standard deviation.
291381

292-
```{code-cell}
382+
```{code-cell} ipython3
293383
:tags: [hide-cell]
294384
295385
@pydra.mark.task
296386
@pydra.mark.annotate({'return': {'mean': ty.Any, 'std': ty.Any}})
297-
def mean_dev(my_list):
387+
def mean_dev(my_list: List):
298388
    import statistics as st
299389
300390
    return st.mean(my_list), st.stdev(my_list)
301391
302-
303392
my_task = mean_dev(my_list=[2, 2, 2])
304393
my_task()
305394
my_task.result()
306395
```
307396

308-
```{code-cell}
397+
```{code-cell} ipython3
309398
# write your solution here (you can use statistics module)
310399
```
311400

@@ -315,7 +404,7 @@ my_task.result()
315404

316405
`AuditFlag.RESOURCE` allows you to monitor resource usage for the `Task`, while `AuditFlag.PROV` tracks the provenance of the `Task`.
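*pydra*'s resource auditing is not reproduced here, but as a rough plain-Python analogue, the standard library can measure a call's wall-clock time with `time.perf_counter` and its peak Python memory allocations with `tracemalloc`. The helper `measure` is hypothetical, and `tracemalloc` only traces allocations made through Python's allocator, not the total memory of the process.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Hypothetical helper: run func and report wall time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
    return result, {"wall_time_s": elapsed, "peak_bytes": peak}


result, usage = measure(lambda a, b: [a + b] * 100_000, 4, 5)
print(len(result), usage)
```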
317406

318-
```{code-cell}
407+
```{code-cell} ipython3
319408
from pydra.utils.messenger import AuditFlag, PrintMessenger
320409
321410
task5 = add_var(a=4, b=5, audit_flags=AuditFlag.RESOURCE)
@@ -325,7 +414,7 @@ task5.result()
325414

326415
Both audit flags can be turned on with `AuditFlag.ALL`, and the messages can be printed to the terminal with `PrintMessenger`.
327416

328-
```{code-cell}
417+
```{code-cell} ipython3
329418
task5 = add_var(
330419
    a=4, b=5, audit_flags=AuditFlag.ALL, messengers=PrintMessenger()
331420
)
