Commit 8923a00

fix splitter error
1 parent b25d138 commit 8923a00

4 files changed

+95
-120
lines changed

notebooks/3_intro_functiontask_state.md

Lines changed: 50 additions & 49 deletions
@@ -4,9 +4,9 @@ jupytext:
44
extension: .md
55
format_name: myst
66
format_version: 0.13
7-
jupytext_version: 1.14.0
7+
jupytext_version: 1.15.0
88
kernelspec:
9-
display_name: Python 3
9+
display_name: Python 3 (ipykernel)
1010
language: python
1111
name: python3
1212
---
@@ -17,7 +17,7 @@ Task might be run for a single set of input values or we can generate multiple s
1717

1818
Let's start from a simple `FunctionTask` that takes a list as an input:
1919

20-
```{code-cell}
20+
```{code-cell} ipython3
2121
---
2222
jupyter:
2323
outputs_hidden: false
@@ -31,7 +31,7 @@ import nest_asyncio
3131
nest_asyncio.apply()
3232
```
3333

34-
```{code-cell}
34+
```{code-cell} ipython3
3535
import pydra
3636
3737
@@ -45,52 +45,52 @@ task1 = add_two(x=[1, 2, 3])
4545

4646
Before we set any splitter, the task's `state` should be `None`
4747

48-
```{code-cell}
48+
```{code-cell} ipython3
4949
task1.state is None
5050
```
5151

52-
Now, we can set the `splitter` by using the `split` method. Since our task has only one input, there is only one option to create a set of inputs, i.e. `splitter="x"`:
52+
Now, we can set the `splitter` using the `split` method. Since our task has only one input, there is only one option to create a set of inputs, i.e. `split(splitter='x', x=[1, 2, 3])`; make sure you define the value of `x` in the splitter as you did in `task`:
5353

54-
```{code-cell}
55-
task1.split('x')
54+
```{code-cell} ipython3
55+
task1.split('x', x=[1, 2, 3])
5656
```
5757

5858
Now, we can check that our task has a `state`:
5959

60-
```{code-cell}
60+
```{code-cell} ipython3
6161
task1.state
6262
```
6363

6464
And we can print information about the state:
6565

66-
```{code-cell}
66+
```{code-cell} ipython3
6767
print(task1.state)
6868
```
6969

7070
Within the `state`, information about the splitter has been stored:
7171

72-
```{code-cell}
72+
```{code-cell} ipython3
7373
task1.state.splitter
7474
```
7575

7676
Note that *pydra* adds the name of the function to the name of the input.
7777

7878
Now, we can run the task and check results:
7979

80-
```{code-cell}
80+
```{code-cell} ipython3
8181
task1()
8282
task1.result()
8383
```
8484

8585
We can also return results together with the values of the input; we just have to set the additional argument `return_inputs` to `True` (or `val`):
8686

87-
```{code-cell}
87+
```{code-cell} ipython3
8888
task1.result(return_inputs=True)
8989
```
9090

9191
If we want to return indices instead of values, we can set `return_inputs` to `ind`
9292

93-
```{code-cell}
93+
```{code-cell} ipython3
9494
task1.result(return_inputs='ind')
9595
```
9696

@@ -106,16 +106,16 @@ For tasks with a state *pydra* prepare all sets of inputs and run the task for e
106106

107107
We can also use `State` for functions with multiple inputs:
108108

109-
```{code-cell}
109+
```{code-cell} ipython3
110110
@pydra.mark.task
111111
def add_var(a, b):
112112
    return a + b
113113
```
114114

115115
Now we have more options to define the `splitter`; the choice depends on the type of inputs and on our application. For example, we could have `a` that is a list, `b` that is a single value, and split over the values of `a`:
116116

117-
```{code-cell}
118-
task2 = add_var(a=[1, 2, 3], b=10).split('a')
117+
```{code-cell} ipython3
118+
task2 = add_var(a=[1, 2, 3], b=10).split('a', a=[1, 2, 3])
119119
task2()
120120
task2.result()
121121
```
@@ -130,7 +130,7 @@ Now we have three results for each element from the `a` list and the value of `b
130130

131131
But we can have lists for both inputs, and use both inputs in the splitter. Let's assume that `a` and `b` are two-element lists.
132132

133-
```{code-cell}
133+
```{code-cell} ipython3
134134
task3 = add_var(a=[1, 2], b=[10, 100])
135135
```
136136

@@ -144,8 +144,8 @@ Now, we have two options to map the input values, we might want to run the task
144144

145145
Let's start with the scalar splitter, which uses parentheses in the syntax:
146146

147-
```{code-cell}
148-
task3.split(('a', 'b'))
147+
```{code-cell} ipython3
148+
task3.split(('a', 'b'), a=[1, 2], b=[10, 100])
149149
task3()
150150
task3.result()
151151
```
@@ -164,9 +164,9 @@ We can represent the execution by the graph:
164164

165165
For the outer splitter we will use brackets:
166166

167-
```{code-cell}
167+
```{code-cell} ipython3
168168
task4 = add_var(a=[1, 2], b=[10, 100])
169-
task4.split(['a', 'b'])
169+
task4.split(['a', 'b'], a=[1, 2], b=[10, 100])
170170
task4()
171171
task4.result()
172172
```
@@ -181,17 +181,17 @@ Now, we have results for all of the combinations of values from `a` and `b`.
181181

182182
Note that once you set the splitter, you will get an error when you try to set the splitter again. However, you can always set `overwrite` to `True` if you really intend to change the splitter.
183183

184-
```{code-cell}
184+
```{code-cell} ipython3
185185
:tags: [raises-exception]
186186
187-
task4.split(('a', 'b'))
187+
task4.split(('a', 'b'), a=[1, 2], b=[10, 100])
188188
```
189189

190190
For more inputs we can create a more complex splitter, and use scalar and outer splitters together. **Note that the scalar splitter can only work for lists that have the same length, but the outer splitter doesn't have this limitation.**
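
As a rough plain-Python analogy (not pydra's implementation), the scalar splitter behaves like `zip` over the inputs and the outer splitter behaves like `itertools.product`. The helper `add_var_plain` below is a hypothetical stand-in for the decorated `add_var` task, and the ordering of the outer results is an assumption of this sketch:

```python
from itertools import product


def add_var_plain(a, b):
    # Undecorated stand-in for the `add_var` task (hypothetical name).
    return a + b


a = [1, 2]
b = [10, 100]

# Scalar splitter ('a', 'b'): elements are paired position by position,
# so both lists must have the same length.
if len(a) != len(b):
    raise ValueError("scalar splitting needs lists of equal length")
scalar_results = [add_var_plain(ai, bi) for ai, bi in zip(a, b)]

# Outer splitter ['a', 'b']: every combination of the two lists,
# so the lengths may differ.
outer_results = [add_var_plain(ai, bi) for ai, bi in product(a, b)]

print(scalar_results)  # [11, 102]
print(outer_results)   # [11, 101, 12, 102]
```

The length check mirrors the limitation noted above: `zip` would silently truncate unequal lists, so the sketch raises instead.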
191191

192192
Let's run one more example that takes four inputs, the `x` and `y` components of two vectors, and calculates all possible sums of the vectors. The `x` components should be kept together with the corresponding `y` components (i.e. scalar splitters: `("x1", "y1")` and `("x2", "y2")`), but we should use an outer splitter for the two vectors to get all combinations.
193193

194-
```{code-cell}
194+
```{code-cell} ipython3
195195
@pydra.mark.task
196196
def add_vector(x1, y1, x2, y2):
197197
    return (x1 + x2, y1 + y2)
@@ -205,7 +205,8 @@ task5 = add_vector(
205205
    x2=[10, 20, 30],
206206
    y2=[10, 20, 30],
207207
)
208-
task5.split(splitter=[('x1', 'y1'), ('x2', 'y2')])
208+
task5.split(splitter=[('x1', 'y1'), ('x2', 'y2')],
209+
           x1=[10, 20], y1=[1, 2], x2=[10, 20, 30], y2=[10, 20, 30])
209210
task5()
210211
task5.result()
211212
```
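
The nested splitter `[('x1', 'y1'), ('x2', 'y2')]` can be read as "zip inside, product outside". The following plain-Python sketch uses the same input values as `task5` to show which combinations that reading produces; it is an illustration of the idea, not a description of how pydra evaluates the splitter internally:

```python
from itertools import product

x1, y1 = [10, 20], [1, 2]
x2, y2 = [10, 20, 30], [10, 20, 30]

# Inner scalar splitters: keep each x component with its y component.
first_vectors = list(zip(x1, y1))    # [(10, 1), (20, 2)]
second_vectors = list(zip(x2, y2))   # [(10, 10), (20, 20), (30, 30)]

# Outer splitter: combine every first vector with every second vector.
vector_sums = [
    (ax + bx, ay + by)
    for (ax, ay), (bx, by) in product(first_vectors, second_vectors)
]

print(len(vector_sums))  # 6
print(vector_sums)
```

Two vectors on one side and three on the other give six sums in total.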
@@ -220,9 +221,9 @@ When we use `splitter`, we can also define `combiner`, if we want to combine tog
220221

221222
If we take the `task4` as an example and combine all results for each element of the input `b`, we can modify the task as follows:
222223

223-
```{code-cell}
224+
```{code-cell} ipython3
224225
task5 = add_var(a=[1, 2], b=[10, 100])
225-
task5.split(['a', 'b'])
226+
task5.split(['a', 'b'], a=[1, 2], b=[10, 100])
226227
# adding combiner
227228
task5.combine('b')
228229
task5()
@@ -231,7 +232,7 @@ task5.result()
231232

232233
Now our result contains two elements, each one is a list. The first one contains results for `a=1` and both values of `b`, and the second contains results for `a=2` and both values of `b`. Let's print the result again using `return_inputs`:
233234

234-
```{code-cell}
235+
```{code-cell} ipython3
235236
all_results = task5.result(return_inputs=True)
236237
print(f'first list, a=1: {all_results[0]}')
237238
print(f'\n second list, a=2: {all_results[1]}')
@@ -243,9 +244,9 @@ print(f'\n second list, a=2: {all_results[1]}')
243244

244245
But we could also group all elements from the input `a` and have a different combined output:
245246

246-
```{code-cell}
247+
```{code-cell} ipython3
247248
task6 = add_var(a=[1, 2], b=[10, 100])
248-
task6.split(['a', 'b'])
249+
task6.split(['a', 'b'], a=[1, 2], b=[10, 100])
249250
# changing the combiner
250251
task6.combine('a')
251252
task6()
@@ -254,7 +255,7 @@ task6.result()
254255

255256
We still have two elements in our results, but this time the first element contains results for `b=10` and both values of `a`, and the second contains results for `b=100` and both values of `a`.
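
The effect of the two combiners can be sketched in plain Python as grouping the outer-product results by the input that is *not* combined. The helper `group_outputs` is hypothetical and only illustrates the grouping described above; pydra's own result objects carry more information than these bare numbers:

```python
from itertools import product

a = [1, 2]
b = [10, 100]

# One record per element of the outer split ['a', 'b'].
runs = [{"a": ai, "b": bi, "out": ai + bi} for ai, bi in product(a, b)]


def group_outputs(runs, keep):
    """Collect outputs into one list per value of the `keep` input."""
    groups = {}
    for run in runs:
        groups.setdefault(run[keep], []).append(run["out"])
    # Dicts preserve insertion order, so groups follow first appearance.
    return list(groups.values())


combined_over_b = group_outputs(runs, keep="a")  # like combine('b')
combined_over_a = group_outputs(runs, keep="b")  # like combine('a')

print(combined_over_b)  # [[11, 101], [12, 102]]
print(combined_over_a)  # [[11, 12], [101, 102]]
```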
256257

257-
```{code-cell}
258+
```{code-cell} ipython3
258259
all_results = task6.result(return_inputs=True)
259260
print(f'first list, b=10: {all_results[0]}')
260261
print(f'\n second list, b=100: {all_results[1]}')
@@ -266,9 +267,9 @@ print(f'\n second list, b=100: {all_results[1]}')
266267

267268
We can also combine all elements by providing a list of all inputs to the `combiner`:
268269

269-
```{code-cell}
270+
```{code-cell} ipython3
270271
task7 = add_var(a=[1, 2], b=[10, 100])
271-
task7.split(['a', 'b'])
272+
task7.split(['a', 'b'], a=[1, 2], b=[10, 100])
272273
# combining all inputs
273274
task7.combine(['a', 'b'])
274275
task7()
@@ -287,7 +288,7 @@ This time the output contains one element that is a list of all outputs:
287288

288289
Note that a list can be used as an input even without using any splitter; there are functions that take a list as a single input value:
289290

290-
```{code-cell}
291+
```{code-cell} ipython3
291292
@pydra.mark.task
292293
def moment(lst, n):
293294
    return sum([i**n for i in lst]) / len(lst)
@@ -307,7 +308,7 @@ Let's say we want to calculate squares and cubes of integers from 2 to 5, and co
307308

308309
First we will define a function that returns powers:
309310

310-
```{code-cell}
311+
```{code-cell} ipython3
311312
:tags: [hide-cell]
312313
313314
@pydra.mark.task
@@ -317,17 +318,17 @@ def power(x, n):
317318

318319
Now we can create a task that takes two lists as its input, uses an outer splitter for `x` and `n`, and combines over `x`:
319320

320-
```{code-cell}
321+
```{code-cell} ipython3
321322
:tags: [hide-cell]
322323
323-
task_ex1 = power(x=[2, 3, 4, 5], n=[2, 3]).split(['x', 'n']).combine('x')
324+
task_ex1 = power(x=[2, 3, 4, 5], n=[2, 3]).split(['x', 'n'], x=[2, 3, 4, 5], n=[2, 3]).combine('x')
324325
task_ex1()
325326
task_ex1.result()
326327
```
327328

328329
The result should contain two lists: the first one is for squares, the second for cubes.
329330

330-
```{code-cell}
331+
```{code-cell} ipython3
331332
:tags: [hide-cell]
332333
333334
squares_list = [el.output.out for el in task_ex1.result()[0]]
@@ -340,7 +341,7 @@ print(f'cubes: {cubes_list}')
340341

341342
We ran the task multiple times for multiple sets of inputs, but we didn't talk about the execution time. Let's create a function that sleeps for a second and run it for four values:
342343

343-
```{code-cell}
344+
```{code-cell} ipython3
344345
import time
345346
346347
@@ -350,7 +351,7 @@ def add_two_sleep(x):
350351
    return x + 2
351352
352353
353-
task9 = add_two_sleep(x=[1, 2, 3, 4]).split('x')
354+
task9 = add_two_sleep(x=[1, 2, 3, 4]).split('x', x=[1, 2, 3, 4])
354355
t0 = time.time()
355356
task9()
356357
print(f'total time: {time.time() - t0}')
@@ -363,8 +364,8 @@ If we run `Task` that has a `State`, pydra will automatically create a `Submitte
363364

364365
We could also create a `Submitter` first, and then use it to run the task:
365366

366-
```{code-cell}
367-
task10 = add_two_sleep(x=[1, 2, 3, 4]).split('x')
367+
```{code-cell} ipython3
368+
task10 = add_two_sleep(x=[1, 2, 3, 4]).split('x', x=[1, 2, 3, 4])
368369
369370
t0 = time.time()
370371
with pydra.Submitter(plugin='cf') as sub:
@@ -375,8 +376,8 @@ print(f'results: {task10.result()}')
375376

376377
or we can provide the name of the plugin:
377378

378-
```{code-cell}
379-
task11 = add_two_sleep(x=[1, 2, 3, 4]).split('x')
379+
```{code-cell} ipython3
380+
task11 = add_two_sleep(x=[1, 2, 3, 4]).split('x', x=[1, 2, 3, 4])
380381
381382
t0 = time.time()
382383
task11(plugin='cf')
@@ -386,8 +387,8 @@ print(f'results: {task11.result()}')
386387

387388
The last option for running the task is to create a `Submitter` first and run the submitter (`Submitter` is also a callable object) with the task as a `runnable`:
388389

389-
```{code-cell}
390-
task12 = add_two_sleep(x=[1, 2, 3, 4]).split('x')
390+
```{code-cell} ipython3
391+
task12 = add_two_sleep(x=[1, 2, 3, 4]).split('x', x=[1, 2, 3, 4])
391392
392393
t0 = time.time()
393394
with pydra.Submitter(plugin='cf') as sub:
@@ -398,8 +399,8 @@ print(f'results: {task12.result()}')
398399

399400
All of the execution times should be similar, since all tasks are run by *pydra* in the same way, i.e. *pydra* creates a submitter with a `ConcurrentFutures` worker. If the number of processors is not provided, `ConcurrentFutures` takes all available processors as `max_workers`. However, if we want to use a specific number of processors, we can set it using `n_procs` when creating a `Submitter`. Let's see how the execution time changes when we use `n_procs=2`.
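
To get a feel for why limiting the number of workers changes the timing, here is a standard-library sketch using `concurrent.futures` directly. It assumes a thread pool as a stand-in for pydra's worker, uses a shorter sleep than the notebook, and the names `add_two_sleep_plain` and `timed_map` are hypothetical:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def add_two_sleep_plain(x, delay=0.1):
    # Undecorated stand-in for `add_two_sleep`, with a shorter sleep.
    time.sleep(delay)
    return x + 2


def timed_map(values, max_workers):
    """Run the function over `values` and return (results, elapsed seconds)."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(add_two_sleep_plain, values))
    return results, time.perf_counter() - t0


results_4, elapsed_4 = timed_map([1, 2, 3, 4], max_workers=4)
results_2, elapsed_2 = timed_map([1, 2, 3, 4], max_workers=2)

print(f"4 workers: {elapsed_4:.2f}s, results: {results_4}")
print(f"2 workers: {elapsed_2:.2f}s, results: {results_2}")
```

With four workers all four calls can sleep at the same time, while with two workers the calls have to run in two rounds, so the second timing should be roughly twice the first.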
400401

401-
```{code-cell}
402-
task13 = add_two_sleep(x=[1, 2, 3, 4]).split('x')
402+
```{code-cell} ipython3
403+
task13 = add_two_sleep(x=[1, 2, 3, 4]).split('x', x=[1, 2, 3, 4])
403404
404405
t0 = time.time()
405406
with pydra.Submitter(plugin='cf', n_procs=2) as sub:
