---
title: "bupaR Docs | Filter case"
---
```{r echo = F, out.width="25%", fig.align = "right"}
knitr::include_graphics("images/icons/filter.png")
```
***
# Case filters
```{r include = F}
library(bupaverse)
```
```{r eval = F}
library(bupaverse)
```
## Activity presence {.tabset .tabset-pills}
Use `filter_activity_presence()` to select cases that contain a specific activity, for instance an X-Ray scan. The function returns a `log` object. For illustration purposes, [`traces()`](inspect_logs.html) is also used.
```{r}
patients %>%
filter_activity_presence("X-Ray") %>%
traces()
```
Or select the cases that don't have a specific activity, using `reverse = TRUE`.
```{r}
patients %>%
filter_activity_presence("X-Ray", reverse = TRUE) %>%
traces()
```
We can also specify more than one activity. In this case, the `method` argument can be configured as follows:
* "all" means that all the specified activity labels must be present for a case to be selected.
* "none" means that they are not allowed to be present.
* "one_of" means that at least one of them must be present.
* "exact" means that all of these activities have to be present (possibly multiple times and in any order), while no others are allowed.
* "only" means that only (a set of) these activities are allowed to be present, and no others.
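The five methods can be read as set comparisons between the activities that occur in a case and the activities you specify. The chunk below is a minimal base-R sketch of that reading for a single case; `presence_matches()` is a hypothetical helper that mirrors the descriptions above, not the code behind `filter_activity_presence()`.

```{r}
# Hypothetical helper (not part of bupaR): does one case, given as a vector of
# its activity labels, satisfy `method` for the labels in `activities`?
presence_matches <- function(case_activities, activities,
                             method = c("all", "none", "one_of", "exact", "only")) {
  method <- match.arg(method)
  present <- unique(case_activities)
  switch(method,
    all    = all(activities %in% present),   # every specified label occurs
    none   = !any(activities %in% present),  # no specified label occurs
    one_of = any(activities %in% present),   # at least one specified label occurs
    exact  = setequal(present, activities),  # all specified labels occur, nothing else
    only   = all(present %in% activities)    # nothing outside the specified labels
  )
}

presence_methods <- c("all", "none", "one_of", "exact", "only")
selected <- c("Create Fine", "Payment")
# Payment occurs twice, which "exact" still allows
sapply(presence_methods,
       function(m) presence_matches(c("Create Fine", "Payment", "Payment"), selected, m))
# Only one of the two labels occurs: passes "only", fails "exact"
sapply(presence_methods,
       function(m) presence_matches("Create Fine", selected, m))
```

The second call shows why "exact" and "only" can differ: a case containing a subset of the specified activities satisfies "only" but not "exact".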
Below is an illustration of these different options for the activities _Create Fine_ and _Payment_ from `traffic_fines`. Note that the unfiltered data set has 44 distinct traces.
<style>
div.blue { background-color:#E6F4F1; border-radius: 5px; padding: 20px;}
</style>
### All
<div class = "blue">
27 traces have both activities.
```{r}
traffic_fines %>%
filter_activity_presence(c("Create Fine", "Payment"), method = "all") %>%
traces()
```
</div>
### None
<div class = "blue">
No traces exist that have none of these activities.
```{r}
traffic_fines %>%
filter_activity_presence(c("Create Fine", "Payment"), method = "none") %>%
traces()
```
</div>
### One of
<div class = "blue">
All 44 traces have at least one of these activities.
```{r}
traffic_fines %>%
filter_activity_presence(c("Create Fine", "Payment"), method = "one_of") %>%
traces()
```
</div>
### Exact
<div class = "blue">
Only 2 traces consist of exactly these activities.
```{r}
traffic_fines %>%
filter_activity_presence(c("Create Fine", "Payment"), method = "exact") %>%
traces()
```
</div>
### Only
<div class = "blue">
And the same 2 traces have only these activities.
```{r}
traffic_fines %>%
filter_activity_presence(c("Create Fine", "Payment"), method = "only") %>%
traces()
```
</div>
## {.unlisted .unnumbered}
Note that when one of the specified activities cannot be found in the log, you will get a warning. However, `filter_activity_presence()` will still proceed with the specified list. The result below shows that no trace has the activity "Create Fines".
```{r}
traffic_fines %>%
filter_activity_presence(c("Create Fines"), method = "none") %>%
traces()
```
## Case
`filter_case()` can be used to filter cases based on their identifier. It returns a `log` object containing only the events of the specified cases.
```{r}
traffic_fines %>%
filter_case(cases = c("A1","A2"))
```
The selection can be reversed with `reverse = TRUE`.
```{r}
traffic_fines %>%
filter_case(cases = c("A1","A2"), reverse = TRUE)
```
## Case Condition
`filter_case_condition()` can be used to select cases for which a condition holds. This condition can be related to any of the variables in the log.
For example, select all cases where _resource_ 561 is involved.
```{r}
traffic_fines %>%
filter_case_condition(resource == 561)
```
Note that multiple conditions can be combined using the symbols `|` (or) and `&` (and). For example, let's select all cases where _resource_ 557 is involved, and the _points_ are more than 0.
```{r}
traffic_fines %>%
filter_case_condition(resource == 557 & points > 0)
```
Conditions can be reversed using `!` or the `reverse = TRUE` argument. The following two commands are equivalent.
```{r}
traffic_fines %>%
filter_case_condition(!(resource == 557 & points > 0))
traffic_fines %>%
filter_case_condition(resource == 557 & points > 0, reverse = TRUE)
```
## Endpoints
`filter_endpoints()` allows you to select cases with a specific start and/or end activity. In the case of the `patients` data set, all cases start with "Registration". Filtering cases that __don't__ start with "Registration" (`reverse = TRUE`) therefore gives an empty log.
```{r}
patients %>%
filter_endpoints(start_activities = "Registration", reverse = TRUE)
```
If we are interested in the "completed" cases, i.e. those that start with "Registration" and end with "Check-out", we can apply the following filter. Here [`process_map()`](frequency_maps.html) is used for illustration purposes.
```{r fig.width = 9}
patients %>%
filter_endpoints(start_activities = "Registration", end_activities = "Check-out") %>%
process_map()
```
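Conceptually, the endpoint check only looks at the first and the last activity of each case. The base-R sketch below illustrates that idea on a small made-up event table; the column names and the helper `endpoints_ok()` are assumptions for illustration, not bupaR internals. A `NULL` argument is taken to mean "any activity".

```{r}
# Made-up toy log: one row per event
toy_log <- data.frame(
  case      = c("c1", "c1", "c1", "c2", "c2"),
  activity  = c("Registration", "X-Ray", "Check-out", "Registration", "X-Ray"),
  timestamp = as.POSIXct("2024-01-01 08:00", tz = "UTC") + c(0, 1, 2, 0, 1) * 3600
)

# Hypothetical helper: TRUE for each case whose first activity is in `start`
# and whose last activity is in `end` (NULL = no restriction)
endpoints_ok <- function(log, start = NULL, end = NULL) {
  vapply(split(log, log$case), function(d) {
    d <- d[order(d$timestamp), ]                       # order events in time
    first_ok <- is.null(start) || d$activity[1] %in% start
    last_ok  <- is.null(end) || d$activity[nrow(d)] %in% end
    first_ok && last_ok
  }, logical(1))
}

keep <- endpoints_ok(toy_log, start = "Registration", end = "Check-out")
keep                 # c1 is complete, c2 is not
names(keep)[!keep]   # the complement, as selected with reverse = TRUE
```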
## Endpoints Condition
`filter_endpoints_condition()` allows you to select cases by applying conditions to the start and/or end activity instance. For example, we can use it to replace the `filter_endpoints()` call from above, using conditions on the _handling_ variable.
```{r}
patients %>%
filter_endpoints_condition(start_condition = handling == "Registration", end_condition = handling == "Check-out")
```
Naturally, both conditions can use any of the available variables. The following selects all cases that started between midnight and 6 am. Setting `end_condition = TRUE` means that no condition is applied to the end activity instance. We use [`dotted_chart("relative_day")`](https://bupaverse.github.io/docs/dotted_chart.html) to plot a graph in which each activity instance is displayed as a dot. The x-axis represents time (here __relative__ time since the start of the day), while the y-axis represents cases.
```{r}
patients %>%
filter_endpoints_condition(start_condition = lubridate::hour(time) < 6, end_condition = TRUE) %>%
dotted_chart("relative_day")
```
## Flow Time
`filter_flow_time()` can be used to select cases in which a specific directly-follows flow (from `from` to `to`) happens within a given time interval.
For example, we can select the fines from `traffic_fines` in which the creation is followed by the payment within 4 weeks.
```{r}
traffic_fines %>%
filter_flow_time(from = "Create Fine", to = "Payment", interval = c(0,4), units = "weeks")
```
The `interval` can be made half-open by using `NA` as its first or second element. The following selects cases where _Payment_ follows _Create Fine_ after 4 weeks or more.
```{r}
traffic_fines %>%
filter_flow_time(from = "Create Fine", to = "Payment", interval = c(4, NA), units = "weeks")
```
Note that we can also use `reverse = TRUE`. However, this will also include cases where _Create Fine_ is __not__ directly followed by _Payment_ at all. Therefore, the following filter is not equivalent to the previous one.
```{r}
traffic_fines %>%
filter_flow_time(from = "Create Fine", to = "Payment", interval = c(0, 4), units = "weeks", reverse = TRUE)
```
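The difference between a half-open interval and `reverse = TRUE` becomes clearer when the flow times are computed by hand. The base-R sketch below uses made-up data and assumes that a case is kept when at least one occurrence of the directly-follows flow has a duration inside the closed interval, with `NA` meaning an open bound. The helpers `flow_durations()` and `in_interval()` are hypothetical, not edeaR code.

```{r}
# Made-up events for three fines (times in weeks since an arbitrary origin)
toy_flows <- data.frame(
  case     = c("f1", "f1", "f2", "f2", "f3", "f3"),
  activity = c("Create Fine", "Payment", "Create Fine", "Payment", "Create Fine", "Send Fine"),
  week     = c(0, 2, 0, 6, 0, 1)
)

# Hypothetical helper: durations of every directly-follows `from -> to` step per case
flow_durations <- function(log, from, to) {
  lapply(split(log, log$case), function(d) {
    d <- d[order(d$week), ]
    n <- nrow(d)
    if (n < 2) return(numeric(0))
    hit <- d$activity[-n] == from & d$activity[-1] == to
    d$week[-1][hit] - d$week[-n][hit]
  })
}

# Closed interval; an NA bound is treated as open-ended
in_interval <- function(x, lower, upper) {
  (is.na(lower) | x >= lower) & (is.na(upper) | x <= upper)
}

durs <- flow_durations(toy_flows, "Create Fine", "Payment")
within_4 <- vapply(durs, function(x) any(in_interval(x, 0, 4)), logical(1))
after_4  <- vapply(durs, function(x) any(in_interval(x, 4, NA)), logical(1))
within_4    # only f1
after_4     # only f2
!within_4   # the reverse also keeps f3, which never reaches Payment
```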
## Idle Time {.tabset .tabset-pills}
The idle time is the total time period during the execution of a case where no activity instances are _active_. An activity instance is considered _active_ between the registration of the first related event and the last related event. See more on performance metrics [here](performance_analysis.html).
`filter_idle_time()` can be used to select cases based on the amount of idle time. There are two approaches: using an interval, or using a percentage.
### Interval-based
<div class = "blue">
Using `filter_idle_time()` with argument `interval`, you can select cases whose idle time falls within a certain duration. For example, the following selects all cases in `patients` with an idle time between 10 and 20 hours. Note that it is important to set the appropriate time unit using `units`, so that the interval is interpreted as you intend it. The default time unit is seconds.
```{r}
patients %>%
filter_idle_time(interval = c(10,20), units = "hours") %>%
idle_time(units = "hours")
```
Also here you can use half-open intervals.
```{r}
patients %>%
filter_idle_time(interval = c(10,NA), units = "hours") %>%
idle_time(units = "hours")
```
And use `reverse = TRUE`.
```{r}
patients %>%
filter_idle_time(interval = c(NA,40), units = "hours", reverse = TRUE) %>%
idle_time(units = "hours")
```
</div>
### Percentage-based
<div class = "blue">
Using `filter_idle_time()` with argument `percentage`, you can give priority to cases with the lowest idle time. For example, setting `percentage = 0.5` will select 50% of the cases, starting with those that have the lowest idle time.
```{r}
patients %>%
filter_idle_time(percentage = 0.5) %>%
idle_time(units = "hours")
```
You can again set `reverse = TRUE` if you instead want 50% of the cases with the highest idle time.
```{r}
patients %>%
filter_idle_time(percentage = 0.5, reverse = TRUE) %>%
idle_time(units = "hours")
```
Note that it is not necessary to specify the time units when using the percentage approach.
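The percentage approach amounts to ranking cases by the metric and keeping cases from the low end of that ranking until the requested share is reached. The base-R sketch below shows that idea with made-up idle times; `select_by_percentage()` is a hypothetical helper, and its rounding rule (`ceiling()`) and the treatment of `reverse = TRUE` as the complement are assumptions that may differ from bupaR.

```{r}
# Made-up idle times (hours) per case
idle <- c(a = 12, b = 3, c = 30, d = 7, e = 18, f = 1)

# Hypothetical helper: keep the share `percentage` of cases with the lowest values,
# or, with reverse = TRUE, the remaining cases (those with the highest values)
select_by_percentage <- function(values, percentage, reverse = FALSE) {
  ranked <- names(sort(values))                    # lowest value first
  n_keep <- ceiling(percentage * length(values))   # assumed rounding rule
  kept <- ranked[seq_len(n_keep)]
  if (reverse) setdiff(ranked, kept) else kept
}

select_by_percentage(idle, 0.5)                  # the 3 cases with the least idle time
select_by_percentage(idle, 0.5, reverse = TRUE)  # the 3 cases with the most idle time
```

Because only the ranking matters, the units in which the metric is expressed have no influence on the selection.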
</div>
## {.unlisted .unnumbered}
Note that for both approaches, calculations using idle time assume non-atomic activity instances, i.e. activity instances that have more than one event. If each activity instance has only one registered event, the idle time will be equal to the throughput time. See more on performance metrics [here](performance_analysis.html). It is, however, possible that some activity instances have multiple events while others do not. In those cases, idle time will take the active activity instances into account, and the resulting time will be less than the throughput time.
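As a worked illustration of the definition, idle time can be computed as the case's throughput time minus the time during which at least one activity instance is active. The base-R sketch below does this for single made-up cases, with activity-instance start and end times in hours; `idle_time_of_case()` is a hypothetical helper, not the edeaR implementation.

```{r}
# Hypothetical helper: idle time of one case, given the start and end time
# of each activity instance (same time unit for both)
idle_time_of_case <- function(start, end) {
  o <- order(start)
  start <- start[o]; end <- end[o]
  busy <- 0
  cur_start <- start[1]; cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end) {            # overlaps the current active block
      cur_end <- max(cur_end, end[i])
    } else {                              # a gap: close the current block
      busy <- busy + (cur_end - cur_start)
      cur_start <- start[i]; cur_end <- end[i]
    }
  }
  busy <- busy + (cur_end - cur_start)
  throughput <- max(end) - min(start)
  throughput - busy
}

# Three activity instances; the last two overlap, so 6 of the 9 hours are active
idle_time_of_case(start = c(0, 5, 6), end = c(2, 8, 9))
# Atomic instances (start equals end): idle time equals throughput time
idle_time_of_case(start = c(0, 5, 6), end = c(0, 5, 6))
```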
## Infrequent Flows
`filter_infrequent_flows()` allows us to select a set of cases in which every directly-follows flow has a minimum frequency. For example, consider the `traffic_fines` [process map](https://bupaverse.github.io/docs/frequency_maps.html) below.
```{r}
traffic_fines %>% process_map()
```
In this map, we can observe several directly-follows relations that occur only once, as well as flows occurring only 2 or 3 times. Using the filter, we can remove the cases that lead to these flows as follows:
```{r eval = T}
traffic_fines %>%
filter_infrequent_flows(min_n = 5) %>%
process_map()
```
The resulting process map contains noticeably fewer infrequent flows.
It is important to note that `filter_infrequent_flows()` does __not__ remove edges from the process map, but entire cases underlying infrequent behavior. We strongly adhere to the principle that a process map should be based on a clearly defined set of events, which are either the result of case filters or of specific event filters (see [Event Filters](event_filters.html)). Removing specific edges from a process map would require removing specific activity instances from the log, while not necessarily removing other activity instances of the same activity type. This would result in an ambiguous map, which could give a misleading view of your process.
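The case-based logic can be sketched in base R: count each directly-follows pair over all traces, then drop every case that contains a pair occurring fewer than `min_n` times. The toy traces and the helpers `df_pairs()` and `drop_infrequent_cases()` are made up for illustration, and counting every occurrence of a pair (rather than the number of cases containing it) is an assumption.

```{r}
# Made-up traces, one character vector of activities per case
toy_traces <- list(
  c1 = c("A", "B", "C"),
  c2 = c("A", "B", "C"),
  c3 = c("A", "B", "C"),
  c4 = c("A", "B", "D")
)

# Directly-follows pairs of one trace, as "from -> to" strings
df_pairs <- function(tr) {
  if (length(tr) < 2) return(character(0))
  paste(head(tr, -1), tail(tr, -1), sep = " -> ")
}

# Hypothetical helper: keep only cases whose flows all occur at least min_n times
drop_infrequent_cases <- function(traces, min_n) {
  pairs  <- lapply(traces, df_pairs)
  counts <- table(unlist(pairs))
  keep <- vapply(pairs, function(p) all(counts[p] >= min_n), logical(1))
  traces[keep]
}

names(drop_infrequent_cases(toy_traces, min_n = 2))
```

Case `c4` is dropped as a whole, including its frequent `A -> B` step, because its `B -> D` flow occurs only once. The remaining log still contains every activity instance of the cases that were kept, which is the point made above.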
## Precedence
`filter_precedence()` allows us to filter cases based on flows between activities, using 5 different inputs:
* A list of (one or more) possible `antecedent` activities ("source"-activities)
* A list of (one or more) possible `consequent` activities ("target"-activities)
* A `precedence_type`
    * "directly_follows"
    * "eventually_follows"
* A `filter_method`: "all", "one_of" or "none" of the precedence rules should hold.
* A `reverse` argument
If there is more than one `antecedent` or `consequent` activity, the filter will test __all__ possible pairs. The `filter_method` will tell the filter whether all of the rules should hold, at least one, or none are allowed.
For example, take the `patients` data. The following filter takes only cases where _Triage and Assessment_ is directly followed by _Blood test_.
```{r}
patients %>%
filter_precedence(antecedents = "Triage and Assessment",
consequents = "Blood test",
precedence_type = "directly_follows") %>%
traces()
```
The following selects cases where _Triage and Assessment_ is eventually followed by __both__ _Blood test_ and _X-Ray_, which never happens.
```{r}
patients %>%
filter_precedence(antecedents = "Triage and Assessment",
consequents = c("Blood test", "X-Ray"),
precedence_type = "eventually_follows",
filter_method = "all") %>%
traces()
```
The next filter selects cases where _Triage and Assessment_ is eventually followed by __at least one of__ the three consequents, by changing the filter method to _one_of_.
```{r}
patients %>%
filter_precedence(antecedents = "Triage and Assessment",
consequents = c("Blood test", "X-Ray", "MRI SCAN"),
precedence_type = "eventually_follows",
filter_method = "one_of") %>%
traces()
```
This final example only retains cases where _Triage and Assessment_ is _not_ followed by any of the three consequent activities. The result is 2 incomplete cases where the last activity was _Triage and Assessment_.
```{r}
patients %>%
filter_precedence(antecedents = "Triage and Assessment",
consequents = c("Blood test", "X-Ray", "MRI SCAN"),
precedence_type = "eventually_follows",
filter_method = "none") %>%
traces()
```
As always, the filter can be negated with `reverse = TRUE`.
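The pairwise logic described above can be sketched for a single trace in base R. The helpers `follows()` and `precedence_holds()` below are hypothetical, assume a trace is a character vector of activity labels in execution order, and only mirror the description of `filter_precedence()`; they are not the edeaR implementation.

```{r}
# Does `a` precede `b` in the trace, directly or eventually?
follows <- function(tr, a, b, type) {
  pos_a <- which(tr == a)
  pos_b <- which(tr == b)
  if (type == "directly_follows") {
    any((pos_a + 1) %in% pos_b)
  } else {
    length(pos_a) > 0 && length(pos_b) > 0 && min(pos_a) < max(pos_b)
  }
}

# Hypothetical helper: test all antecedent/consequent pairs, then combine
# the results according to filter_method
precedence_holds <- function(tr, antecedents, consequents,
                             precedence_type = c("directly_follows", "eventually_follows"),
                             filter_method = c("all", "one_of", "none")) {
  precedence_type <- match.arg(precedence_type)
  filter_method <- match.arg(filter_method)
  pairs <- expand.grid(a = antecedents, b = consequents, stringsAsFactors = FALSE)
  ok <- mapply(follows, a = pairs$a, b = pairs$b,
               MoreArgs = list(tr = tr, type = precedence_type))
  switch(filter_method, all = all(ok), one_of = any(ok), none = !any(ok))
}

pt_trace <- c("Registration", "Triage and Assessment", "Blood test", "Discuss Results")
precedence_holds(pt_trace, "Triage and Assessment", c("Blood test", "X-Ray"),
                 "eventually_follows", "all")     # FALSE: no X-Ray in this trace
precedence_holds(pt_trace, "Triage and Assessment", c("Blood test", "X-Ray"),
                 "eventually_follows", "one_of")  # TRUE: Blood test follows
```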
## Precedence Condition
`filter_precedence_condition()` is a generic version of `filter_precedence()`, where the antecedent and consequent are conditions instead of activity labels. This filter can only test one pair at a time and therefore has no `filter_method` argument. The `precedence_type` can again be configured.
The following example takes all cases from `traffic_fines` where an activity instance with _dismissal_ equal to _NIL_ is eventually followed by an activity instance with _notificationtype_ equal to _P_.
```{r}
traffic_fines %>%
filter_precedence_condition(antecedent_condition = dismissal == "NIL",
consequent_condition = notificationtype == "P",
precedence_type = "eventually_follows")
```
## Precedence Resource
`filter_precedence_resource()` is similar to `filter_precedence()`, but additionally requires that both activities are executed by the same resource. While three traces contain the directly-follows pair below (see earlier), there is not a single case where the two activities are executed by the same resource, so the result is an empty log. (In fact, all activity types in `patients` are linked to a distinct resource in a one-to-one relationship.)
```{r}
patients %>%
filter_precedence_resource(antecedents = "Triage and Assessment",
consequents = "Blood test",
precedence_type = "directly_follows") %>%
traces()
```
## Processing Time {.tabset .tabset-pills}
The processing time is the total time period during the execution of a case where an activity instance is _active_. An activity instance is considered _active_ between the registration of the first related event and the last related event. See more on performance metrics [here](performance_analysis.html).
`filter_processing_time()` can be used to select cases based on the amount of processing time. There are two approaches: using an interval, or using a percentage.
### Interval-based
<div class = "blue">
Using `filter_processing_time()` with argument `interval`, you can select cases whose processing time falls within a certain duration. For example, the following selects all cases in `patients` with a processing time between 10 and 20 hours. Note that it is important to set the appropriate time unit using `units`, so that the interval is interpreted as you intend it. The default time unit is seconds.
```{r}
patients %>%
filter_processing_time(interval = c(10,20), units = "hours") %>%
processing_time(units = "hours")
```
Also here you can use half-open intervals.
```{r}
patients %>%
filter_processing_time(interval = c(10,NA), units = "hours") %>%
processing_time(units = "hours")
```
And use `reverse = TRUE`.
```{r}
patients %>%
filter_processing_time(interval = c(NA,20), units = "hours", reverse = TRUE) %>%
processing_time(units = "hours")
```
</div>
### Percentage-based
<div class = "blue">
Using `filter_processing_time()` with argument `percentage`, you can give priority to cases with the lowest processing time. For example, setting `percentage = 0.5` will select 50% of the cases, starting with those that have the lowest processing time.
```{r}
patients %>%
filter_processing_time(percentage = 0.5) %>%
processing_time(units = "hours")
```
You can again set `reverse = TRUE` if you instead want 50% of the cases with the highest processing time.
```{r}
patients %>%
filter_processing_time(percentage = 0.5, reverse = TRUE) %>%
processing_time(units = "hours")
```
Note that it is not necessary to specify the time units when using the percentage approach.
</div>
## {.unlisted .unnumbered}
Note that for both approaches, calculations using processing time assume non-atomic activity instances, i.e. activity instances that have more than one event. If each activity instance has only one registered event, the processing time will be zero. See more on performance metrics [here](performance_analysis.html). It is, however, possible that some activity instances have multiple events while others do not. In those cases, processing time will take only the active activity instances into account, and the resulting time will be more than zero.
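As a worked illustration of the note above, the base-R sketch below computes the processing time of single made-up cases as the sum of their activity-instance durations (start and end times in hours). `processing_time_of_case()` is a hypothetical helper; simply summing the durations, so that overlapping instances are counted separately, is an assumption about how the metric treats overlap.

```{r}
# Hypothetical helper: processing time of one case as the sum of its
# activity-instance durations (end minus start)
processing_time_of_case <- function(start, end) {
  sum(end - start)
}

processing_time_of_case(start = c(0, 5, 6), end = c(2, 8, 9))  # 2 + 3 + 3 hours
processing_time_of_case(start = c(0, 5, 6), end = c(0, 5, 6))  # atomic instances only
processing_time_of_case(start = c(0, 5, 6), end = c(0, 8, 6))  # one non-atomic instance
```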
## Throughput Time {.tabset .tabset-pills}
The throughput time is the total time period from the first event to the last event belonging to a case. See more on performance metrics [here](performance_analysis.html).
`filter_throughput_time()` can be used to select cases based on the amount of throughput time. There are two approaches: using an interval, or using a percentage.
### Interval-based
<div class = "blue">
Using `filter_throughput_time()` with argument `interval`, you can select cases whose throughput time falls within a certain duration. For example, the following selects all cases in `patients` with a throughput time between 1 and 5 days. Note that it is important to set the appropriate time unit using `units`, so that the interval is interpreted as you intend it. The default time unit is seconds.
```{r}
patients %>%
filter_throughput_time(interval = c(1,5), units = "days") %>%
throughput_time(units = "days")
```
Also here you can use half-open intervals.
```{r}
patients %>%
filter_throughput_time(interval = c(10,NA), units = "days") %>%
throughput_time(units = "days")
```
And use `reverse = TRUE`.
```{r}
patients %>%
filter_throughput_time(interval = c(10,NA), units = "days", reverse = TRUE) %>%
throughput_time(units = "days")
```
</div>
### Percentage-based
<div class = "blue">
Using `filter_throughput_time()` with argument `percentage`, you can give priority to cases with the lowest throughput time. For example, setting `percentage = 0.5` will select 50% of the cases, starting with those that have the lowest throughput time.
```{r}
patients %>%
filter_throughput_time(percentage = 0.5) %>%
throughput_time(units = "days")
```
You can again set `reverse = TRUE` if you instead want 50% of the cases with the highest throughput time.
```{r}
patients %>%
filter_throughput_time(percentage = 0.5, reverse = TRUE) %>%
throughput_time(units = "days")
```
Note that it is not necessary to specify the time units when using the percentage approach.
</div>
## {.unlisted .unnumbered}
## Time Period {.tabset .tabset-pills}
Filtering cases by time period can be done using `filter_time_period()`. There are four different values of `filter_method` that make it act as a case filter:
* "start": all cases started in an interval.
* "complete": all cases completed in an interval.
* "contained": all cases contained in an interval.
* "intersecting": all cases with some activity in an interval.
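Each of these methods compares a case's first and last timestamp with the interval. The base-R sketch below spells out one reading of the four descriptions, assuming closed interval bounds and treating "intersecting" as an overlap between the case's time span and the interval. `case_in_period()` is a hypothetical helper and the dates are made up.

```{r}
# Hypothetical helper: does a case spanning [case_start, case_end] match the
# interval [from, to] under the given method? (closed bounds assumed)
case_in_period <- function(case_start, case_end, from, to,
                           method = c("start", "complete", "contained", "intersecting")) {
  method <- match.arg(method)
  switch(method,
    start        = case_start >= from & case_start <= to,  # started in the interval
    complete     = case_end >= from & case_end <= to,      # completed in the interval
    contained    = case_start >= from & case_end <= to,    # fully inside the interval
    intersecting = case_start <= to & case_end >= from     # overlaps the interval
  )
}

period_from <- as.Date("2015-01-01"); period_to <- as.Date("2015-01-31")
case_starts <- as.Date(c("2014-12-20", "2015-01-10", "2015-01-25", "2015-02-03"))
case_ends   <- as.Date(c("2015-01-05", "2015-01-20", "2015-02-10", "2015-02-08"))

# One row per case, one column per method
sapply(c("start", "complete", "contained", "intersecting"),
       function(m) case_in_period(case_starts, case_ends, period_from, period_to, m))
```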
Using the same interval (the month of January 2015), you can compare the results of different filtering methods below using [dotted charts](dotted_chart.html).
### Start
<div class = "blue">
```{r out.width = "100%", fig.asp = 0.6, fig.width = 8}
sepsis %>%
filter_time_period(interval = lubridate::ymd(c(20150101, 20150131)), filter_method = "start") %>%
dotted_chart()
```
</div>
### Complete
<div class = "blue">
```{r out.width = "100%", fig.asp = 0.6, fig.width = 8}
sepsis %>%
filter_time_period(interval = lubridate::ymd(c(20150101, 20150131)), filter_method = "complete") %>%
dotted_chart()
```
</div>
### Contained
<div class = "blue">
```{r out.width = "100%", fig.asp = 0.6, fig.width = 8}
sepsis %>%
filter_time_period(interval = lubridate::ymd(c(20150101, 20150131)), filter_method = "contained") %>%
dotted_chart()
```
</div>
### Intersecting
<div class = "blue">
```{r out.width = "100%", fig.asp = 0.6, fig.width = 8}
sepsis %>%
filter_time_period(interval = lubridate::ymd(c(20150101, 20150131)), filter_method = "intersecting") %>%
dotted_chart()
```
</div>
## {.unlisted .unnumbered}
## Trace Frequency {.tabset .tabset-pills}
The frequency of a trace, i.e. distinct activity sequence, is the number of cases, i.e. process instances that follow this trace.
`filter_trace_frequency()` can be used to select cases based on the frequency of their trace. There are two approaches: using an interval, or using a percentage.
### Interval-based
<div class = "blue">
Using `filter_trace_frequency()` with argument `interval`, you can select cases whose trace frequency falls within a certain interval. For example, the following selects all cases from `sepsis` with a trace frequency between 10 and 50. [`traces()`](inspect_logs.html) is used to show the changes to the log data after applying the filter.
```{r}
sepsis %>%
filter_trace_frequency(interval = c(10,50)) %>%
traces()
```
Also here you can use half-open intervals.
```{r}
sepsis %>%
filter_trace_frequency(interval = c(5,NA)) %>%
traces()
```
And use `reverse = TRUE`.
```{r}
sepsis %>%
filter_trace_frequency(interval = c(5,NA), reverse = TRUE) %>%
traces()
```
</div>
### Percentage-based
<div class = "blue">
Using `filter_trace_frequency()` with argument `percentage`, you can give priority to cases with a frequent trace. For example, setting `percentage = 0.8` will select at least 80% of the cases, starting with those that have the highest frequency.
```{r}
sepsis %>%
filter_trace_frequency(percentage = 0.8) %>%
traces()
```
You can again set `reverse = TRUE` if you instead want 80% of the cases with the lowest frequency.
```{r}
sepsis %>%
filter_trace_frequency(percentage = 0.2, reverse = TRUE) %>%
traces()
```
Note that the obtained percentage of cases will not always be exactly the specified percentage, as there can be ties. For example, in the `sepsis` data set, 784 of the 1050 cases (75%) follow a distinct activity sequence. As `bupaR` does not break ties randomly, it will select _all_ cases once the percentage is higher than ca. 24%: to reach that coverage it has to include traces that occur only once, and since these are all tied, all remaining unique cases are included.
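The effect of ties can be reproduced with made-up trace frequencies. In the base-R sketch below, traces are ranked from most to least frequent and added until the requested share of cases is covered, and traces tied with the last one added are included as well. This tie rule and the helper `traces_for_coverage()` are assumptions meant to mirror the behaviour described above, not bupaR's code.

```{r}
# Made-up number of cases per trace: two frequent traces and six unique ones
trace_freq <- c(t1 = 10, t2 = 4, u1 = 1, u2 = 1, u3 = 1, u4 = 1, u5 = 1, u6 = 1)

# Hypothetical helper: most frequent traces first, stop once `percentage` of the
# cases is covered, but never split a group of traces with the same frequency
traces_for_coverage <- function(freq, percentage) {
  freq <- sort(freq, decreasing = TRUE)
  coverage <- cumsum(freq) / sum(freq)
  n <- which(coverage >= percentage)[1]   # first trace reaching the target
  names(freq)[freq >= freq[n]]            # include all ties of that trace
}

traces_for_coverage(trace_freq, 0.5)    # t1 alone covers 10 of 20 cases
traces_for_coverage(trace_freq, 0.6)    # t1 and t2
traces_for_coverage(trace_freq, 0.72)   # needs a unique trace, so all six come along
```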
</div>
## {.unlisted .unnumbered}
## Trace Length {.tabset .tabset-pills}
The length of a trace, i.e. distinct activity sequence, is the number of activity instances it contains. Note that this is not necessarily equal to the number of events.
`filter_trace_length()` can be used to select cases based on the length of their trace. There are two approaches: using an `interval`, or using a `percentage`.
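Because trace length counts activity instances rather than events, a case that records a start and a complete event for its activities has more events than its trace length. A small base-R illustration with a made-up event table, where the `activity_instance` column is an assumption standing in for the log's activity instance identifier:

```{r}
# Made-up events: two activity instances have a "start" and a "complete" event,
# the third one is atomic
toy_events <- data.frame(
  case              = c("c1", "c1", "c1", "c1", "c1"),
  activity_instance = c(1, 1, 2, 2, 3),
  lifecycle         = c("start", "complete", "start", "complete", "complete")
)

n_events <- table(toy_events$case)
trace_length <- tapply(toy_events$activity_instance, toy_events$case,
                       function(x) length(unique(x)))
n_events      # 5 events
trace_length  # 3 activity instances
```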
### Interval-based
<div class = "blue">
Using `filter_trace_length()` with argument `interval`, you can select cases whose trace length falls within a certain interval. For example, the following selects all cases in `sepsis` with a trace length between 10 and 50. Changes are illustrated with [`traces()`](inspect_logs.html).
```{r}
sepsis %>%
filter_trace_length(interval = c(10,50)) %>%
traces()
```
Also here you can use half-open intervals.
```{r}
sepsis %>%
filter_trace_length(interval = c(10,NA)) %>%
traces()
```
And use `reverse = TRUE`.
```{r}
sepsis %>%
filter_trace_length(interval = c(10,NA), reverse = TRUE) %>%
traces()
```
</div>
### Percentage-based
<div class = "blue">
Using `filter_trace_length()` with argument `percentage`, you can give priority to cases with the longest traces. For example, setting `percentage = 0.5` will select 50% of the cases, starting with those that have the longest traces. Again, changes are illustrated with [`traces()`](inspect_logs.html).
```{r}
sepsis %>%
filter_trace_length(percentage = 0.5) %>%
traces()
```
You can again set `reverse = TRUE` if you instead want 50% of the cases with the shortest traces.
```{r}
sepsis %>%
filter_trace_length(percentage = 0.5, reverse = TRUE) %>%
traces()
```
Note that the obtained percentage of cases will not always be exactly the specified percentage, as there can be ties.
</div>
## {.unlisted .unnumbered}
```{r footer, results = "asis", echo = F}
CURRENT_PAGE <- stringr::str_replace(knitr::current_input(), ".Rmd",".html")
res <- knitr::knit_expand("_button_footer.Rmd", quiet = TRUE)
res <- knitr::knit_child(text = unlist(res), quiet = TRUE)
cat(res, sep = '\n')
```