-
Notifications
You must be signed in to change notification settings - Fork 11
/
xes_files.Rmd
131 lines (91 loc) · 5.63 KB
/
xes_files.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
---
title: "bupaR Docs | Xes Files"
---
```{r echo = F, out.width="25%", fig.align = "right"}
knitr::include_graphics("images/icons/create.PNG")
```
***
# XES Files
## Read XES
`read_xes()` can be used to read XES files and turn the data into an `eventlog` object in R. The function needs only one `xesfile` argument. This can be a local path to a file with a .xes extension or an URL. An example XES file can be found at the following link: [https://bupar.net/eventdata/exercise1.xes](https://bupar.net/eventdata/exercise1.xes). When opening this file in a browser, you will see that it is an XML file. More information on the notation can be found [here](http://www.xes-standard.org/).
Importing a XES file is easily done as follows:
```{r include = F}
library(bupaR)
library(pander)
```
```{r}
library(xesreadR)
```
```{r}
data <- read_xes("https://bupar.net/eventdata/exercise1.xes")
data
```
Note that in the example above, `read_xes()` emits a warnings that no activity instance identifier was found. Recall that an `eventlog` object in R needs certain [data fields](create_logs.html) to be present. However, it might be so that not all of these field are available, in which case `read_xes()` will throw a warning or an error. Ideally, the XES file should contain at least the following elements:
```xml
<trace>
<string key="concept:name" value="Case3.0"/>
<event>
<string key="concept:name" value="A"/>
<int key="concept:instance" value = "1"/>
<string key="org:resource" value="UNDEFINED"/>
<date key="time:timestamp" value="2008-12-09T08:20:01.527+01:00"/>
<string key="lifecycle:transition" value="complete"/>
...
</event>
...
</trace>
```
These elements are translated according to the terminology used in `bupaR` as follows:
```{r results='asis', echo = F}
table <- "
XES | | bupaR
trace | concept:name | case_id
event | concept:name |activity_id
| concept:instance | activity_instance_id
| org:resource | resource_id
| time:timestamp | timestamp
| lifecycle:transition | lifecycle_id
"
df <- read.delim(textConnection(table),header=FALSE,sep="|",strip.white=F,stringsAsFactors=FALSE)
names(df) <- unname(as.list(df[1,])) # put headers on
df <- df[-1,] # remove first row
row.names(df)<-NULL
pandoc.table(df, style = 'rmarkdown', justify = "left", split.table = Inf)
```
When there is no case identifier, an artificial case identifier __CASE_ID__ will be created based on the hierarchy of the XES file. In case of other missing elements, either an error or a warning will be thrown.
### Errors
An error will be thrown if a XES file does not contain an __activity identifier__ or a __timestamp__. As such these are the minimum requirements to create an `eventlog` object from a XES file.
### Warnings
* In case the __lifecycle transition identifier__ or the __resource identifier__ is missing, an empty placeholder variable will be created and a warning will be emitted.
* In case the __activity instance identifier__ is missing, a default activity instance identifier column will be added. This column will regard every event in the log as a distinct activity instance. A warning will be emitted noting that you should check whether this is a justified assumption.
* If available, missing information can be added manually to the `eventlog` object by overwriting the variables, e.g. with `mutate()`.
### Additional elements
Note that both traces and events can have additional elements in the XES files. These will be added as extra variables in the resulting log. Attributes at a the level of traces will get the prefix *CASE_* in their name. [^1]
[^1]: On terminology: what in XES is called a _trace_ (i.e. between <trace> tags) is called a case or process instance in bupaR. In `bupaR` the concept _trace_ is reserved for an activity sequence, and is not related to a specific process instance. Many process instances can share the same _trace_ of activities. The terminology used in `bupaR` is in correspondence with current literature. See [creating logs](create_logs.html) for more information about the data model used.
## Create list of cases
In certain circumstances, it might be useful to have a separate list of cases with case attributes. This can be obtained using `read_xes_cases()`. The argument for this function remains the same `xesfile.` The result is a `data.frame` with one row for each case and one column for each attribute. Non-existing attributes for a specific case are filled in with NA`s. Below, this function is illustrated using the _repairExample_ event log, which has one case attribute called _description_. For the sake of illustration the entire event log is also imported.
```{r}
read_xes_cases("https://bupar.net/eventdata/repairExample.xes")
```
```{r}
read_xes("https://bupar.net/eventdata/repairExample.xes")
```
## Write XES
Use `write_xes()` to write a XES file.
```{r}
args(write_xes)
```
It minimally requires 2 arguments:
* an `eventlog` object
* a file name/path where to store the file (if not specified, as file system window will open to save the file)
Additionally, one can specify which of the variables in the event log should be regarded as case attributes by supplying a character vector of variable names to the `case_attributes` argument. If this argument is not specified, all the variables starting with the prefix **CASE_** will be considered as case attributes.
```{r eval = F}
eventdataR::patients
write_xes(patients, "patients.xes")
```
```{r footer, results = "asis", echo = F}
CURRENT_PAGE <- stringr::str_replace(knitr::current_input(), ".Rmd",".html")
res <- knitr::knit_expand("_button_footer.Rmd", quiet = TRUE)
res <- knitr::knit_child(text = unlist(res), quiet = TRUE)
cat(res, sep = '\n')
```