This repository was archived by the owner on Dec 13, 2021. It is now read-only.
-Earthquakes permutes C/Java function calls, Ethernet packets, Filesystem events, and injected faults in various orders so as to find implementation-level bugs of the distributed system.
+Earthquakes permutes Java function calls, Ethernet packets, Filesystem events, and injected faults in various orders so as to find implementation-level bugs of the distributed system.
Earthquake can also control non-determinism of the thread interleaving (by calling `sched_setattr(2)` with randomized parameters).
So Earthquake can be also used for testing standalone multi-threaded software.
@@ -27,13 +28,13 @@ Basically, Earthquake permutes events in a random order, but you can write your
* Found [YARN-4301](https://issues.apache.org/jira/browse/YARN-4301) (fault tolerance): ([repro code](example/yarn/4301-reproduce))
$ go get github.com/osrg/earthquake/earthquake-container
-$ sudo earthquake-container run -it --rm ubuntu bash
+$ sudo earthquake-container run -it --rm -v /foo:/foo ubuntu bash
In *Earthquake Container*, you can run arbitrary command that might be *flaky*.
@@ -59,6 +60,11 @@ explorePolicy = "random"
# Default: 0 and 0
minInterval = "80ms"
maxInterval = "3000ms"
+
+# for Ethernet/Filesystem inspectors, you can specify fault-injection probability (0.0-1.0).
+# Default: 0.0
+faultActionProbability = 0.0
+
# for Process inspector, you can specify how to schedule processes
# "mild": execute processes with randomly prioritized SCHED_NORMAL/SCHED_BATCH scheduler.
# "extreme": pick up some processes and execute them with SCHED_RR scheduler. others are executed with SCHED_BATCH scheduler.
@@ -76,31 +82,125 @@ explorePolicy = "random"
```
For other parameters, please refer to [`config.go`](earthquake/util/config/config.go) and [`randompolicy.go`](earthquake/explorepolicy/random/randompolicy.go).

-If you don't want to use containers, you can also use Earthquake (process inspector) with an arbitrary process tree.

+## Quick Start (Non-container mode)
+If you don't want to use containers, please use the `earthquake` command directly.
By default, all the processes and the threads under `$TARGET_PID` are randomly scheduled.
+
+You can also specify a config file by running with `-autopilot config.toml`.
+
+You can also set `-orchestrator-url` and `-entity-id` for distributed execution.
+
+Note that the process inspector may not be effective for reproducing short-running flaky tests, but it is still effective for long-running tests: [issue #125](https://github.com/osrg/earthquake/issues/125).
+
+
+The guide for reproducing flaky Hadoop tests (please use `earthquake` instead of `microearthquake`): [FOSDEM slide 42](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497/42).
[The slides for the presentation at FOSDEM](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497/42) might be also helpful.
+Please also refer to [doc/how-to-setup-env-full.md](doc/how-to-setup-env-full.md) for this feature.
+
+### Java inspector (AspectJ, byteman)
+
+To be documented
+
+### Distributed execution
+
+Basically please follow these examples: [example/zk-found-2212.ryu](example/zk-found-2212.ryu), [example/zk-found-2212.nfqhook](example/zk-found-2212.nfqhook)
+
+#### Step 1
+Prepare `config.toml` for distributed execution.
+Example:
+```toml
+# executed in `earthquake init`
+init = "init.sh"
+
+# executed in `earthquake run`
+run = "run.sh"
+
+# executed in `earthquake run` as the test oracle
+validate = "validate.sh"
+
+# executed in `earthquake run` as the clean-up script
+clean = "clean.sh"
+
+# REST port for the communication.
+# You can also set pbPort for ProtocolBuffers (Java inspector)
+restPort = 10080
+
+# of course you can also set explorePolicy here as well
+```
+
+#### Step 2
+Create a `materials` directory, and put the `*.sh` scripts into it.
+
+#### Step 3
+Run `earthquake init --force config.toml materials /tmp/x`.
+
+This command executes `init.sh` to initialize the workspace `/tmp/x`.
+`init.sh` can access the `materials` directory as `${EQ_MATERIALS_DIR}`.
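
As an illustration of how `init.sh` might use `${EQ_MATERIALS_DIR}`, the following is a minimal sketch that only checks whether the scripts named in `config.toml` are present. The helper name `check_materials` and the choice of required scripts are assumptions made for this example; a real `init.sh` would typically also prepare the system under test.

```shell
#!/bin/sh
# init.sh (sketch): sanity-check the materials directory.
# check_materials is a hypothetical helper, not part of Earthquake.
check_materials() {
    dir="$1"
    # run.sh and validate.sh are the scripts named in the example config.toml;
    # clean.sh is optional, so it is not checked here.
    for script in run.sh validate.sh; do
        if [ ! -f "$dir/$script" ]; then
            echo "missing $script in $dir" >&2
            return 1
        fi
    done
    return 0
}

# Run the check only when EQ_MATERIALS_DIR is set (i.e. under `earthquake init`).
if [ -n "${EQ_MATERIALS_DIR:-}" ]; then
    check_materials "$EQ_MATERIALS_DIR" || exit 1
fi
```

Failing early here would surface a missing script before any test run starts.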
+
+#### Step 4
+Run `for f in $(seq 1 100);do earthquake run /tmp/x; done`.
+
+This command starts the orchestrator, and executes `run.sh`, `validate.sh`, and `clean.sh` for testing the system (100 times).
`*.sh` can access the `/tmp/x/{00000000, 00000001, 00000002, ..., 00000063}` directory as `${EQ_WORKING_DIR}`, which is intended for putting test results and some relevant information. (Note: 0x63==99)
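
The directory names listed above look like zero-based run indices printed as 8-digit hexadecimal numbers. A small sketch of that mapping, assuming this reading of the pattern is right (the function name `run_dir_name` is made up for the example):

```shell
#!/bin/sh
# Print the working-directory name for a zero-based run index,
# assuming the 8-digit, zero-padded hexadecimal pattern shown above.
run_dir_name() {
    printf '%08x\n' "$1"
}

run_dir_name 0    # prints 00000000 (first run)
run_dir_name 99   # prints 00000063 (100th run, since 0x63 == 99)
```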
+
+`validate.sh` should exit with a zero status for successful executions, and with a non-zero status for failed executions.
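
A minimal sketch of a `validate.sh` that follows this exit-status convention is shown below. It assumes that `run.sh` saved the test output to `${EQ_WORKING_DIR}/test.log` and that a failed test prints a line containing `FAIL`; both the file name and the marker are hypothetical and depend on the system under test.

```shell
#!/bin/sh
# validate.sh (sketch): act as the test oracle for one execution.
# validate_log, test.log and the "FAIL" marker are assumptions for illustration.
validate_log() {
    log="$1"
    # A missing log is treated as a failed execution.
    [ -f "$log" ] || return 1
    # Any line containing "FAIL" marks the execution as failed.
    if grep -q 'FAIL' "$log"; then
        return 1
    fi
    return 0
}

# Run the check only when EQ_WORKING_DIR is set (i.e. under `earthquake run`),
# and pass the result on as the exit status of the script.
if [ -n "${EQ_WORKING_DIR:-}" ]; then
    validate_log "${EQ_WORKING_DIR}/test.log"
    exit $?
fi
```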
+
+`clean.sh` is an optional clean-up script for each execution.
+
+#### Step 5
+Run `earthquake summary /tmp/x` to summarize the results.
+
+If you have [JaCoCo](http://eclemma.org/jacoco/) coverage data, you can run `java -jar bin/earthquake-analyzer.jar --classes-path /somewhere/classes /tmp/x` for counting execution patterns as in [FOSDEM slide 18](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497/18).