-
Notifications
You must be signed in to change notification settings - Fork 90
/
ch28.html
341 lines (238 loc) · 25.1 KB
/
ch28.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
<!doctype html>
<html lang="en">
<head>
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Study guide for the Oracle Certified Professional, Java SE 8 Programmer Exam ">
<title>Java 8 Programmer II Study Guide: Exam 1Z0-809</title>
<link href="css/code.css" rel="stylesheet" type="text/css" />
<link href="css/style.css" rel="stylesheet" type="text/css" />
<link href="https://netdna.bootstrapcdn.com/font-awesome/3.2.1/css/font-awesome.css" rel="stylesheet">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js"></script>
<script src="js/common-sections.js"></script>
</head>
<body>
<div class="nav"></div>
<div class="header">
<div class="title-container">
<div class="chapter-title">
<h1><i class="chapter">Chapter TWENTY-EIGHT</i><br />
Fork/Join Framework</h1>
<p><br /></p>
<h3 style="text-align: center;"><i>Exam Objectives</i></h3>
<p style="text-align: center;"><i>Use parallel Fork/Join Framework.</i><br /></p>
</div>
</div>
</div>
<div class="container">
<div class="column">
<h2>The Fork/Join Framework</h2>
<p>The Fork/Join framework is designed to work with large tasks that can be split up into smaller tasks.</p>
<p>This is done through recursion, where you keep splitting up the task until you meet the base case, a task so simple that can be solved directly, and then combining all the partial results to compute the final result.</p>
<p>Splitting up the problem is know as <b>FORKING</b> and combining the results is known as <b>JOINING</b>.</p>
<p>The main class of the Fork/Join framework is <code>java.util.concurrent.ForkJoinPool</code>, which is actually a subclass of <code>ExecutorService</code>.</p>
<p>We create a <code>ForkJoinPool</code> instance mostly through two constructors:</p>
<p><code class="java hljs">ForkJoinPool()<br/>ForkJoinPool(<span class="hljs-keyword">int</span> parallelism)</code></p>
<p>The first version is the recommended way. It creates an instance with a number of threads equal to the number of processors of your machine (using the method <code>Runtime.getRuntime().availableProcessors()</code>).</p>
<p>The other version allows you to define the number of threads that will be used.</p>
<p>Just like an <code>ExecutorService</code> executes a task represented by a <code>Runnable</code> or a <code>Callable</code>, in the Fork/Join framework a task is usually represented by a subclass of either:</p>
<ul>
<li><code>RecursiveAction</code>, which is the equivalent of <code>Runnable</code> in the sense that it <b>DOESN'T</b> return a value.</li>
<li><code>RecursiveTask<V></code>, which is the equivalent of <code>Callable</code> in the sense that it <b>DOES</b> return a value.</li>
</ul>
<p>Both of them extend from the abstract class <code>java.util.concurrent.ForkJoinTask</code>.</p>
<p>However, unlike the worker threads that an <code>ExecutorService</code> uses, the threads of a <code>ForkJoinPool</code> use a work-stealing algorithm, which means that when a thread is free, it <b>STEALS</b> the pending work of other threads that are still busy doing other work.</p>
<p>To implement this, three methods of your <code>ForkJoinTask</code>-based class are important to the framework:</p>
<p><code class="java hljs"><span class="hljs-function">ForkJoinTask<V> <span class="hljs-title">fork</span><span class="hljs-params">()</span><br />
V <span class="hljs-title">join</span><span class="hljs-params">()</span><br />
<span class="hljs-comment">// if you extend RecursiveAction<br /></span><span class="hljs-keyword">protected</span> <span class="hljs-keyword">abstract</span> <span class="hljs-keyword">void</span> <span class="hljs-title">compute</span><span class="hljs-params">()</span><br />
<span class="hljs-comment">// if you extend RecursiveTask<br /></span><span class="hljs-keyword">protected</span> <span class="hljs-keyword">abstract</span> V <span class="hljs-title">compute</span><span class="hljs-params">()</span></span></code></p>
<p>And each thread in the <code>ForkJoinPool</code> has a queue of these tasks.</p>
<p>In the beginning, you have a large task. This task is divided into (generally) two smaller tasks recursively until the base case is reached.</p>
<p>Each time a task is divided, you call the <code>fork()</code> method to place the first subtask in the current thread's queue, and then you call the <code>compute()</code> method on the second subtask (to recursively calculate the result).</p>
<p>This way, the first subtask will wait in the queue to be processed or stolen by an idle thread to repeat the process. The second subtask will be processed immediately (also repeating the process).</p>
<p>Of course, you have to divide the task enough times to keep all threads busy (preferably into a number of tasks greater that the number of threads to ensure this).</p>
<p>All right, let's review this. The first subtask is waiting in a queue to be processed, and the second one is processed immediately. So when or how do you get the result of the first subtask?</p>
<p>To get the result of the first subtask you call the <code>join()</code> method on this first subtask.</p>
<p>This should be the last step because <code>join()</code> will block the program until the result is returned.</p>
<p>This means that the <b>ORDER</b> in which you call the methods is <b>IMPORTANT</b>.</p>
<p>If you don't call <code>fork()</code> before <code>join()</code>, there won't be any result to get.</p>
<p>If you call <code>join()</code> before <code>compute()</code>, the program will perform like if it was executed in one thread and you'll be wasting time, because while the second subtask is recursively calculating the value, the first one can be stolen by another thread to process it. This way, when <code>join()</code> is finally called, either the result is ready or you don't have to wait a long time to get it.</p>
<p>But remember, the Fork/Join framework is not for every task. You can use it for any task that can be solved (or algorithm that can be implemented) recursively, but it's best for tasks that can be divided into smaller subtasks <b>AND</b> that they can be computed independently (so the order doesn't matter).</p>
<p>So let's pick a simple example, finding the minimum value of an array. This array can be split up in many subarrays and locate the minimum of each of them. Then, we can find the minimum value between those values.</p>
<p>Let's code this example with a <code>RecursiveAction</code> first, to see how this fork/join works. Remember that this class doesn't return a result so we're only going to print the partial results.</p>
<p>Another thing. The most basic scenario we can have (the base case) is when we just have to compare two values. However, having subtasks that are too small won't perform well.</p>
<p>For that reason, when working with fork/join, generally you divide the elements in sets of a certain size (that can be handled by a single thread), for which you solve the problem sequentially.</p>
<p>For this example, let's process five numbers per thread:</p>
<p><code class="java hljs"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">FindMinimumAction</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">RecursiveAction</span></span> {<br />
<span class="hljs-comment"> // A thread can easily handle, let's say, five elements</span><br />
<span class="hljs-keyword"> private</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">final</span> <span class="hljs-keyword">int</span> SEQUENTIAL_THRESHOLD = <span class="hljs-number">5</span>;<br />
<span class="hljs-comment"> // The array with the numbers (we'll pass the same array in</span><br />
<span class="hljs-comment"> // every recursive call to avoid creating a lot of arrays)</span><br />
<span class="hljs-keyword"> private</span> <span class="hljs-keyword">int</span>[] data;<br />
<span class="hljs-comment"> // The index that tells use where a (sub)task starts</span><br />
<span class="hljs-keyword"> private</span> <span class="hljs-keyword">int</span> start;<br />
<span class="hljs-comment"> // The index that tells use where a (sub)task ends</span><br />
<span class="hljs-keyword"> private</span> <span class="hljs-keyword">int</span> end;<br />
<span class="hljs-comment"><br />
// Since compute() doesn't take parameters, you have to</span><br />
<span class="hljs-comment"> // pass in the task's constructor the data to work</span><br />
<span class="hljs-function"><span class="hljs-keyword"> public</span> <span class="hljs-title">FindMinimumAction</span><span class="hljs-params">(<span class="hljs-keyword">int</span>[] data, <span class="hljs-keyword">int</span> start, <span class="hljs-keyword">int</span> end)</span></span> {<br />
<span class="hljs-keyword"> this</span>.data = data;<br />
<span class="hljs-keyword"> this</span>.start = start;<br />
<span class="hljs-keyword"> this</span>.end = end;<br />
}<br />
<span class="hljs-annotation"> @Override</span><br />
<span class="hljs-function"><span class="hljs-keyword"> protected</span> <span class="hljs-keyword">void</span> <span class="hljs-title">compute</span><span class="hljs-params">()</span></span> {<br />
<span class="hljs-keyword"> int</span> length = end - start;<br />
<span class="hljs-keyword"> if</span> (length <= SEQUENTIAL_THRESHOLD) { <span class="hljs-comment">// base case</span><br />
<span class="hljs-keyword"> int</span> min = computeMinimumDirectly();<br />
System.out.println(<span class="hljs-string">"Minimum of this subarray: "</span>+ min);<br />
} <span class="hljs-keyword">else</span> { <span class="hljs-comment">// recursive case</span><br />
<span class="hljs-comment"> // Calcuate new subtasks range</span><br />
<span class="hljs-keyword"> int</span> mid = start + length / <span class="hljs-number">2</span>;<br />
FindMinimumAction firstSubtask =<br />
<span class="hljs-keyword"> new</span> FindMinimumAction(data, start, mid);<br />
FindMinimumAction secondSubtask =<br />
<span class="hljs-keyword"> new</span> FindMinimumAction(data, mid, end);<br />
firstSubtask.fork(); <span class="hljs-comment">// queue the first task</span><br />
secondSubtask.compute(); <span class="hljs-comment">// compute the second task</span><br />
firstSubtask.join(); <span class="hljs-comment">// wait for the first task result</span><br />
}<br />
}<br />
<span class="hljs-comment">/** Method that find the minimum value */</span><br />
<span class="hljs-function"><span class="hljs-keyword"> private</span> <span class="hljs-keyword">int</span> <span class="hljs-title">computeMinimumDirectly</span><span class="hljs-params">()</span></span> {<br />
<span class="hljs-keyword"> int</span> min = Integer.MAX_VALUE;<br />
<span class="hljs-keyword"> for</span> (<span class="hljs-keyword">int</span> i = start; i < end; i++) {<br />
<span class="hljs-keyword"> if</span> (data[i] < min) {<br />
min = data[i];<br />
}<br />
}<br />
<span class="hljs-keyword"> return</span> min;<br />
}<br />
}</code></p>
<p>The <code>compute()</code> method defines the base case, when the (sub)array has five elements or less, in which case the minimum is found sequentially. Otherwise, the array is split into two subarrays recursively until the condition of the base case is fulfilled.</p>
<p>Dividing the tasks may not always result in evenly distributed subtasks. But to keep this simple, let's try the class with twenty elements, which is very likely to be split up into four sets:</p>
<p><code class="java hljs"><span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">main</span><span class="hljs-params">(String[] args)</span></span> {<br />
<span class="hljs-keyword"> int</span>[] data = <span class="hljs-keyword">new</span> <span class="hljs-keyword">int</span>[<span class="hljs-number">20</span>];<br />
Random random = <span class="hljs-keyword">new</span> Random();<br />
<span class="hljs-keyword"> for</span> (<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i < data.length; i++) {<br />
data[i] = random.nextInt(<span class="hljs-number">1000</span>);<br />
System.out.print(data[i] + <span class="hljs-string">" "</span>);<br />
<span class="hljs-comment"> // Let's print each subarray in a line</span><br />
<span class="hljs-keyword"> if</span>( (i+<span class="hljs-number">1</span>) % SEQUENTIAL_THRESHOLD == <span class="hljs-number">0</span> ) {<br />
System.out.println();<br />
}<br />
}<br />
ForkJoinPool pool = <span class="hljs-keyword">new</span> ForkJoinPool();<br />
FindMinimumAction task =<br />
<span class="hljs-keyword"> new</span> FindMinimumAction(data, <span class="hljs-number">0</span>, data.length);<br />
pool.invoke(task);<br />
}</code></p>
<p>A possible output can be:</p>
<p><code class="hljs">109 411 348 938 776<br />
188 42 28 818 825<br />
642 454 431 742 463<br />
33 832 705 910 456<br />
Minimum of this subarray: 33<br />
Minimum of this subarray: 28<br />
Minimum of this subarray: 431<br />
Minimum of this subarray: 109</code></p>
<p>Notice that we didn't need to shut down the pool explicitly. When the program exits, the pool is shut down, so it can be reused.</p>
<p>We also have the <code>invokeAll()</code> method, that doesn't return a value but does something equivalent to the call of <code>fork()</code>, <code>compute()</code>, and <code>join()</code> methods. So instead of having something like:</p>
<p><code class="java hljs">...<br />
FindMinimumAction firstSubtask =<br />
<span class="hljs-keyword"> new</span> FindMinimumAction(data, start, mid);<br />
FindMinimumAction secondSubtask =<br />
<span class="hljs-keyword"> new</span> FindMinimumAction(data, mid, end);<br />
<span class="hljs-comment">// queue the first task<br /></span>firstSubtask.fork();<br />
<span class="hljs-comment">// compute the second task<br /></span>secondSubtask.compute();<br />
<span class="hljs-comment">// wait for the first task result<br /></span>firstSubtask.join();<br />
...</code></p>
<p>We can simply have:</p>
<p><code class="java hljs">...<br />
FindMinimumAction firstSubtask =<br />
<span class="hljs-keyword"> new</span> FindMinimumAction(data, start, mid);<br />
FindMinimumAction secondSubtask =<br />
<span class="hljs-keyword"> new</span> FindMinimumAction(data, mid, end);<br />
invokeAll(firstSubtask, secondSubtask);<br />
...</code></p>
<p>Now, let's change this example to use a <code>RecursiveTask</code> so we can return the minimum value of all.</p>
<p>Actually, the only changes we need to do are in the <code>compute()</code> method:</p>
<p><code class="java hljs"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">FindMinimumTask</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">RecursiveTask</span><<span class="hljs-title">Integer</span>></span> {<br />
<span class="hljs-comment"> // ...</span><br />
<span class="hljs-annotation"> @Override</span><br />
<span class="hljs-function"><span class="hljs-keyword"> <span class="hljs-comment">//Return type matches the generic<br /></span> protected</span> Integer <span class="hljs-title">compute</span><span class="hljs-params">()</span></span> {<br />
<span class="hljs-keyword"> int</span> length = end - start;<br />
<span class="hljs-keyword"> if</span> (length <= SEQUENTIAL_THRESHOLD) { <span class="hljs-comment">// base case</span><br />
<span class="hljs-keyword"> return</span> computeMinimumDirectly();<br />
} <span class="hljs-keyword">else</span> { <span class="hljs-comment">// recursive case</span><br />
<span class="hljs-comment"> // Calcuate new subtasks range</span><br />
<span class="hljs-keyword"> int</span> mid = start + length / <span class="hljs-number">2</span>;<br />
FindMinimumTask firstSubtask =<br />
<span class="hljs-keyword"> new</span> FindMinimumTask(data, start, mid);<br />
FindMinimumTask secondSubtask =<br />
<span class="hljs-keyword"> new</span> FindMinimumTask(data, mid, end);<br />
<span class="hljs-comment">// queue the first task<br /></span> firstSubtask.fork();<br />
<span class="hljs-comment"> // Return the miminum of all subtasks</span><br />
<span class="hljs-keyword"> return</span> Math.min(secondSubtask.compute(),<br />
firstSubtask.join());<br />
}<br />
}<br />
<span class="hljs-comment"> // ...</span><br />
}</code></p>
<p>In the <code>main()</code> method, the only changes are printing the value that <code>pool.invoke(task)</code> returns and a little formatting to the output, which can be:</p>
<p>Array values:</p>
<p><code class="hljs">819 997 124 425 669 657 487 447 386 979 31 748 194 644 893 209 913 810 142 565<br />
Minimum value: 31</code></p>
<h2>Key Points</h2>
<ul>
<li>The Fork/Join framework is designed to work with large tasks that can be split up into smaller tasks.</li>
<li>This is done through recursion, where you keep splitting up the task until you meet the base case, a task so simple that can be solved directly, and then combining all the partial results to compute the final result.</li>
<li>Splitting up the problem is know as <b>FORKING</b> and combining the results is known as <b>JOINING</b>.</li>
<li>The main class of the Fork/Join framework is <code>java.util.concurrent.ForkJoinPool</code>, which is actually a subclass of <code>ExecutorService</code>.</li>
<li>Just like an <code>ExecutorService</code> executes a task represented by a <code>Runnable</code> or a <code>Callable</code>, in the Fork/Join framework a task is represented by a subclass of either <code>RecursiveAction</code> (that <b>DOESN'T</b> return a value) or <code>RecursiveTask<V></code> (that <b>DOES</b> return a value).</li>
<li>However, unlike the worker threads that an <code>ExecutorService</code> uses, the threads of a <code>ForkJoinPool</code> use a work-stealing algorithm, which means that when a thread is free, it <b>STEALS</b> the pending work of other threads that are still busy doing other work.</li>
<li>A <code>ForkJoinTask</code> object has three main methods, <code>fork()</code>, <code>join()</code>, and <code>compute()</code>. The <b>ORDER</b> in which you call the methods is <b>IMPORTANT</b>.</li>
<li>You must first call <code>fork()</code> to queue the first subtask, then <code>compute()</code> on the second subtask to process it recursively, and then <code>join()</code> to get the result of the first subtask.</li>
</ul>
<h2>Self Test</h2>
<p>1. What of the following statements is true?<br /> A. <code>RecursiveAction</code> is a subclass of <code>ForkJoinPool</code>.<br /> B. When working with the Fork/Join framework, by default, one thread per CPU is created.<br /> C. You need to shut down a <code>ForkJoinPool</code> explicitly.<br /> D. <code>fork()</code> blocks the program until the result is ready.</p>
<p>2. Which of the following is the right order to call the methods of a <code>ForkJoinTask</code>?<br /> A. <code>compute()</code>, <code>fork()</code>, <code>join()</code><br /> B. <code>fork()</code>, <code>compute()</code>, <code>join()</code><br /> C. <code>join()</code>, <code>fork(</code>), <code>compute()</code><br /> D. <code>fork()</code>, <code>join()</code>, <code>compute()</code></p>
<p>3. When using a <code>RecursiveTask</code>, which of the following statements is true?<br /> A. You can use the <code>invokeAll()</code> method instead of the <code>fork()</code>/<code>join()</code>/<code>compute()</code> methods.<br /> B. You can use <code>ExecutorService</code> directly with this class.<br /> C. An action is triggered when the task is completed.<br /> D. <code>ForkJoinTask.invoke()</code> returns the same type as the generic type of <code>RecursiveTask</code>.</p>
<p>4. Given:</p>
<p><code class="java hljs"><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Question_28_4</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">RecursiveTask</span><<span class="hljs-title">Integer</span>></span> {<br />
<span class="hljs-keyword"> private</span> <span class="hljs-keyword">int</span> n;<br />
<br />
Question_28_4(<span class="hljs-keyword">int</span> n) {<br />
<span class="hljs-keyword"> this</span>.n = n;<br />
}<br />
<span class="hljs-function"><span class="hljs-keyword"><br />
public</span> Integer <span class="hljs-title">compute</span><span class="hljs-params">()</span></span> {<br />
<span class="hljs-keyword"> if</span> (n <= <span class="hljs-number">1</span>) {<br />
<span class="hljs-keyword"> return</span> n;<br />
}<br />
Question_28_4 t1 = <span class="hljs-keyword">new</span> Question_28_4(n - <span class="hljs-number">1</span>);<br />
Question_28_4 t2 = <span class="hljs-keyword">new</span> Question_28_4(n - <span class="hljs-number">2</span>);<br />
t1.fork();<br />
<span class="hljs-keyword"> return</span> t2.compute() + t1.join();<br />
}<br />
}</code></p>
<p>What is not right about this implementation of the Fork/Join framework?<br /> A. Everything is right, it's a perfect implementation of the Fork/Join framework.<br /> B. The order of the <code>fork()</code>, <code>join()</code>, <code>compute()</code> methods is not right.<br /> C. This implementation is very inefficient, the subtasks will be very small.<br /> D. It doesn't compile.</p>
<div class="answers">
<a href="ch28a.html" target="_blank">Open answers page</a>
</div>
<div class="book-info"></div>
<div class="linkbox">
<div class="previous">
<a href="ch27.html">27. Concurrency</a>
</div>
<div class="next">
<a href="ch29.html">29. JDBC API</a>
</div>
<div style="clear:both;"></div>
</div>
</div>
</div>
</body>
</html>