Configuring OpenSAF
===================
NOTE: Upon installation of OpenSAF (through 'make install' or rpm install),
the $pkgsysconfdir/osafdir.conf (e.g. /etc/opensaf/osafdir.conf) shall contain
definitions of the directory paths where the OpenSAF packages are installed.
OpenSAF configuration files are located in the $pkgsysconfdir (e.g. /etc/opensaf)
directory.
======================================================
OpenSAF can be built to run as root or to run as the UNIX system user
"opensaf" (default behaviour). After a 'make install' of an OpenSAF build that
runs as the opensaf user, the following steps have to be done:
Notes:
- These steps are not necessary if doing an rpm installation.
- If you wish to run OpenSAF as the root user, the steps below can be
skipped by building OpenSAF with the option
$ ./configure CPPFLAGS=-DRUNASROOT
0) groupadd -r opensaf
1) useradd -r -g opensaf -d /usr/local/share/opensaf/ -s /sbin/nologin -c "OpenSAF" opensaf
2) echo "opensaf ALL = NOPASSWD: /sbin/reboot, /sbin/shutdown, /sbin/tipc-config, /usr/bin/pkill, /usr/bin/killall" >> /etc/sudoers
3) echo 'Defaults:%opensaf !requiretty' >> /etc/sudoers
4) echo 'Defaults:opensaf !requiretty' >> /etc/sudoers
5) chown opensaf /var/lib/opensaf
6) chgrp opensaf /var/lib/opensaf
7) chown opensaf /var/run/opensaf
8) chgrp opensaf /var/run/opensaf
9) chown -R opensaf /var/log/opensaf
10) chgrp -R opensaf /var/log/opensaf
======================================================
The following is a brief description of each configuration file.
*******************************************************************************
amfd.conf
This file contains configuration options for the AMF director. This file has to
be modified in the following cases:
OSAF_AMF_MIN_CLUSTER_SIZE should be set to the minimum number of nodes with
system controller capability that you wish to have in the system. AMF will
reject attempts to delete a node from the IMM configuration if the total number
of such nodes would fall below this configured limit.
*******************************************************************************
clmna.conf
This file contains configuration options for the CLM node agent. This file has
to be modified in the following cases:
CLMNA_ELECTION_DELAY_TIME should be set to the time in milliseconds without any
active or standby system controller node that OpenSAF will wait before starting
an election for a new active system controller. CLMNA will detect the presence
of active and standby system controllers in the cluster. If no active or standby
system controller has been seen during the configured time period, CLMNA will
initiate an election with its own node as a candidate for a new active system
controller. Note that lowering this value will increase the risk of
split-brain. Values lower than 5000 are not recommended for clusters with more
than two system controllers. For legacy installations with only two system
controller nodes, the default delay of 200 ms should be sufficient. For systems
with a very large number of configured system controller nodes and/or with
unreliable network connections, values larger than 5000 may be needed.
CLMNA_ADDR_FAMILY and CLMNA_ADDR_VALUE let you specify the values shown in the
saClmNodeCurrAddressFamily and saClmNodeCurrAddress runtime attributes of the
node's SaClmNode IMM object. If these options are not set, CLM will try
to figure out the node's address by itself. Since a node can have more than one
network address, the address chosen by CLM may not be the address your
application is interested in. By explicitly specifying the address using
CLMNA_ADDR_FAMILY and CLMNA_ADDR_VALUE, you will be guaranteed that the correct
address is presented.
*******************************************************************************
dtmd.conf
This file contains MDS/TCP configuration parameters. This file has to be edited
only when you choose TCP as the mode of transport.
The variable DTM_NODE_IP should be set to the IP address of the local node.
To use IP multicast or unicast instead of broadcast, the variable DTM_MCAST_ADDR
should be set to an IPv4/IPv6 multicast group address, or to a source of peer IP
addresses for unicast:
* To use multicast, specify an IPv4 or IPv6 multicast group address.
* To use unicast, the string should either start with "file:" followed by the
path name of a file whose first line specifies the unique cluster name,
followed by a list of IP addresses (one per line), or start with "dns:"
followed by a DNS domain name. When using DNS, the domain name is used as the
cluster name. Please ensure that the cluster name is unique, since it may be
used by OpenSAF to prevent accidental interconnection between clusters.
Examples:
DTM_MCAST_ADDR=224.0.0.1
DTM_MCAST_ADDR=ff02::1
DTM_MCAST_ADDR=file:/var/lib/peer_ip_addresses.txt
DTM_MCAST_ADDR=dns:peers.opensaf.org
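The layout of the unicast peer file described above can be sketched with a small shell helper. This is only an illustration: the function name write_peer_file, the cluster name and the addresses in the usage note are hypothetical. Only the file layout (cluster name on the first line, then one IP address per line) is taken from the description above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSAF): write a peer file suitable for
# DTM_MCAST_ADDR=file:<path>.
# Usage: write_peer_file <path> <cluster_name> <ip>...
write_peer_file() {
    path=$1
    cluster=$2
    shift 2
    {
        # First line: the unique cluster name.
        printf '%s\n' "$cluster"
        # Remaining lines: one peer IP address per line.
        for ip in "$@"; do
            printf '%s\n' "$ip"
        done
    } > "$path"
}
```

For example, 'write_peer_file /var/lib/peer_ip_addresses.txt my_cluster 10.0.0.1 10.0.0.2' would produce a three-line file that matches the "file:" example above.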
*******************************************************************************
transportd.conf
This file contains the configuration for transportd Service. This file has to be
modified in the following cases:
(a) To override the default max log file size (5242880 bytes), enable the option
TRANSPORT_MAX_FILE_SIZE by removing # at the start of the line and update
its value to the required number of bytes. Suffixes k, M and G can be used
for kilo-, mega- and gigabytes.
(b) To override the default number of log backups (9 backup files), enable the
option TRANSPORT_MAX_BACKUPS by removing # at the start of the line and
update its value to the required number of backups.
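As a worked example of the size suffixes in (a), the following sketch converts a value such as 5M to bytes. It assumes that k, M and G are multiples of 1024; this README does not state that, but the default of 5242880 bytes equals 5 * 1024 * 1024. The function name size_to_bytes is hypothetical and is not part of transportd.

```shell
#!/bin/sh
# Hypothetical helper: convert "<digits>[k|M|G]" to a number of bytes.
# ASSUMPTION: the suffixes are binary multiples (1k = 1024 bytes).
size_to_bytes() {
    value=$1
    case $value in
        *k) echo $(( ${value%k} * 1024 )) ;;
        *M) echo $(( ${value%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${value%G} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$value" ;;   # no suffix: already in bytes
    esac
}
```

Under this assumption, 'size_to_bytes 5M' yields the documented default of 5242880 bytes.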
*******************************************************************************
nid.conf
This file contains global configuration for OpenSAF. This file
has to be modified in the following cases:
(a) To configure/choose the OpenSAF transport as being TIPC or TCP:
As the default behaviour, OpenSAF shall use TCP as the transport.
When TIPC is selected as the MDS transport: change MDS_TRANSPORT to "TIPC".
Note: By building OpenSAF with the option $ ./configure --enable-tipc
(see also the OpenSAF README for build options) there is
flexibility to change (after stopping OpenSAF on all nodes)
the OpenSAF transport between TIPC and TCP without having to rebuild.
(b) Setting the network interface and tipc network id when TIPC is the transport:
When TIPC is selected as MDS transport, the TIPC network interface name TIPC_ETH_IF is
by default set to eth0 and the TIPC network Id TIPC_NET_ID is by default set to 1234.
These default values can be changed 'uniformly on all nodes', for
example to TIPC_ETH_IF=eth3 and TIPC_NET_ID=4321.
Notes:
TIPC_NET_ID and TIPC_ETH_IF have to be edited only when TIPC is chosen as the mode of transport.
The interface name (TIPC_ETH_IF) and network id (TIPC_NET_ID) arguments to the configure_tipc.in
entry in nid.conf are installed with default values. Please change them to
reflect your correct interface(s) and network id.
Example 1, single interface:
TIPC_ETH_IF=eth0
TIPC_NET_ID=1234
Example 2, dual interfaces:
TIPC_ETH_IF=eth0,eth1
TIPC_NET_ID=4321
In the examples above, eth0 and eth1 refer to the network interfaces, and 1234
and 4321 are sample TIPC network ids. Please change them accordingly. The
TIPC network id should be the same across all system controllers and payloads
in that OpenSAF cluster. The Ethernet interfaces are given as a comma-separated
list as shown in example 2, or as the name of a single interface as in
example 1.
Please note that OpenSAF recommends that users run TIPC on two separate bearers
and let TIPC do the link aggregation. A bonded interface is not recommended.
Also see http://tipc.sourceforge.net/doc/tipc_2.0_users_guide.html
However, OpenSAF does not currently support managing two separate TIPC
bearers. The missing support is logged in
ticket http://devel.opensaf.org/ticket/1948
(d) OpenSAF should run as a different UNIX group and user than the default 'opensaf'
group/user.
If OpenSAF was built with the flags "CPPFLAGS=-DRUNASROOT", then
change OPENSAF_GROUP and OPENSAF_USER to root i.e. for old (<4.2) behaviour.
For any other user, change OPENSAF_GROUP and OPENSAF_USER accordingly
after doing the following:
If using RPMs, change the opensaf.spec.in file accordingly.
More specifically the variables opensaf_user and opensaf_group at the
top of the RPM spec file.
In the case of 'make install', do the steps listed in
'Steps to configure opensaf user' above.
(e) If use of MDS subslot ID needs to be enabled, add TIPC_USE_SUBSLOT_ID=YES
Note that the use of subslot ID is deprecated and should not be enabled
in new installations.
(f) Time supervision of local node reboot should be disabled or changed. Change
OPENSAF_REBOOT_TIMEOUT to the desired number of seconds before a reboot is
escalated to an immediate reboot via the SysRq interface, or zero to disable
this feature.
(g) User's non-OpenSAF applications are also using TIPC and the user does not
want OpenSAF to manage TIPC (TIPC configuration, insmod/rmmod). In this case,
change OPENSAF_MANAGE_TIPC="no".
Note: When the user has taken the responsibility to manage TIPC, TIPC has to
be configured according to OpenSAF requirements before starting OpenSAF.
This can be done by (re)using the /usr/lib(64)/opensaf/configure_tipc script
as: $ configure_tipc start <interface name> <TIPC netid>
For example: $ configure_tipc start eth0 9859
(h) Setting MDS_TIPC_MCAST_ENABLED to 1 or 0 enables or disables TIPC
multicast messaging. This setting is only valid when MDS_TRANSPORT is set to
TIPC. By default, TIPC multicast messaging is enabled.
Note: When TIPC multicast messaging is disabled (0), the performance
of OpenSAF will be considerably lower than when it is enabled (1).
(i) To use TIPC duplicate node address detection in the cluster when starting
OpenSAF, set TIPC_DUPLICATE_NODE_DETECT=YES in the
`/usr/lib(64)/opensaf/configure_tipc` script.
Note: On operating systems such as MontaVista, where TIPC is built into the
kernel, it is not possible to change the node address once TIPC has joined
a network with a network id and address, until the node reboots. Duplicate
node verification is therefore not possible, and enabling or disabling it
(TIPC_DUPLICATE_NODE_DETECT=YES/NO) has NO effect.
(j) When setting REBOOT_ON_FAIL_TIMEOUT to a number greater than 0, and systemd
is used to start OpenSAF, you should also set RestartSec in
opensafd.service to a number larger than REBOOT_ON_FAIL_TIMEOUT. This is to
avoid having systemd restart opensafd before the reboot takes effect.
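The relationship in (j) can be checked with a small helper before deploying a configuration. This is a sketch only: the function name restart_sec_ok is hypothetical, and it assumes that both values are plain integers expressed in the same unit.

```shell
#!/bin/sh
# Hypothetical check (not part of OpenSAF): succeed only if RestartSec is
# larger than REBOOT_ON_FAIL_TIMEOUT.
# ASSUMPTION: both arguments are plain integers in the same unit.
# Usage: restart_sec_ok <REBOOT_ON_FAIL_TIMEOUT> <RestartSec>
restart_sec_ok() {
    reboot_timeout=$1
    restart_sec=$2
    # A timeout of 0 (or less) imposes no constraint on RestartSec.
    if [ "$reboot_timeout" -le 0 ]; then
        return 0
    fi
    [ "$restart_sec" -gt "$reboot_timeout" ]
}
```

For instance, 'restart_sec_ok 60 90' succeeds, while 'restart_sec_ok 60 60' fails because systemd could restart opensafd at the same moment the reboot is due.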
*******************************************************************************
nodeinit.conf.<node_type>
This is the input file to the Node Initialization Daemon for serializing the
opensaf services startup.
<node_type> stands for 'controller' or 'payload'
By default, 'make install' installs a configuration for a controller node, i.e.
the node_type file will contain the value 'controller'.
On a payload node, the node_type must be changed to 'payload'
after a 'make install'. For example:
% cd $pkgsysconfdir (e.g. /etc/opensaf/)
% echo payload > node_type
*******************************************************************************
slot_id
The slot_id shall specify a unique value that represents a physical slot
identifier for the node in a chassis environment. It is typically set to the
same as the TIPC node ID.
When using TIPC transport, or TCP transport with IPv4 node addresses, the
slot_id file is not mandatory. If slot_id is missing or contains the value
zero, OpenSAF will use the node's TIPC address or IPv4 address as Node ID.
By default, the slot_id contains a value of 0, which means that the node's TIPC
or IPv4 address will be used as node_id. If you configure slot_id to some other
value than zero, you must make sure to use a unique slot_id value on
each node. You can configure e.g. a value of 2 as below:
% echo 2 > $pkgsysconfdir/slot_id
Starting with OpenSAF version 5.0, the maximum supported value for slot_id is
4095 when using the flat addressing scheme (i.e. when the use of subslot ID is
disabled). Prior to OpenSAF 5.0, the maximum supported value for slot_id was
255.
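The limits above can be turned into a simple sanity check before writing the slot_id file. The sketch below is an illustration, not OpenSAF code: the function name slot_id_valid is hypothetical, and it only checks the numeric range. Uniqueness across nodes still has to be ensured separately.

```shell
#!/bin/sh
# Hypothetical helper: check a slot_id value against the limits quoted above.
# Usage: slot_id_valid <slot_id> [max]
#   max defaults to 4095 (OpenSAF 5.0 and later, flat addressing);
#   pass 255 for releases prior to OpenSAF 5.0.
slot_id_valid() {
    id=$1
    max=${2:-4095}
    case $id in
        ''|*[!0-9]*) return 1 ;;   # empty or not a non-negative integer
    esac
    # 0 is accepted: it means "use the TIPC or IPv4 address as node id".
    [ "$id" -le "$max" ]
}
```

A typical use would be 'slot_id_valid 2 && echo 2 > $pkgsysconfdir/slot_id'.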
*******************************************************************************
subslot_id
The use of subslot ID is disabled by default. When disabled, subslot_id must be
set to 15 (which is the default value) on all nodes in the cluster.
If the subslot_id file is missing, OpenSAF will use the default value 15.
*******************************************************************************
chassis_id
The chassis_id represents an identifier for the chassis (Distributed Computing
Environment) and should be set to an integer value. It is normally never changed
from what's installed.
If the chassis_id file is missing, OpenSAF will use the default value 2.
*******************************************************************************
imm.xml
This is a generated file that contains the OpenSAF Information Model. It
contains the classes and configuration objects information of all OpenSAF
services.
By default, based on the configuration options chosen, i.e. whether
a particular service is enabled or not, the imm.xml files of the individual
OpenSAF services are installed in the $pkgimmxmldir
(e.g. /usr/local/share/opensaf/immxml/services) directory.
Perform the following steps to generate the final/combined imm.xml. (Also
refer to the readme in the $pkgimmxmldir (e.g. /usr/share/opensaf/immxml/)
directory.)
% cd $pkgimmxmldir
1) Generate a config file (nodes.cfg) for a 5 node cluster:
% sudo ./immxml-clustersize -s 2 -p 3
NOTE1: One can configure more than two system controller nodes as below:
% sudo ./immxml-clustersize -s 3 -p 2
In the above case, the configuration is generated for a 5 node cluster
with 2 system controller nodes elected as ACTIVE and STANDBY while
the 3rd SC will run as spare SC and 2 nodes will run as payloads.
NOTE2: One can configure all nodes as system controller nodes (with no payloads)
as below:
% sudo ./immxml-clustersize -s 5
In the above case, the configuration is generated for a 5 node cluster
with 2 system controller nodes elected as ACTIVE and STANDBY while
the remaining 3 nodes will run as spare system controller nodes.
(In this configuration, there are no payload nodes)
2) Edit nodes.cfg
The third column in nodes.cfg should be edited to match the short form of the
hostname (as shown by 'hostname -s') for each host.
3) Generate the imm.xml
% sudo ./immxml-configure
4) Copy the generated imm.xml file to the standard OpenSAF configuration directory.
For example:
% sudo cp imm.xml.20100217_2201 /etc/opensaf/imm.xml
If the initial imm.xml is to be merged with some other file fragments the
'immxml-merge' tool is used as shown below:
% cd /usr/share/opensaf/immxml
% sudo ./immxml-merge imm.xml.20100217_2201 my.xml > merged.xml
% sudo cp merged.xml /etc/opensaf/imm.xml
*******************************************************************************
node_name
This file is generated with the short form of the hostname ('hostname -s') when
the controller or payload rpm is installed.
The node_name file contains the RDN value of the CLM node name. A user can
change the default value but the changed value 'must match' the value
specified in the nodes.cfg that was used to generate the imm.xml, i.e. the
node_name value should match the value present in the generated imm.xml.
The node_name must be set to a unique value (as described above) for each node.
*******************************************************************************
node_type
This file is generated when the controller or payload rpm is installed and
contains the associated node type name "controller" or "payload". If
'make install' is used, this file will always contain "controller".
This is a read-only configuration file that should never be edited.
*******************************************************************************
Per process config file
Every OpenSAF process has its own configuration file for the purpose of
controlling low level features such as tracing, location of configuration
directories etc. Two files are worth mentioning since they probably need
editing, at least for a real deployment. For test and single node installs
they work out of the box.
immnd.conf
This file is special since it contains many very important configuration
settings for the IMM service. See the immnd.conf file, the immsv README and the
IMM PR document for more information.
logd.conf
This file is worth mentioning since it contains the location of a
directory, potentially shared between controllers, where the LOG service
stores the generated log files.
*******************************************************************************
/etc/plmcd.conf
The plmcd.conf file has to be edited as below 'when PLM is enabled' while
building OpenSAF.
The following entries have to be modified accordingly:
[ee_id]
0020f
The ee_id field should contain the DN name of the EE hosting the OpenSAF controller
as specified by the SaPlmEE object.
For eg:-
IF the DN name of the EE object in the imm.xml is set to
... <object class="SaPlmEE">
... <dn>safEE=SC-1,safDomain=domain_1</dn>
THEN, the [ee_id] field should be set to
[ee_id]
safEE=SC-1,safDomain=domain_1
Note: Here SC-1 is the default value specified in the nodes.cfg file while generating
the imm.xml
[controller_1_ip]
10.105.1.3
[controller_2_ip]
10.105.1.6
The controller_1_ip and controller_2_ip fields shall contain the IP Addresses
of the peer controllers.
[os_type]
This is a description of the OS. Currently it needs to follow
# the following syntax, 1;os;Fedora;2.6.31, where
# 1 => Version
# os => Product
# Fedora => Vendor
# 2.6.31 => Release
# Note that Version is mandatory while the other fields can be empty
The [os_type] field should match with the values contained
in the imm.xml for that EE.
It shall contain the attribute values of the SaPlmEEBaseType
and SaPlmEEType objects, each separated by Semi-Colons.
For eg:-
IF The following is the version of the EE in the default imm.xml,
<object class="SaPlmEEType">
<dn>safVersion=1,safEEType=Linux_os</dn>
AND, the following is configured in the default imm.xml,
<object class="SaPlmEEBaseType">
<dn>safEEType=Linux_os</dn>
<attr>
<name>saPlmEetProduct</name>
<value>os</value>
</attr>
<attr>
<name>saPlmEetVendor</name>
<value>SUSE</value>
</attr>
<attr>
<name>saPlmEetRelease</name>
<value>2.6</value>
</attr>
</object>
THEN, the [os_type] field shall be set to the following value:
[os_type]
1;os;SUSE;2.6
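The mapping from the IMM attributes to the [os_type] value can be sketched as below. The helper name make_os_type is hypothetical. The field order (Version;Product;Vendor;Release) and the rule that only Version is mandatory come from the description above. Keeping the semicolons when an optional field is empty is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: build the [os_type] value from its four fields.
# Usage: make_os_type <version> [product] [vendor] [release]
make_os_type() {
    version=${1:-}
    product=${2:-}
    vendor=${3:-}
    release=${4:-}
    # Version is the only mandatory field.
    if [ -z "$version" ]; then
        echo "make_os_type: version is mandatory" >&2
        return 1
    fi
    # ASSUMPTION: empty optional fields keep their separating semicolons.
    printf '%s;%s;%s;%s\n' "$version" "$product" "$vendor" "$release"
}
```

With the values from the imm.xml fragment above, 'make_os_type 1 os SUSE 2.6' prints the same string as shown for [os_type].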
*******************************************************************************
Priorities of OpenSAF threads:
==============================
The following list contains the default priority values of OpenSAF threads and
the corresponding environment variables that can be used to specify them:
Note: The following are default values that are tested and supported by OpenSAF.
A system integrator/designer may choose to fine-tune these values at their own
discretion (they should not be changed casually).
Thread Name            Environment variable                  Default settings
===============================================================================
MDS Agent              OSAF_MDS_SCHED_PRIORITY               85
                       OSAF_MDS_SCHED_POLICY                 2 (SCHED_RR)
MDS Timer              OSAF_TMR_SCHED_PRIORITY               85
                       OSAF_TMR_SCHED_POLICY                 2 (SCHED_RR)
DTM                    OSAF_DTM_INTRANODE_SCHED_PRIORITY     85
                       OSAF_DTM_INTRANODE_SCHED_POLICY       2 (SCHED_RR)
                       OSAF_NODE_DISCOVERY_SCHED_PRIORITY    85
                       OSAF_NODE_DISCOVERY_SCHED_POLICY      2 (SCHED_RR)
RDE Agent              OSAF_RDA_SCHED_PRIORITY               0
                       OSAF_RDA_SCHED_POLICY                 0 (SCHED_OTHER)
MQSV Agent             OSAF_MQA_SCHED_PRIORITY               0
                       OSAF_MQA_SCHED_POLICY                 0 (SCHED_OTHER)
                       OSAF_MQA_CLBK_SCHED_PRIORITY          0
                       OSAF_MQA_CLBK_SCHED_POLICY            0 (SCHED_OTHER)
Module Execution       OSAF_EXEC_MOD_SCHED_PRIORITY          0
                       OSAF_EXEC_MOD_SCHED_POLICY            0 (SCHED_OTHER)
AMFND Monitoring       OSAF_AVND_MON_SCHED_PRIORITY          0
                       OSAF_AVND_MON_SCHED_POLICY            0 (SCHED_OTHER)
SMF Campaign           OSAF_SMF_CAMP_SCHED_PRIORITY          0
                       OSAF_SMF_CAMP_SCHED_POLICY            0 (SCHED_OTHER)
SMF Upgrade Procedure  OSAF_SMF_UPGRADE_PROC_SCHED_PRIORITY  0
                       OSAF_SMF_UPGRADE_PROC_SCHED_POLICY    0 (SCHED_OTHER)
SMF Procedure          OSAF_SMF_PROC_SCHED_PRIORITY          0
                       OSAF_SMF_PROC_SCHED_POLICY            0 (SCHED_OTHER)
Priorities of OpenSAF Processes:
================================
The following list contains the default priority values of OpenSAF processes
and the corresponding environment variables that can be used to specify them:
Process Name           Environment variable                  Default settings
===============================================================================
AMF Node Director      OSAFAMFND_SCHED_PRIORITY              1
                       OSAFAMFND_SCHED_POLICY                2 (SCHED_RR)
AMF WatchDog           OSAFAMFWD_SCHED_PRIORITY              1
                       OSAFAMFWD_SCHED_POLICY                2 (SCHED_RR)
AMF Director           OSAFAMFD_SCHED_PRIORITY               0
                       OSAFAMFD_SCHED_POLICY                 0 (SCHED_OTHER)
RDE Director           OSAFRDED_SCHED_PRIORITY               0
                       OSAFRDED_SCHED_POLICY                 0 (SCHED_OTHER)
FM Director            OSAFFMD_SCHED_PRIORITY                0
                       OSAFFMD_SCHED_POLICY                  0 (SCHED_OTHER)
IMM Director           OSAFIMMD_SCHED_PRIORITY               0
                       OSAFIMMD_SCHED_POLICY                 0 (SCHED_OTHER)
IMM Node Director      OSAFIMMND_SCHED_PRIORITY              0
                       OSAFIMMND_SCHED_POLICY                0 (SCHED_OTHER)
LOG Director           OSAFLOGD_SCHED_PRIORITY               0
                       OSAFLOGD_SCHED_POLICY                 0 (SCHED_OTHER)
NTF Director           OSAFNTFD_SCHED_PRIORITY               0
                       OSAFNTFD_SCHED_POLICY                 0 (SCHED_OTHER)
CLM Director           OSAFCLMD_SCHED_PRIORITY               0
                       OSAFCLMD_SCHED_POLICY                 0 (SCHED_OTHER)
SMF Director           OSAFSMFD_SCHED_PRIORITY               0
                       OSAFSMFD_SCHED_POLICY                 0 (SCHED_OTHER)
SMF Node Director      OSAFSMFND_SCHED_PRIORITY              0
                       OSAFSMFND_SCHED_POLICY                0 (SCHED_OTHER)
MQSv Director          OSAFMSGD_SCHED_PRIORITY               0
                       OSAFMSGD_SCHED_POLICY                 0 (SCHED_OTHER)
MQSv Node Director     OSAFMSGND_SCHED_PRIORITY              0
                       OSAFMSGND_SCHED_POLICY                0 (SCHED_OTHER)
LCK Director           OSAFLCKD_SCHED_PRIORITY               0
                       OSAFLCKD_SCHED_POLICY                 0 (SCHED_OTHER)
LCK Node Director      OSAFLCKND_SCHED_PRIORITY              0
                       OSAFLCKND_SCHED_POLICY                0 (SCHED_OTHER)
CKPT Director          OSAFCKPTD_SCHED_PRIORITY              0
                       OSAFCKPTD_SCHED_POLICY                0 (SCHED_OTHER)
CKPT Node Director     OSAFCKPTND_SCHED_PRIORITY             0
                       OSAFCKPTND_SCHED_POLICY               0 (SCHED_OTHER)
EDS Director           OSAFEVTD_SCHED_PRIORITY               0
                       OSAFEVTD_SCHED_POLICY                 0 (SCHED_OTHER)
Setting resource limits for user applications running as 'non root' user
=========================================================================
One MDS thread exists (in the application context) when applications use
OpenSAF services' APIs. By default, this MDS thread is required to
run with a real time priority (see OSAF_MDS_SCHED_PRIORITY above).
From OpenSAF 4.2 onwards, applications that do not run with 'root' privileges
are expected to configure the resource limits for their application processes
such that the MDS thread is able to set realtime priority for itself,
i.e. users have to set up RLIMIT_RTPRIO using setrlimit() or configure
'rtprio' in /etc/security/limits.conf for their applications.
Note: If resource limits are not configured as above then the MDS thread
(running in application context) will by default have/set a non-realtime
priority with the scheduling policy of SCHED_OTHER.
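Whether a given rtprio limit is sufficient for the MDS thread can be checked with a sketch like the one below. The helper name rtprio_allows is hypothetical. The value 85 in the usage note is the default OSAF_MDS_SCHED_PRIORITY listed above, and "myappuser" is a placeholder user name.

```shell
#!/bin/sh
# Hypothetical helper: does an rtprio limit allow the wanted priority?
# <limit> is a number or the word "unlimited" (as printed by, for example,
# 'ulimit -r' in bash).
# Usage: rtprio_allows <limit> <wanted_priority>
rtprio_allows() {
    limit=$1
    wanted=$2
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$wanted" ]
}

# Example /etc/security/limits.conf entry (placeholder user name):
#   myappuser  -  rtprio  85
```

In bash, 'rtprio_allows "$(ulimit -r)" 85' could be run as the application user to see whether the configured limit takes effect.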
SMF bundle check command
=========================================================================
SMF requires a command to be configured, which is called to verify the software bundles invoked by the campaign.
The command is configured in the OpenSafSmfConfig attribute smfBundleCheckCmd.
From OpenSAF 4.4 the smfBundleCheckCmd command is called as a part of the prerequisites tests.
Interface: The specified command is given the DN of the SaSmfSwBundle instance to check as argument.
configuredCmd <swBundleDN>
e.g. myConfiguredCommand saSmfBundle=myBundle
The specified command must return 0 (zero) on success.
Note: The OpenSAF default implementation is located at osaf/services/saf/smfsv/scripts/smf-bundle-check.
The earlier default implementation of the command (which was never called from SMF) required a bundle
name as input parameter.
From OpenSAF 4.4 the input parameter shall be the DN of the bundle to check.
Extended SaNameT type
=========================================================================
The SaNameT type is deprecated and will be replaced with string parameters in new
SAF APIs. As an intermediate solution, the extended format of the SaNameT type
can be used to pass string parameters to and from old SAF APIs as well, by
tunneling them through the SaNameT type. To enable the extended SaNameT
format, the application source code has to be compiled with the
SA_EXTENDED_NAME_SOURCE preprocessor macro defined, and the environment
variable SA_ENABLE_EXTENDED_NAMES must be set to the value 1 before the first
call to any SAF API function.
When the extended SaNameT format is enabled, the SA_MAX_NAME_LENGTH constant
must not be used, and the application must treat the SaNameT type as opaque
and not access any of its members directly. Instead, the saAisNameLend() and
saAisNameBorrow() access functions shall be used. The
SA_MAX_UNEXTENDED_NAME_LENGTH constant can be used to refer to the maximum
string length that can be stored in the unextended SaNameT type.
Configuring remote fencing support using STONITH
================================================
In a virtual environment STONITH can be used to do remote fencing of the
other system controller in case of "link loss" or if the peer system controller
is "live hanging", in order to avoid split-brain.
Node self-fencing can also be used if e.g. the active controller loses
connectivity to all other nodes in the cluster.
Here follows an example of how to install and configure STONITH using Ubuntu 14.04.
On each virtual node install the stonith package:
sudo apt-get install cluster-glue
The name of each virtual node should be the same as the CLM node name,
e.g. for safNode=SC-2,safCluster=myClmCluster the virtual node name should be SC-2.
If TCP is used and a firewall is used on the "hypervisor" host,
TCP port 16509 has to be added to the firewall rules.
If SSH is used, use ssh-keygen to generate SSH keys for each
virtual node.
To verify the installation virsh can be used, e.g.:
virsh --connect=qemu+tcp://192.168.122.1/system list --all
Example of output:
Id Name State
----------------------------------------------------
2 SC-1 running
3 SC-2 running
4 PL-3 running
Update the fmd.conf file:
# The promote active timer delays the standby controller's reboot request,
# as the active controller is probably also requesting a reboot of the standby.
# The resolution is in 10 ms units.
export FMS_PROMOTE_ACTIVE_TIMER=300
# Uncomment the next line to enable remote fencing
export FMS_USE_REMOTE_FENCING=1
Update the opensaf_reboot script:
# Uncomment these 4 lines and update accordingly
# See also documentation for STONITH
#export FMS_FENCE_CMD="stonith"
#export FMS_DEVICE_TYPE="external/libvirt"
#export FMS_HYPERVISOR_URI="qemu+tcp://192.168.122.1/system"
#export FMS_FENCE_ACTION="reset"
The environment variable OPENSAF_REBOOT_TIMEOUT can be set as a fallback: if
the remote fencing fails, a reboot of the local node is triggered instead.
The following environment variable can be set for any services that are using MDS:
export OSAF_MDS_WATCHDOG=1
When set, the MDS receive thread will be monitored for throughput latencies.
A message will be written if the latency is greater than 0.1 second. The example below shows a latency of 1 second:
messages.1:Sep 12 13:09:26 SC-1 osafimmd[26732]: NO MDS timerfd expired 10 times
If the latency exceeds 4 seconds, a SIGALRM will be sent and the process will be aborted.
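The relation between the logged expiry count and the latency can be worked out with a short script. This sketch assumes that each timer expiry corresponds to 0.1 second (100 ms), which is inferred from the example above where 10 expiries are described as a latency of 1 second. The function name mds_latency_ms is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract N from an "MDS timerfd expired N times" log
# line and print the corresponding latency in milliseconds.
# ASSUMPTION: one expiry equals 100 ms (10 expiries = 1 second, as in the
# example above).
# Usage: mds_latency_ms "<log line>"
mds_latency_ms() {
    count=$(printf '%s\n' "$1" |
        sed -n 's/.*MDS timerfd expired \([0-9][0-9]*\) times.*/\1/p')
    # Fail if the line is not a watchdog message.
    [ -n "$count" ] || return 1
    echo $(( $count * 100 ))
}
```

Under the same assumption, the 4 second abort threshold would correspond to a count of 40.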
# To enable gcov run ./configure --enable-gcov
# In each daemon a thread will be created that listens to a default multicast group 239.0.0.1 port 4712.
# To change default, update /etc/init.d/opensafd setup_env function, example:
# IPv4:
# export OPENSAF_GCOV_MULTICAST_GROUP="224.0.0.1"
# IPv6:
# export OPENSAF_GCOV_MULTICAST_GROUP="ff02::1"
# export OPENSAF_GCOV_MULTICAST_PORT="4711"
# and if running in UML uncomment the line:
# echo 100 > /proc/sys/net/ipv4/igmp_max_memberships
# To collect gcov data compile and use program tools/devel/gcov_collect/osaf_gcov_dump.c.
# Check the MULTICAST_PORT and MULTICAST_GROUP settings are the same as multicast group and port
# above.
If the CLM admin command for cluster reboot is issued, the environment variable
OPENSAF_CLUSTER_REBOOT_WAIT_TIME_SEC can be set in nid.conf (which is sourced
by opensafd) to specify how long the nodes, except for the active node, wait
before they are started. The default is 3 seconds. A file,
"clm_cluster_reboot_in_progress", is created on each node except the active
node. This file indicates that a cluster reboot is in progress and that all
nodes need to delay their start, to give the active node a lead.
Split-Brain Prevention with Consensus Service
=============================================
OpenSAF implements split-brain prevention by utilizing a consensus service that
implements a replicated state machine. The consensus service uses quorum to
prevent state changes in network partitions that don't include more than half
of the nodes in the cluster. In network partitions containing
half of the nodes or less, the state is either read-only or unavailable.
Thus, it is important to keep in mind that the consensus service by itself
does not prevent the presence of multiple active system
controller nodes.
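The quorum rule described above (state changes are only allowed in a partition that contains more than half of the nodes) can be illustrated with a few lines of shell arithmetic. The function name has_quorum is hypothetical and the sketch is not taken from the consensus service implementation.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a partition holds quorum, i.e. it contains
# more than half of the nodes in the cluster.
# Usage: has_quorum <partition_size> <cluster_size>
has_quorum() {
    partition_size=$1
    cluster_size=$2
    # "More than half" without fractions: 2 * partition > cluster.
    [ $(( $partition_size * 2 )) -gt "$cluster_size" ]
}
```

For example, 'has_quorum 2 3' succeeds, while 'has_quorum 1 2' and 'has_quorum 2 4' fail: an even split leaves neither side with quorum.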
In the case where the network has been split up into partitions,
OpenSAF relies on some additional mechanism like fencing to
ensure that only one active controller exists among the network partitions.
If fencing is not available, the old active system controller can detect that it has
lost write access and step down from its active role.
The consensus service can be implemented, for example, using the RAFT algorithm.
When using RAFT, there are mainly three possibilities:
1. The RAFT servers run on the same nodes as OpenSAF
2. The RAFT servers run on a subset of the OpenSAF nodes
3. The RAFT servers run on an external set of nodes, outside of the
OpenSAF cluster
The consensus service relies on a plugin to communicate with a distributed
key-value store database. This plugin must still function according to the
API when the network has split up into partitions.
The plugin interface is defined in src/osaf/consensus/plugins/sample.plugin.
An implementation for etcdv2 is provided. It assumes etcd is installed
and configured on all system controllers. In clusters where
there are only two system controllers, it is highly recommended to
configure etcd so it runs on at least three nodes to facilitate
a majority vote with failure tolerance.
Other implementations of a distributed key-value store service
can be used, provided that they implement the interface documented in sample.plugin.
To enable split-brain prevention, edit fmd.conf and update accordingly:
export FMS_SPLIT_BRAIN_PREVENTION=1
export FMS_KEYVALUE_STORE_PLUGIN_CMD=/usr/local/lib/opensaf/etcd.plugin
As discussed, the key-value store does not need to reside on the same nodes
as OpenSAF. In such a configuration, an appropriate plugin that handles
the communication with a remotely located key-value store, must be provided.
If remote fencing is enabled, then it will be used to fence a node that the
consensus service believes should not be active. Otherwise, rded/amfd will
initiate a 'self-fencing' by rebooting the node, if it determines the node
should no longer be active according to the consensus service, to prevent
a split-brain situation.
TIPC receive queue utilization
==============================
Setting the environment variable MDS_RECVQ_STATS_LOG_FREQ_SEC in a service config
file enables TIPC receive queue utilisation statistics. The argument is how often the
statistics will be written to syslog.
Example amfd.conf:
export MDS_RECVQ_STATS_LOG_FREQ_SEC=5
With this setting, a log record is written every 5 seconds:
May 20 12:23:30 SC-1 local0.notice osafamfd[545]: NO TIPC receive queue utilization (in %): min: 3.86 max: 4.38 mean: 4.15 std dev: 0.18