
Splunk Operator: indexers don't start if search-heads still starting #1390

Open · yaroslav-nakonechnikov opened this issue Oct 18, 2024 · 8 comments


Please select the type of request

Bug

Tell us more

Describe the request

[yn@ip-100-65-8-59 /]$ kubectl get pods -n splunk-operator
NAME                                                  READY   STATUS    RESTARTS   AGE
splunk-43105-cluster-manager-0                        1/1     Running   0          19m
splunk-43105-license-manager-0                        1/1     Running   0          30m
splunk-c-43105-standalone-0                           1/1     Running   0          30m
splunk-e-43105-deployer-0                             0/1     Running   0          8m15s
splunk-e-43105-search-head-0                          0/1     Running   0          8m15s
splunk-e-43105-search-head-1                          0/1     Running   0          8m15s
splunk-e-43105-search-head-2                          0/1     Running   0          8m15s
splunk-operator-controller-manager-58b545f67c-8rrhx   2/2     Running   0          31m

and then:

NAME                                                  READY   STATUS    RESTARTS   AGE
splunk-43105-cluster-manager-0                        1/1     Running   0          21m
splunk-43105-license-manager-0                        1/1     Running   0          32m
splunk-c-43105-standalone-0                           1/1     Running   0          32m
splunk-e-43105-deployer-0                             0/1     Running   0          11m
splunk-e-43105-search-head-0                          1/1     Running   0          11m
splunk-e-43105-search-head-1                          1/1     Running   0          11m
splunk-e-43105-search-head-2                          1/1     Running   0          11m
splunk-operator-controller-manager-58b545f67c-8rrhx   2/2     Running   0          34m
splunk-site3-43105-indexer-0                          0/1     Running   0          2m17s
splunk-site3-43105-indexer-1                          0/1     Running   0          2m17s
splunk-site3-43105-indexer-2                          0/1     Running   0          2m17s

This is unbelievable, and extremely strange, that in 2.6.1 there is still a dependency check between Splunk search heads and indexers!

Expected behavior
Indexers should start without depending on search heads.

yaroslav-nakonechnikov (Author) commented Oct 18, 2024:

This was already reported in #1260, and afterwards there were two calls in which I described why the dependency logic is broken for Kubernetes deployments.

Now, testing 2.6.1, we still see that part of the platform can't start, purely because of this problematic logic.
Old support case: 3448046

vivekr-splunk (Collaborator) commented:

@yaroslav-nakonechnikov we will get back to you regarding this issue.

yaroslav-nakonechnikov (Author) commented:

Sadly, this is extremely painful, because there may be failures like this:

FAILED - RETRYING: [localhost]: Initialize SHC cluster config (2 retries left).
FAILED - RETRYING: [localhost]: Initialize SHC cluster config (1 retries left).

TASK [splunk_search_head : Initialize SHC cluster config] **********************
fatal: [localhost]: FAILED! =>

{ "attempts": 60, "changed": false, "cmd": [ "/opt/splunk/bin/splunk", "init", "shcluster-config", "-auth", "admin:j3Q9SWJlLBOlc3RWejMnUb6e", "-mgmt_uri", "https://splunk-e-43345-search-head-1.splunk-e-43345-search-head-headless.splunk-operator.svc.cluster.local:8089", "-replication_port", "9887", "-replication_factor", "3", "-conf_deploy_fetch_url", "https://splunk-e-43345-deployer-service:8089", "-secret", "RNr25biFMA4Z3SUbXB3VGwW6", "-shcluster_label", "she_cluster" ], "delta": "0:00:00.806237", "end": "2024-10-31 08:05:54.588881", "rc": 24, "start": "2024-10-31 08:05:53.782644" }
STDERR:

WARNING: Server Certificate Hostname Validation is disabled. Please see server.conf/[sslConfig]/cliVerifyServerName for details.
Login failed

MSG:

non-zero return code

PLAY RECAP *********************************************************************
localhost : ok=132 changed=11 unreachable=0 failed=1 skipped=68 rescued=0 ignored=0
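
For reference, the failing step can be re-run by hand from inside a search-head pod to check whether the login itself works (a sketch; the pod name and namespace are taken from the listings above, and the password placeholder is mine):

# sketch: reproduce the failing login manually from a search-head pod
kubectl exec -it splunk-e-43345-search-head-0 -n splunk-operator -- \
    /opt/splunk/bin/splunk login -auth admin:'<admin-password>'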

The problem, as I understand it, is in this section: https://github.com/splunk/splunk-ansible/blob/53a9a70897896e279b43478583b13256e75894a2/roles/splunk_search_head/tasks/search_head_clustering.yml#L6

The search heads are stuck in an infinite loop, and as a result none of the indexers are started.

This happened on splunk-operator 2.6.1 and Splunk 9.1.6.
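
Judging from the output above ("attempts": 60), the linked task is a retried command, roughly of this shape (an approximation of the task, not the verbatim source; the variable names and the delay value are my assumptions, see the link above for the real task):

# approximate shape of the "Initialize SHC cluster config" task
- name: Initialize SHC cluster config
  command: >-
    {{ splunk.exec }} init shcluster-config
    -auth "admin:{{ splunk.password }}"
    -mgmt_uri "{{ shc_mgmt_uri }}"
    -replication_port 9887
  register: shc_init
  until: shc_init.rc == 0   # "Login failed" exits with rc=24, so this keeps looping
  retries: 60               # matches "attempts": 60 in the log above
  delay: 10                 # assumed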

yaroslav-nakonechnikov (Author) commented:

It is extremely strange that the standalone instance started without issues:

NAME                                                  READY   STATUS    RESTARTS      AGE
splunk-43345-cluster-manager-0                        1/1     Running   1 (70m ago)   79m
splunk-43345-license-manager-0                        1/1     Running   0             79m
splunk-c-43345-standalone-0                           1/1     Running   0             79m
splunk-e-43345-deployer-0                             0/1     Running   0             66m
splunk-e-43345-search-head-0                          0/1     Running   3 (14m ago)   65m
splunk-e-43345-search-head-1                          0/1     Running   3 (14m ago)   65m
splunk-e-43345-search-head-2                          0/1     Running   3 (14m ago)   65m
splunk-operator-controller-manager-5c684d667d-smgdq   2/2     Running   0             80m

yaroslav-nakonechnikov (Author) commented:

With this test I can confirm that 9.1.6 is not working at all.

yaroslav-nakonechnikov (Author) commented:

With splunk-operator 2.7.0 I see that the indexers really do start, but they can't finish booting, because... there is a dependency on the cluster manager :)

[yn@ip-100-65-11-122 /]$ kubectl logs splunk-site1-44782-indexer-0 -n splunk-operator
WARNING: No password ENV var.  Stack may fail to provision if splunk.password is not set in ENV or a default.yml

PLAY [Run default Splunk provisioning] *****************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [Execute pre-setup playbooks] *********************************************
included: /opt/ansible/execute_adhoc_plays.yml for localhost => (item=file:///mnt/pre-tasks/pre_tasks_indexer.yml)

TASK [Fetch adhoc playbooks] ***************************************************
changed: [localhost]

TASK [Execute playbooks] *******************************************************
included: /opt/container_artifact/pre_tasks_indexer.yml for localhost

TASK [pre check apps dir] ******************************************************
ok: [localhost]

TASK [download and unarchive] **************************************************
changed: [localhost]

TASK [Remove existing files] ***************************************************
changed: [localhost]

TASK [Create symbolic link from /mnt/var-run to /opt/splunk/var/run] ***********
changed: [localhost]

TASK [Provision role] **********************************************************

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/get_facts.yml for localhost

TASK [splunk_common : Set privilege escalation user] ***************************
ok: [localhost]

TASK [splunk_common : Check for scloud] ****************************************
ok: [localhost]

TASK [splunk_common : Check for existing installation] *************************
ok: [localhost]

TASK [splunk_common : Set splunk install fact] *********************************
ok: [localhost]

TASK [splunk_common : Check for existing splunk secret] ************************
ok: [localhost]

TASK [splunk_common : Set first run fact] **************************************
ok: [localhost]

TASK [splunk_common : Set splunk_build_type fact] ******************************
included: /opt/ansible/roles/splunk_common/tasks/get_facts_build_type.yml for localhost

TASK [splunk_common : Set target version fact] *********************************
included: /opt/ansible/roles/splunk_common/tasks/get_facts_target_version.yml for localhost

TASK [splunk_common : Find manifests] ******************************************
ok: [localhost]

TASK [splunk_common : Set current version fact] ********************************
ok: [localhost]

TASK [splunk_common : Setting upgrade fact] ************************************
ok: [localhost]

TASK [splunk_common : Setting indexer cluster fact from config] ****************
ok: [localhost]

TASK [splunk_common : Setting search head cluster fact from config] ************
ok: [localhost]

TASK [splunk_common : Detect service name] *************************************
included: /opt/ansible/roles/splunk_common/tasks/get_facts_service_name.yml for localhost

TASK [splunk_common : Setting service_name fact from config] *******************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/install_python_requirements.yml for localhost

TASK [splunk_common : Check if requests_unixsocket exists] *********************
changed: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/update_etc.yml for localhost

TASK [splunk_common : Check if /sbin/updateetc.sh exists] **********************
ok: [localhost]

TASK [splunk_common : Update /opt/splunk/etc] **********************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/remove_first_login.yml for localhost

TASK [splunk_common : Create .ui_login] ****************************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/set_splunk_secret.yml for localhost

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/set_general_symmkey_password.yml for localhost

TASK [splunk_common : Set general pass4SymmKey] ********************************
changed: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/trigger_restart.yml for localhost

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/get_splunk_status.yml for localhost

TASK [splunk_common : Restrict permissions on splunk.key for Status] ***********
included: /opt/ansible/roles/splunk_common/tasks/restrict_permissions.yml for localhost => (item=/opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key)

TASK [splunk_common : Check if /opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key exists] ***
ok: [localhost]

TASK [splunk_common : Restrict permissions on /opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key] ***
ok: [localhost]

TASK [splunk_common : Get Splunk status] ***************************************
ok: [localhost]

TASK [splunk_common : Trigger restart] *****************************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/enable_admin_auth.yml for localhost

TASK [splunk_common : Apply admin password] ************************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/trigger_restart.yml for localhost

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/get_splunk_status.yml for localhost

TASK [splunk_common : Restrict permissions on splunk.key for Status] ***********
included: /opt/ansible/roles/splunk_common/tasks/restrict_permissions.yml for localhost => (item=/opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key)

TASK [splunk_common : Check if /opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key exists] ***
ok: [localhost]

TASK [splunk_common : Restrict permissions on /opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key] ***
ok: [localhost]

TASK [splunk_common : Get Splunk status] ***************************************
ok: [localhost]

TASK [splunk_common : Trigger restart] *****************************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/configure_mgmt_port.yml for localhost

TASK [splunk_common : set version fact] ****************************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/pre_splunk_start_commands.yml for localhost

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/enable_s2s.yml for localhost

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/s2s/configure_splunktcp.yml for localhost

TASK [splunk_common : Enable splunktcp input] **********************************
ok: [localhost]

TASK [splunk_common : Remove splunktcp-ssl input] ******************************
ok: [localhost]

TASK [splunk_common : Remove input SSL settings] *******************************
ok: [localhost]

TASK [splunk_common : Reset root CA] *******************************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/set_mgmt_port.yml for localhost

TASK [splunk_common : Set mgmt port] *******************************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/enable_splunkd_ssl.yml for localhost

TASK [splunk_common : Enable Splunkd SSL] **************************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/set_config_file.yml for localhost => (item=(censored due to no_log))
included: /opt/ansible/roles/splunk_common/tasks/set_config_file.yml for localhost => (item=(censored due to no_log))
included: /opt/ansible/roles/splunk_common/tasks/set_config_file.yml for localhost => (item=(censored due to no_log))

TASK [splunk_common : Create /opt/splunk/etc/system/local directory] ***********
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/set_config_stanza.yml for localhost => (item=(censored due to no_log))

TASK [splunk_common : Set options in imds] *************************************
changed: [localhost] => (item={'key': 'imds_version', 'value': 'v2'})

TASK [splunk_common : Create /opt/splunk/etc/system/local directory] ***********
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/set_config_stanza.yml for localhost => (item=(censored due to no_log))

TASK [splunk_common : Set options in settings] *********************************
ok: [localhost] => (item={'key': 'enableSplunkWebSSL', 'value': True})

TASK [splunk_common : Create /opt/splunk/etc/system/local directory] ***********
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/set_config_stanza.yml for localhost => (item=(censored due to no_log))
included: /opt/ansible/roles/splunk_common/tasks/set_config_stanza.yml for localhost => (item=(censored due to no_log))
included: /opt/ansible/roles/splunk_common/tasks/set_config_stanza.yml for localhost => (item=(censored due to no_log))

TASK [splunk_common : Set options in authentication] ***************************
ok: [localhost] => (item={'key': 'authSettings', 'value': 'saml'})
ok: [localhost] => (item={'key': 'authType', 'value': 'SAML'})

TASK [splunk_common : Set options in saml] *************************************
ok: [localhost] => (item={'key': 'entityId', 'value': 'splunkACSEntityId'})
ok: [localhost] => (item={'key': 'fqdn', 'value': 'https://cmi.44782.dev.internal.prjgroup.cloud'})
ok: [localhost] => (item={'key': 'idpSSOUrl', 'value': 'https://login.prjnternational.com/idp/SSO.saml2'})
ok: [localhost] => (item={'key': 'inboundDigestMethod', 'value': 'SHA1;SHA256;SHA384;SHA512'})
ok: [localhost] => (item={'key': 'inboundSignatureAlgorithm', 'value': 'RSA-SHA1;RSA-SHA256;RSA-SHA384;RSA-SHA512'})
ok: [localhost] => (item={'key': 'issuerId', 'value': 'idp:prjnternational.com:saml2'})
ok: [localhost] => (item={'key': 'lockRoleToFullDN', 'value': True})
ok: [localhost] => (item={'key': 'redirectAfterLogoutToUrl', 'value': 'https://www.splunk.com'})
ok: [localhost] => (item={'key': 'redirectPort', 'value': 443})
ok: [localhost] => (item={'key': 'replicateCertificates', 'value': True})
ok: [localhost] => (item={'key': 'signAuthnRequest', 'value': True})
ok: [localhost] => (item={'key': 'signatureAlgorithm', 'value': 'RSA-SHA1'})
ok: [localhost] => (item={'key': 'signedAssertion', 'value': True})
ok: [localhost] => (item={'key': 'sloBinding', 'value': 'HTTP-POST'})
ok: [localhost] => (item={'key': 'ssoBinding', 'value': 'HTTP-POST'})
ok: [localhost] => (item={'key': 'clientCert', 'value': '/mnt/certs/saml_sig.pem'})
ok: [localhost] => (item={'key': 'idpCertPath', 'value': '/mnt/certs/'})

TASK [splunk_common : Set options in roleMap_SAML] *****************************
ok: [localhost] => (item={'key': 'admin', 'value': 'prj-aws-s-eng-admin;prj-aws-s-eng-admin'})
ok: [localhost] => (item={'key': 'cloudgateway', 'value': 'prj-aws-s-adm'})
ok: [localhost] => (item={'key': 'dashboard', 'value': 'prj-aws-s-dashboard'})
ok: [localhost] => (item={'key': 'ess_admin', 'value': 'prj-aws-s-adm;prj-aws-s-eng'})
ok: [localhost] => (item={'key': 'ess_analyst', 'value': 'atri_prj_aws_splunk_user;prj-aws-s-eng;prj_s_eng'})
ok: [localhost] => (item={'key': 'ess_user', 'value': 'prj-aws-s-user;prj_s_user'})
ok: [localhost] => (item={'key': 'phantom', 'value': 'prj-aws-s-phantom-admin'})
ok: [localhost] => (item={'key': 'prj_headoffice_power', 'value': 'prj-aws-s-eng;prj-aws-s-ho-power'})
ok: [localhost] => (item={'key': 'prj_headoffice_user', 'value': 'prj-aws-s-ho-user;prj-aws-s-user'})
ok: [localhost] => (item={'key': 'prj_proxy_user', 'value': 'prj-aws-s-user-only-prj-proxy'})
ok: [localhost] => (item={'key': 'splunk_cst_admin', 'value': 'prj-aws-s-cst-admin'})
ok: [localhost] => (item={'key': 'splunk_cst_power', 'value': 'prj-aws-s-cst-power'})
ok: [localhost] => (item={'key': 'splunk_cst_usecase', 'value': 'prj-aws-s-cst-usecase'})
ok: [localhost] => (item={'key': 'splunk_eng_admin', 'value': 'aws-s-eng-admin;prj-aws-s-eng-admin'})
ok: [localhost] => (item={'key': 'splunk_eng_power', 'value': 'aws-s-eng-power'})
ok: [localhost] => (item={'key': 'splunk_soc_l1_l2', 'value': 'aws-s-soc-l1-l2'})
ok: [localhost] => (item={'key': 'splunk_soc_l3', 'value': 'aws-s-soc-l3'})
ok: [localhost] => (item={'key': 'tc_admin', 'value': '_aws_splunk_user;aws-s-adm;aws-s-eng'})
ok: [localhost] => (item={'key': 'tc_user', 'value': 'aws_splunk_user;aws-s-adm;aws-s-eng'})

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/start_splunk.yml for localhost

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/get_splunk_status.yml for localhost

TASK [splunk_common : Restrict permissions on splunk.key for Status] ***********
included: /opt/ansible/roles/splunk_common/tasks/restrict_permissions.yml for localhost => (item=/opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key)

TASK [splunk_common : Check if /opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key exists] ***
ok: [localhost]

TASK [splunk_common : Restrict permissions on /opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key] ***
ok: [localhost]

TASK [splunk_common : Get Splunk status] ***************************************
ok: [localhost]

TASK [splunk_common : Cleanup Splunk runtime files] ****************************
ok: [localhost] => (item=/opt/splunk/var/run/splunk/splunkd.pid)
changed: [localhost] => (item=/opt/splunk/var/lib/splunk/kvstore/mongo/mongod.lock)

TASK [splunk_common : Restrict permissions on splunk.key] **********************
included: /opt/ansible/roles/splunk_common/tasks/restrict_permissions.yml for localhost => (item=/opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key)

TASK [splunk_common : Check if /opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key exists] ***
ok: [localhost]

TASK [splunk_common : Restrict permissions on /opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key] ***
ok: [localhost]

TASK [splunk_common : Start Splunk via CLI] ************************************
changed: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/check_uds_file.yml for localhost

TASK [splunk_common : Check if UDS file exists] ********************************
ok: [localhost]

TASK [splunk_common : Set UDS enabled/disabled] ********************************
ok: [localhost]

TASK [splunk_common : Wait for splunkd management port] ************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/set_certificate_prefix.yml for localhost
FAILED - RETRYING: [localhost]: Test basic https endpoint (60 retries left).
FAILED - RETRYING: [localhost]: Test basic https endpoint (59 retries left).
[... 56 identical retry lines trimmed ...]
FAILED - RETRYING: [localhost]: Test basic https endpoint (2 retries left).
FAILED - RETRYING: [localhost]: Test basic https endpoint (1 retries left).

TASK [splunk_common : Test basic https endpoint] *******************************
fatal: [localhost]: FAILED! => {
    "attempts": 60,
    "changed": false,
    "elapsed": 10,
    "failed_when_result": true,
    "redirected": false,
    "status": -1,
    "url": "https://127.0.0.1:8089"
}

MSG:

Status code was -1 and not [200, 404]: Request failed: <urlopen error _ssl.c:1116: The handshake operation timed out>
...ignoring

TASK [splunk_common : Set url prefix for future REST calls] ********************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/clean_user_seed.yml for localhost

TASK [splunk_common : Remove user-seed.conf] ***********************************
ok: [localhost]

TASK [splunk_common : include_tasks] *******************************************
included: /opt/ansible/roles/splunk_common/tasks/add_splunk_license.yml for localhost

TASK [splunk_common : Initialize licenses array] *******************************
ok: [localhost]

TASK [splunk_common : Determine available licenses] ****************************
ok: [localhost] => (item=splunk.lic)

TASK [splunk_common : Set as license slave] ************************************
included: /opt/ansible/roles/splunk_common/tasks/set_as_license_slave.yml for localhost

TASK [splunk_common : Wait for the license master] *****************************
included: /opt/ansible/roles/splunk_common/tasks/wait_for_splunk_instance.yml for localhost => (item=(censored due to no_log))

TASK [splunk_common : Check remote Splunk instance is running] *****************
ok: [localhost]
[yn@ip-100-65-11-122 /]$ kubectl exec -it splunk-site1-44782-indexer-0 -n splunk-operator -- bash
[splunk@splunk-site1-44782-indexer-0 splunk]$ tail var/log/splunk/splunkd.log
01-24-2025 13:10:57.904 +0000 INFO  BundlesUtil [1650 MainThread] - Using manager-apps over master-apps, using: /opt/splunk/etc/manager-apps
01-24-2025 13:10:57.904 +0000 INFO  SpecFiles [1650 MainThread] - Found external scheme definition for stanza="powershell://" from spec file="/opt/splunk/etc/system/README/inputs.conf.spec" with parameters="script, schedule"
01-24-2025 13:10:57.904 +0000 INFO  BundlesUtil [1650 MainThread] - Using manager-apps over master-apps, using: /opt/splunk/etc/manager-apps
01-24-2025 13:10:57.904 +0000 INFO  SpecFiles [1650 MainThread] - Found external scheme definition for stanza="splunktcptoken://" from spec file="/opt/splunk/etc/system/README/inputs.conf.spec" with parameters="token"
01-24-2025 13:10:57.906 +0000 INFO  CMSlave [1650 MainThread] - starting heartbeat thread
01-24-2025 13:10:57.906 +0000 INFO  CMServiceThread [1821 CMHeartbeatThread] - CMHeartbeatThread starting eloop
01-24-2025 13:10:57.906 +0000 INFO  BucketReplicator [1650 MainThread] - Initializing BucketReplicatorMgr
01-24-2025 13:10:57.907 +0000 INFO  CMServiceThread [1824 CMHealthManager] - CMHealthManager starting eloop
01-24-2025 13:10:57.907 +0000 INFO  CMConfig [1650 MainThread] - ack_factor=0
01-24-2025 13:10:57.965 +0000 WARN  CMMasterProxy [1650 MainThread] - The cluster manager is down! Make sure pass4SymmKey is matching if the cluster manager is running.
[yn@ip-100-65-11-122 /]$ kubectl get pod -n splunk-operator
NAME                                                  READY   STATUS    RESTARTS   AGE
splunk-44782-license-manager-0                        1/1     Running   0          47m
splunk-c-44782-standalone-0                           1/1     Running   0          47m
splunk-e-44782-deployer-0                             0/1     Running   0          47m
splunk-e-44782-search-head-0                          1/1     Running   0          47m
splunk-e-44782-search-head-1                          1/1     Running   0          47m
splunk-e-44782-search-head-2                          1/1     Running   0          47m
splunk-operator-controller-manager-86b9c56f7f-46sgp   2/2     Running   0          48m
splunk-site1-44782-indexer-0                          0/1     Running   0          47m
splunk-site1-44782-indexer-1                          0/1     Running   0          47m
splunk-site1-44782-indexer-2                          0/1     Running   0          47m
splunk-site2-44782-indexer-0                          0/1     Running   0          47m
splunk-site2-44782-indexer-1                          0/1     Running   0          47m
splunk-site2-44782-indexer-2                          0/1     Running   0          47m
splunk-site3-44782-indexer-0                          0/1     Running   0          47m
splunk-site3-44782-indexer-1                          0/1     Running   0          47m
splunk-site3-44782-indexer-2                          0/1     Running   0          47m
splunk-site4-44782-indexer-0                          0/1     Running   0          47m
splunk-site4-44782-indexer-1                          0/1     Running   0          47m
splunk-site4-44782-indexer-2                          0/1     Running   0          47m
splunk-site5-44782-indexer-0                          0/1     Running   0          47m
splunk-site5-44782-indexer-1                          0/1     Running   0          47m
splunk-site5-44782-indexer-2                          0/1     Running   0          47m

I thought you knew about this and would remove that blocking check for the CM as well.
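
The blocked dependency is easy to confirm from an indexer pod by probing the cluster manager's management port directly (a sketch; the service name is taken from the deployer error in the next comment, and curl being available in the container is an assumption):

# sketch: probe the cluster manager REST port from an indexer pod
kubectl exec -it splunk-site1-44782-indexer-0 -n splunk-operator -- \
    curl -sk -o /dev/null -w '%{http_code}\n' \
    https://splunk-44782-cluster-manager-service:8089/services/server/info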

yaroslav-nakonechnikov (Author) commented:

The deployer is also waiting for the CM; in its logs you can find something like:

01-24-2025 13:56:54.399 +0000 ERROR IndexerDiscoveryHeartbeatThread [5795 TcpOutEloop] - Error in Indexer Discovery communication. Verify that the pass4SymmKey set under [indexer_discovery:group1] in 'outputs.conf' matches the same setting  under [indexer_discovery] in 'server.conf' on the cluster manager. [uri=https://splunk-44782-cluster-manager-service:8089/services/indexer_discovery http_code=502 http_response="Error connecting: Connection refused"]
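
For reference, the two settings that error is asking to match look like this (a sketch with placeholder values; the stanza names come straight from the error message):

# outputs.conf on the instance doing indexer discovery
[indexer_discovery:group1]
pass4SymmKey = <shared-secret>
master_uri = https://splunk-44782-cluster-manager-service:8089

# server.conf on the cluster manager
[indexer_discovery]
pass4SymmKey = <shared-secret>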

yaroslav-nakonechnikov (Author) commented:

Even worse, it deletes the CM pod if the deployer has not started, which is absolutely incomprehensible.
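
If it helps to reproduce, the deletion should show up in the namespace events (standard kubectl; the grep pattern is only illustrative):

kubectl get events -n splunk-operator --sort-by=.lastTimestamp \
    | grep -Ei 'cluster-manager|killing|deleted'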
