AIOps - Installer fails with [elasticsearch-node-validate] [ERROR] No nodes were detected in the cluster.

Article ID: 217700

Products

CA Application Performance Management (APM / Wily / Introscope)
DX Operational Intelligence
CA App Experience Analytics

Issue/Introduction

We have configured the Elastic node as per the documentation, as shown below:

kubectl get nodes --show-labels

..

Ready    node     28d   v1.21.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,dxi-es-node=master-data-1,kubernetes.io/arch=amd64,kubernetes.io/hostname=<your-ELASTIC-servername>,kubernetes.io/os=linux,node-role.kubernetes.io/node=
...

However, the DX Platform installer fails to recognize the node as available. Why?

...
2021-06-14 13:00:57,169 INFO  - [image-prefix-validate}] [ OK ] Image prefix has been successfully validated.
2021-06-14 13:01:00,735 TRACE - POST: com.github.dockerjava.netty.WebTarget@9a3eb67f
2021-06-14 13:01:02,106 TRACE - POST: com.github.dockerjava.netty.WebTarget@3634d59d
2021-06-14 13:01:02,132 TRACE - POST: com.github.dockerjava.netty.WebTarget@7f0906c0
2021-06-14 13:01:02,261 INFO  - [registry-validate] [ OK ] Registry has been validated successfully
2021-06-14 13:01:02,265 TRACE - DELETE: com.github.dockerjava.netty.WebTarget@ee03a8b
2021-06-14 13:01:02,311 INFO  - [node-info] [ INFO ] Node <your-ELASTIC-servername> is unschedulable
...

2021-06-14 13:01:02,323 INFO  - [environment-validate] [ERROR] Environment Configuration verification failed.
2021-06-14 13:01:06,288 INFO  - [nginx-version] [ OK ] Nginx Ingress Controller version has been verified successfully
2021-06-14 13:01:22,192 INFO  - [node-count] Ready and schedulable nodes with dxi-es-node label count equals 0.
2021-06-14 13:01:22,192 INFO  - [node-count] Minimum one ready and schedulable node required with 'dxi-es-node' label.
2021-06-14 13:01:22,192 ERROR - [elasticsearch-node-validate] [ERROR] No nodes were detected in the cluster. 
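
The [node-count] messages above suggest that the installer looks for nodes that are Ready, schedulable, and labeled with dxi-es-node. As a rough sketch only (the installer's actual logic is not documented here, and the function name count_ready_es_nodes is hypothetical), the label and status part of that check can be approximated by filtering the "kubectl get nodes --show-labels" listing. It assumes the default column layout, with STATUS as the second column and LABELS as the last one. It does not look at taints, so it can report a node as fine while the installer still rejects it.

```shell
# Hypothetical helper: reads `kubectl get nodes --show-labels` output on stdin
# and prints how many nodes have STATUS exactly "Ready" and a dxi-es-node label.
# Cordoned nodes show "Ready,SchedulingDisabled" and are therefore not counted.
# Taints are NOT considered here.
count_ready_es_nodes() {
  awk '$2 == "Ready" && $NF ~ /(^|,)dxi-es-node=/ { n++ } END { print n + 0 }'
}

# Example usage against a live cluster:
#   kubectl get nodes --show-labels | count_ready_es_nodes
```

A result of 0 points to a missing label or a node that is not Ready. A non-zero result combined with the installer error above points to something else, such as a taint.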

Environment

DX Operational Intelligence 20.x
DX Application Performance Management 20.x
DX AXA 20.x

Cause

From the "kubectl describe nodes" output:

...
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity: <your-ELASTIC-servername>
..

The output above shows that, although Unschedulable is false, the node carries the node-role.kubernetes.io/master:NoSchedule taint. This taint prevents the scheduler from placing new pods that do not tolerate it onto the node, so the installer treats the Elastic node as unavailable.
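
On a cluster with many nodes, the describe output is long, and the Taints field is easy to miss. The following is a minimal sketch, assuming the usual "kubectl describe nodes" layout (a "Name:" line per node and a "Taints:" field whose extra entries continue on indented lines). The function name list_noschedule_taints is hypothetical.

```shell
# Hypothetical helper: reads `kubectl describe nodes` output on stdin and
# prints "<node name><TAB><taint>" for every taint ending in :NoSchedule.
list_noschedule_taints() {
  awk '
    /^Name:/ { node = $2 }
    {
      if ($0 ~ /^Taints:/)           { in_taints = 1; taint = $2 }  # first taint is on the same line
      else if ($0 ~ /^[^[:space:]]/) { in_taints = 0 }              # next top-level field ends the list
      else if (in_taints)            { taint = $1 }                 # indented continuation line
      if (in_taints && taint ~ /:NoSchedule$/) print node "\t" taint
    }
  '
}

# Example usage against a live cluster:
#   kubectl describe nodes | list_noschedule_taints
```

Any Elastic node listed by this filter is a candidate for the taint removal described in the Resolution section.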

Resolution

Remove the taint from the Elastic server node(s) using the command below (the trailing "-" after the taint key removes the taint):

kubectl taint node <your-ELASTIC-servername> node-role.kubernetes.io/master-

Additional Information

DX AIOps Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815

Other external references:
https://stackoverflow.com/questions/55191980/remove-node-role-kubernetes-io-masternoschedule-taint
https://stackoverflow.com/questions/57803471/in-k8s-how-to-setup-to-demo-kubectl-taint-nodes-all-node-role-kubernetes-io-m