Symptoms
1) All OpenShift nodes appear in NotReady status:
oc get nodes
2) Memory and disk on the nodes show no problems, as verified with:
df -h
free -g
3) Restarting the OpenShift node service returns a timeout error:
systemctl restart origin-node.service
Result: Job for origin-node.service failed because a timeout was exceeded
4) Restarting the OpenShift master API and controllers returns an error:
master-restart api
master-restart controllers
Result: Job for origin-master-api.service failed because a configured resource limit was exceeded.
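Symptom 1 can also be checked from a script. The following is a minimal sketch, not part of the original procedure: the function name not_ready_nodes is hypothetical, and it assumes the default "oc get nodes" column layout, where the second column is STATUS.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every node whose STATUS
# column does not start with "Ready". "NotReady" is reported, while
# "Ready" and "Ready,SchedulingDisabled" are not.
not_ready_nodes() {
    oc get nodes --no-headers | awk '$2 !~ /^Ready/ {print $1}'
}
```

Calling not_ready_nodes on a master prints one node name per line. Empty output means that every node reports Ready.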
Environment
DX OPERATIONAL INTELLIGENCE - 2x
DX APPLICATION PERFORMANCE MANAGEMENT - 2x
OpenShift 3.x Platform
Cause
Some certificates are no longer valid, which causes TLS connectivity issues. To confirm, list the certificate signing requests (CSRs):
oc get csr
Result: some CSRs are in Pending status.
Resolution
1. Log in as the cluster administrator:
oc login -u system:admin
2. Manually approve all pending certificate signing requests:
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
3. Verify that no request is still pending:
oc get csr
Note: you might need to run the approve command from step 2 again, because new requests can appear after the first batch is approved.
4. Verify OpenShift cluster connectivity. All nodes should now be in Ready status:
oc get nodes
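Because step 2 may have to be repeated, steps 2 and 3 can be wrapped in a small loop. The following is a minimal sketch under stated assumptions, not an official procedure: the function names pending_csrs and approve_pending_csrs, the default of 5 passes, and the CSR_WAIT_SECONDS variable are all hypothetical. It reuses the go-template from step 2 and assumes you are already logged in as system:admin.

```shell
#!/bin/sh
# Hypothetical helper: print the names of CSRs that have no status yet
# (that is, still pending), using the same go-template as step 2.
pending_csrs() {
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}'
}

# Hypothetical helper: approve pending CSRs in repeated passes.
# $1 = maximum number of passes (assumed default: 5).
# Returns 0 once no CSR is pending, 1 if some are still pending.
approve_pending_csrs() {
    max_passes="${1:-5}"
    pass=1
    while [ "$pass" -le "$max_passes" ]; do
        names="$(pending_csrs)"
        if [ -z "$names" ]; then
            echo "No pending CSRs"
            return 0
        fi
        # CSR names contain no whitespace, so word splitting is safe here.
        for name in $names; do
            oc adm certificate approve "$name"
        done
        pass=$((pass + 1))
        # Give the nodes time to submit follow-up requests (assumed 10s).
        sleep "${CSR_WAIT_SECONDS:-10}"
    done
    # Final check after the last pass.
    if [ -z "$(pending_csrs)" ]; then
        echo "No pending CSRs"
        return 0
    fi
    echo "CSRs still pending after $max_passes passes" >&2
    return 1
}
```

A typical call is approve_pending_csrs 5, followed by oc get nodes as in step 4. The pass limit keeps the loop from running forever if requests keep arriving, in which case the remaining requests should be reviewed by hand.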