grep -i "health report" -A 75 /var/log/corfu/corfu.9000.log
For Corfu-nonconfig: grep -i "health report" -A 75 /var/log/corfu-nonconfig/corfu.9040.log
You will see a health report with status and reason fields similar to the following:
{
"status": "DOWN",
"reason": "Some of the services are not initialized",
...
Following this, you will see an initialization list:
"init": [
{
"name": "Layout Server",
"status": "UP",
"reason": "Initialization successful"
},
{
"name": "Sequencer",
"status": "UP",
"reason": "Initialization successful"
},
{
"name": "Clustering Orchestrator",
"status": "UP",
"reason": "Initialization successful"
},
...
This list will contain six different components: Layout Server, Sequencer, Clustering Orchestrator, Log Unit, Compactor, Failure Detector.
If any of these components fails to start, its entry in the init list shows the failing status and reason, and the overall Corfu status and reason reflect the failure as well.
The following is an example of a health report where Corfu comes up but the sequencer server is not starting correctly:
{
"status": "DOWN",
"reason": "Some of the services are not initialized",
"init": [
{
"name": "Log Unit",
"status": "UP",
"reason": "Initialization successful"
},
{
"name": "Layout Server",
"status": "UP",
"reason": "Initialization successful"
},
{
"name": "Clustering Orchestrator",
"status": "UP",
"reason": "Initialization successful"
},
{
"name": "Failure Detector",
"status": "UP",
"reason": "Initialization successful"
},
{
"name": "Sequencer",
"status": "DOWN",
"reason": "Service is not initialized"
}
],
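Reading a long report by eye is error-prone, so the following is a minimal sketch that prints only the components whose status is not UP. The function name `list_not_up` is hypothetical and not part of Corfu or NSX. It assumes each "name" line is followed by that component's "status" line and that the JSON lines carry no log prefix, as in the samples above.

```shell
# Sketch: print every component in a health report whose status is not UP.
# Assumption: each "name" line is followed by its own "status" line,
# exactly as in the sample reports in this article.
list_not_up() {
  awk -F'"' '
    # Remember the most recent component name.
    /"name":/   { name = $4 }
    # Report the status only when it belongs to a named component;
    # the overall status at the top of the report has no name and is skipped.
    /"status":/ { if (name != "" && $4 != "UP") print name ": " $4; name = "" }
  ' "$@"
}

# Example: pipe the grep output from the earlier step into the function.
# grep -i "health report" -A 75 /var/log/corfu/corfu.9000.log | list_not_up
```

For the sample report above, this would print a single line for the Sequencer. The overall status is left out on purpose, since it only summarizes the per-component entries.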
grep -i "DataCorruptionException" /var/log/corfu/corfu.9000.log
For Corfu-nonconfig: grep -i "DataCorruptionException" /var/log/corfu-nonconfig/corfu.9040.log
If you see results from these commands, use one of the following KB articles to resolve the issue:
NSX-T Manager service corfu-nonconfig-server is not running
Corfu data file corruption seen in corfu.9000.log: "Checksum mismatch detected while trying to read file" or "Can't parse metadata"
If you don't see these issues or require further assistance, open a case with Broadcom Support. For more information, see Creating and managing Broadcom support cases.
Verify that the /config/corfu directory (for Corfu) and the /nonconfig/corfu directory (for Corfu-nonconfig) are writable and have sufficient free disk space.
grep -i "health report" -A 75 /var/log/corfu/corfu.9000.log
For Corfu-nonconfig: grep -i "health report" -A 75 /var/log/corfu-nonconfig/corfu.9040.log
{
"status": "FAILURE",
"reason": "Some of the services experience runtime health issues",
...
"runtime": [
{
"name": "Log Unit",
"status": "UP",
"reason": "Up and running"
},
{
"name": "Layout Server",
"status": "UP",
"reason": "Up and running"
},
{
"name": "Clustering Orchestrator",
"status": "UP",
"reason": "Up and running"
},
...
This list will contain six different components: Layout Server, Sequencer, Clustering Orchestrator, Log Unit, Compactor, Failure Detector.
If any of these components encounters an issue at runtime, its entry in the runtime list shows the failing status and reason, and the overall Corfu status and reason reflect the issue as well.
The following is an example of a health report where the sequencer server is experiencing runtime health issues:
{
"status": "FAILURE",
"reason": "Some of the services experience runtime health issues",
...
"runtime": [
{
"name": "Log Unit",
"status": "UP",
"reason": "Up and running"
},
{
"name": "Layout Server",
"status": "UP",
"reason": "Up and running"
},
{
"name": "Clustering Orchestrator",
"status": "UP",
"reason": "Up and running"
},
{
"name": "Failure Detector",
"status": "UP",
"reason": "Up and running"
},
{
"name": "Sequencer",
"status": "FAILURE",
"reason": "Sequencer requires bootstrap"
}
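The grep commands above print every health report found in the log, so older reports can obscure the current state. The following is a minimal sketch that prints only the most recent one. The function name `latest_health_report` is hypothetical, and the 76-line window simply mirrors the `-A 75` option used above; it is an assumption that a report fits in that window.

```shell
# Sketch: print only the most recent health report from a Corfu log.
latest_health_report() {
  log=$1
  # Line number of the last line that mentions "health report" (case-insensitive).
  start=$(grep -n -i "health report" "$log" | tail -n 1 | cut -d: -f1)
  if [ -z "$start" ]; then
    echo "no health report found in $log" >&2
    return 1
  fi
  # Print the matching line plus the 75 lines that follow it.
  tail -n "+$start" "$log" | head -n 76
}

# Example:
# latest_health_report /var/log/corfu/corfu.9000.log
# latest_health_report /var/log/corfu-nonconfig/corfu.9040.log
```

Locating the last match by line number avoids relying on the end of the grep output, which can include part of an earlier report when the newest one is shorter than the window.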
Note: Layout Server cannot enter a DEGRADED state
This error indicates that the Corfu servers are running out of disk space. Once it appears, only high-priority writes go through; all other writes are aborted with QuotaExceededException.
If the health monitor logs show the compactor as up and running, the disk space will eventually be reclaimed and this error will clear.
If the compactor has not been running for more than 30 minutes (reason "Last compaction cycle failed"), follow the instructions in the following articles:
NSX Manager cluster degraded and UI inaccessible/Compactor running Out Of Memory
NSX Manager cluster intermittently degraded due to Proton or Compactor running Out Of Memory
If you require further assistance, open a case with Broadcom Support. For more information, see Creating and managing Broadcom support cases.
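Because this condition is tied to disk space, it can help to check the Corfu data directories directly. The following is a minimal sketch using standard `df` and shell test operators. The function name `check_corfu_dir` is hypothetical, and the article does not state how much free space counts as sufficient, so the sketch only reports the numbers. Note that the writability test usually succeeds when run as root.

```shell
# Sketch: report writability and free space for one data directory.
check_corfu_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "$dir: missing"
    return 1
  fi
  if [ -w "$dir" ]; then
    echo "$dir: writable"
  else
    echo "$dir: NOT writable"
  fi
  # -P gives one line per filesystem; -k reports sizes in 1024-byte blocks.
  df -Pk "$dir"
}

# Example:
# check_corfu_dir /config/corfu
# check_corfu_dir /nonconfig/corfu
```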
If Sequencer is in FAILURE, you will see a message: "Sequencer requires bootstrap".
This is typically a temporary condition. Wait for 10 minutes or restart the Corfu servers.
For Corfu: /etc/init.d/corfu-server restart
For Corfu-nonconfig: /etc/init.d/corfu-nonconfig-server restart
If the issue persists, open a case with Broadcom Support. For more information, see Creating and managing Broadcom support cases.
If Clustering Orchestrator or Failure Detector is in FAILURE, wait for 10 minutes or restart the Corfu servers.
For Corfu: /etc/init.d/corfu-server restart
For Corfu-nonconfig: /etc/init.d/corfu-nonconfig-server restart
If the issue persists, open a case with Broadcom Support. For more information, see Creating and managing Broadcom support cases.
If Compactor is in FAILURE, you will see a reason "Last compaction cycle failed".
Follow the instructions in the following articles:
NSX Manager cluster degraded and UI inaccessible/Compactor running Out Of Memory
NSX Manager cluster intermittently degraded due to Proton or Compactor running Out Of Memory
If the issue persists, open a case with Broadcom Support. For more information, see Creating and managing Broadcom support cases.
If you encounter any of the above scenarios, gather the following information before engaging Broadcom Support:
See Collect Support Bundles and Uploading files to cases on the Broadcom Support Portal for more information.