CorfuDB issue symptoms:
VMware NSX
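When troubleshooting CorfuDB, start with the following checks: disk space, the current Corfu layout, the layout file history, and the Corfu server service status.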
root@nsx-mngr-01:~# df -h
root@nsx-mngr-01:~# cat /config/corfu/LAYOUT_CURRENT.ds
{
  "layoutServers": [
    "###.###.###.###:9000",
    "###.###.###.###:9000",
    "###.###.###.###:9000"
  ],
  "sequencers": [
    "###.###.###.###:9000",
    "###.###.###.###:9000",
    "###.###.###.###:9000"
  ],
  "segments": [
    {
      "replicationMode": "CHAIN_REPLICATION",
      "start": 0,
      "end": -1,
      "stripes": [
        {
          "logServers": [
            "###.###.###.###:9000",
            "###.###.###.###:9000",
            "###.###.###.###:9000"
          ]
        }
      ]
    }
  ],
  "unresponsiveServers": [],
  "epoch": 143,
  "clusterId": "<UUID>"
}
root@nsx-mngr-01:~# ls -ltr /config/corfu/LAYOUT*
root@nsx-mngr-01:~# service corfu-server status
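The layout printed by cat /config/corfu/LAYOUT_CURRENT.ds can also be sanity-checked with a short script. This is a minimal sketch, not a VMware tool: it assumes the file holds one JSON object shaped like the sample above, and the function check_layout and its rules (three nodes in each role list, an empty unresponsiveServers list, a last segment with "end": -1) are hypothetical heuristics taken from that three-node sample.

```python
import json
from typing import List

# Hypothetical heuristic, derived from the three-node sample layout above.
EXPECTED_NODES = 3


def check_layout(text: str, expected_nodes: int = EXPECTED_NODES) -> List[str]:
    """Return warnings for a LAYOUT_CURRENT.ds document; an empty list means nothing stood out."""
    layout = json.loads(text)
    warnings = []

    # In the sample, layoutServers and sequencers each list every node once.
    for key in ("layoutServers", "sequencers"):
        nodes = layout.get(key, [])
        if len(nodes) != expected_nodes:
            warnings.append(f"{key} lists {len(nodes)} node(s), expected {expected_nodes}")

    # Log servers are nested as segments -> stripes -> logServers.
    segments = layout.get("segments", [])
    for i, segment in enumerate(segments):
        for j, stripe in enumerate(segment.get("stripes", [])):
            count = len(stripe.get("logServers", []))
            if count != expected_nodes:
                warnings.append(f"segment {i} stripe {j} lists {count} log server(s)")
    if not segments:
        warnings.append("no segments found")
    elif segments[-1].get("end") != -1:
        warnings.append("last segment does not have end = -1 as in the sample")

    # Any entry here is a node the layout currently marks as unresponsive.
    unresponsive = layout.get("unresponsiveServers", [])
    if unresponsive:
        warnings.append("unresponsive servers: " + ", ".join(unresponsive))

    return warnings
```

On a manager node, check_layout(open("/config/corfu/LAYOUT_CURRENT.ds").read()) would run it against the live file. Noting the epoch value between runs is another simple signal, since the Corfu layout epoch increases when the cluster is reconfigured.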
Online Diagnostic System (ODS) Documentation: Debugging NSX at Runtime
ODS CLI command:
nsx-mngr-01> get runbook CorfuServer help
Mon Oct 02 2023 UTC 15:40:14.189
Runbook ID : CorfuServer
Description : Corfu Server runbook to find server side issues.
Parameters
Name : lookback_days
Title : Specify a time window
Constraint : <integer>
Default : 1
Name : lookback_hours
Title : Specify a time window
Constraint : <integer>
Default : 0
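The help output lists two look-back parameters. The example report in this article is consistent with a window of lookback_days * 24 + lookback_hours: running with --lookback_days 2 --lookback_hours 8 gives "the last 56.0 hours" in step 1, and the defaults (1 and 0) match the "default is 24h" wording in the step actions. The sketch below illustrates that arithmetic and a per-hour layout-change rate compared against the unstable_layout_changes_per_hour threshold of 10 printed in step 1. The helper names are hypothetical, and treating "unstable" as "rate above the threshold" is an assumption; the runbook's internal logic is not documented here.

```python
# Hypothetical helpers illustrating the CorfuServer runbook look-back arithmetic.
# Threshold value taken from the step 1 message in the example report.
UNSTABLE_LAYOUT_CHANGES_PER_HOUR = 10


def lookback_window_hours(lookback_days: int = 1, lookback_hours: int = 0) -> float:
    """Total look-back window in hours; the defaults give 24 hours."""
    if lookback_days < 0 or lookback_hours < 0:
        raise ValueError("look-back values must be non-negative")
    return float(lookback_days * 24 + lookback_hours)


def layout_changes_per_hour(changes: int, window_hours: float) -> float:
    """Average number of layout changes per hour over the window."""
    if window_hours <= 0:
        raise ValueError("the look-back window must be longer than zero hours")
    return changes / window_hours


def layout_looks_unstable(changes: int, window_hours: float,
                          threshold: float = UNSTABLE_LAYOUT_CHANGES_PER_HOUR) -> bool:
    """Assumption: flag the layout when the average rate exceeds the threshold."""
    return layout_changes_per_hour(changes, window_hours) > threshold
```

With the values from the example run, lookback_window_hours(2, 8) returns 56.0 and layout_changes_per_hour(0, 56.0) returns 0.0, matching detected_layout_changes_per_hour: 0.0 in the report.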
nsx-mngr-01> start invocation runbook CorfuServer runbook-arg --lookback_days <NUM_OF_DAYS> --lookback_hours <NUM_OF_HOURS>
nsx-mngr-01> start invocation runbook CorfuServer runbook-arg --lookback_days 2 --lookback_hours 8
Runbook Invocation Report
Invocation ID : 72fab7c6-####-####-####-488fbdc34bdb
Timestamp : 2023-10-02 15:43:14
System Info
Host Name : nsx-mngr-01
OS Name : Linux
OS Version : 5.15.92-nn12-server
Arch : x86_64
Runbook Info
Runbook ID : CorfuServer
Version : 1.0
Publisher : VMware, Inc.
Report Type : VALID
Conclusion : Finished running the CorfuServer Runbook.
Recommendation : If there is any failure in the runbook steps, please collect the support bundles and reach out to the support team <https://www.vmware.com/support.html>.
Artifact Bundle : <none>
Steps
Step Number : 1
Step Action : This step checks Corfu Layout changes in the given time window (default is 24h)
Step Result : The result of the Corfu Layout Check is {'result': <Result.SUCCESS: 'SUCCESS'>, 'message': 'Layout changes are normal. Found 0 layout changes during the last 56.0 hours. (Thresholds are bad_node_unresponsive_percentage: 50%, unstable_layout_changes_per_hour: 10).', 'data': {'detected_layout_changes_per_hour': 0.0}}
Step Number : 2
Step Action : Check /var/log/stats/ping.stats.
Step Result : The result of the Infra Ping Check is {'result': <Result.SUCCESS: 'SUCCESS'>, 'message': 'Infra ping stats are normal (below thresholds packet_loss_threshold_percentage: 30%, avg_rtt_threshold_ms: 10).', 'data': '{}'}
Step Number : 3
Step Action : Check /var/log/stats/sys_threads.stats and analyze CPU load average in the given time window (default is 24h)
Step Result : The result of the Infra Load Average Check is {'result': <Result.SUCCESS: 'SUCCESS'>, 'message': 'Infra load averages are normal (below threshold 20).', 'data': '{}'}
Step Number : 4
Step Action : Check trim token movement in the given time window (default is 24h)
Step Result : The result of the Corfu Trim Token Movement Check is {'result': <Result.SUCCESS: 'SUCCESS'>, 'message': 'Detected a successful log trim.', 'data': {'last_trim_date': '2023-10-02 15:38:44.854000+00:00'}}
Step Number : 5
Step Action : Check fsync latency metrics in the given time window (default is 24h)
Step Result : The result of the Corfu Fsync Latency Metrics Check is {'result': <Result.SUCCESS: 'SUCCESS'>, 'message': "Corfu metrics fsync disk latencies are normal (below thresholds {'0.5': '150000', '0.75': '175000', '0.95': '195000', '0.99': '200000'}).", 'data': '{}'}
Step Number : 6
Step Action : Check failure detector ping latency metrics in the given time window (default is 24h).
Step Result : The result of the Corfu Failure Detector Ping Latency Metrics Check is {'result': <Result.SUCCESS: 'SUCCESS'>, 'message': 'Corfu failure detector ping latencies are normal (below threshold 200.0ms).', 'data': '{}'}
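When a report has many steps, it helps to scan the Step Result lines for anything other than SUCCESS. The sketch below assumes the report was saved as plain text in the layout shown above (a "Step Number : N" line before each "Step Result : ..." line, with 'result': <Result.NAME: 'NAME'> and a quoted 'message'). The function failed_steps and its regular expressions are hypothetical and may need adjusting for other NSX versions.

```python
import re
from typing import List, Tuple

# Matches the status inside e.g. {'result': <Result.SUCCESS: 'SUCCESS'>, ...}
RESULT_RE = re.compile(r"'result': <Result\.(\w+):")
# The message may be single- or double-quoted (double when it contains single quotes).
MESSAGE_RE = re.compile(r"""'message': (['"])(.*?)\1""")


def failed_steps(report: str) -> List[Tuple[int, str, str]]:
    """Return (step number, status, message) for every step whose status is not SUCCESS."""
    failures = []
    step = 0
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("Step Number"):
            # "Step Number : 1" -> 1
            step = int(line.split(":", 1)[1])
        elif line.startswith("Step Result"):
            status_match = RESULT_RE.search(line)
            message_match = MESSAGE_RE.search(line)
            status = status_match.group(1) if status_match else "UNKNOWN"
            message = message_match.group(2) if message_match else line
            if status != "SUCCESS":
                failures.append((step, status, message))
    return failures
```

Run against the example report in this article, it returns an empty list because every step reports SUCCESS.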
Known Issues:
Handling Log Bundles for offline review with Broadcom support:
Collect Support Bundles for Troubleshooting NSX-T
Uploading files to cases on the Broadcom Support Portal
Creating and managing Broadcom support cases