In VMware vSphere Hypervisor (ESXi) versions 7.0U3 through 7.0U3e, IO failures with fsync error messages can be observed when Greenplum is under heavy workload. This has been observed on both VMware vSAN and Dell PowerFlex based solutions. The only known resolution is to upgrade past, or otherwise avoid, the impacted versions of ESXi. Refer to the Product Interoperability Matrix (vmware.com) for VMware Greenplum and the supported, compatible versions of ESXi.
Queries in Greenplum occasionally fail with varying error messages and in varying locations; the same can happen with various Greenplum utilities.
When these failures occur, you can find an fsync failure in the segment logs that looks like this:
```
2023-03-16 08:20:26.660006 EDT,"gpadmin","backup_test",p2499427,th1383559808,"172.28.10.10","46996",2023-03-16 08:14:54 EDT,1374545,con170970,cmd8691,seg34,,dx590001,x1374545,sx1,"PANIC","58030","could not fsync file ""base/8518984/3810710"": Input/output error",,,,,,"[REDACTED]",0,,"md.c",1098,"Stack trace:
```
At the virtual machine (guest OS) level, a kernel failure can be observed that looks like this:
```
Mar 16 15:19:45 EDGDVLGPDD093 trace-agent[3085485]: 2023-03-16 15:19:45 EDT | TRACE | INFO (pkg/trace/info/stats.go:104 in LogStats) | No data received
Mar 16 15:19:46 EDGDVLGPDD093 kernel: sd 0:0:1:0: [sdb] tag#78 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Mar 16 15:19:46 EDGDVLGPDD093 kernel: sd 0:0:1:0: [sdb] tag#78 CDB: Write(10) 2a 00 97 3c d5 30 00 08 00 00
Mar 16 15:19:46 EDGDVLGPDD093 kernel: blk_update_request: I/O error, dev sdb, sector 2537346352 op 0x1:(WRITE) flags 0x4800 phys_seg 128 prio class 0
Mar 16 15:19:47 EDGDVLGPDD093 abrt-hook-ccpp[2637223]: Process 2636846 (postgres) of user 20640 killed by SIGABRT - dumping core
```
At the ESXi host level, we can see an ESXi failure that looks like this:
Errors in the vmkernel log (`less <ESXi_hostname>-2023-01-19/tdlog/logs/vmkernel.all`); the following block repeats continuously:

```
2023-01-18T21:27:15.941Z cpu17:7887374)WARNING: SCSI: 167: Unable to allocate SCSI_Command (sgLen = 505, RA 0x4200138a735f)
2023-01-18T21:27:15.941Z cpu17:7887374)WARNING: SCSI: 232: Unable to allocate SCSI_Command (sgLen = 260, RA 0x42001394ba15)
2023-01-18T21:27:15.941Z cpu17:7887374)WARNING: VSCSI: 1230: Reallocate command, No mem, len=65
2023-01-18T21:27:15.941Z cpu17:7887374)VSCSI: 590: VSCSI_VmkAccumulateSG failed: Out of memory
2023-01-18T21:27:15.941Z cpu17:7887374)PVSCSI: 956: PROCESS: failed to pin: Out of memory
```
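A quick way to confirm this signature is to search the vmkernel log for the out-of-memory allocation failures. This is only a sketch; the bundle path below mirrors the example above and should be adjusted to your environment.

```
# Count the VSCSI/PVSCSI out-of-memory failures in the vmkernel log
# (the bundle path is a placeholder from the example above).
grep -cE "VSCSI_VmkAccumulateSG failed: Out of memory|failed to pin: Out of memory" \
  <ESXi_hostname>-2023-01-19/tdlog/logs/vmkernel.all

# Show them with timestamps so they can be correlated with the Greenplum failures
grep -E "Unable to allocate SCSI_Command|Out of memory" \
  <ESXi_hostname>-2023-01-19/tdlog/logs/vmkernel.all | less
```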
When VMware Greenplum is deployed on top of the virtual machines, the entire virtualized stack needs to be considered to identify the issue.
In order to observe the error messages, it is recommended to turn on `VM logging` in the vSphere cluster. You can find more details in Enable Virtual Machine Logging (vmware.com).
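If you want to verify from the ESXi shell that logging is enabled for a given virtual machine, you can inspect its `.vmx` file. This is only a sketch: the datastore and VM names are placeholders, and the supported way to change the setting is through the vSphere Client as described in the linked documentation.

```
# On the ESXi host: check whether the VM has logging enabled in its .vmx file.
# <datastore> and <vm_name> are placeholders for your environment.
grep -i "^logging" /vmfs/volumes/<datastore>/<vm_name>/<vm_name>.vmx
# Expected output when enabled:
# logging = "TRUE"
```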
First, we look at the available Greenplum segment logs. Refer to Monitoring Greenplum Database Log Files (vmware.com). We can clearly see that we are consistently hitting `fsync` failures due to `Input/output error`, which indicates that data cannot be flushed to disk.
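A quick way to locate these failures is to search the segment logs directly. The sketch below assumes a common layout where each segment writes its logs under `/data*/primary/gpseg*` and uses a hypothetical hostfile path; adjust both for your environment.

```
# On a segment host: search the segment logs for fsync PANICs.
# The log directory is log/ (pg_log/ on older Greenplum releases).
grep -r "could not fsync" /data*/primary/gpseg*/*log/*.csv

# Or run it across the whole cluster from the coordinator with gpssh,
# using the hostfile you normally use for cluster-wide commands (placeholder path).
gpssh -f /home/gpadmin/hostfile \
  'grep -l "could not fsync" /data*/primary/gpseg*/*log/*.csv'
```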
Second, we check the guest OS kernel log on the VM by looking at `/var/log/messages`. By correlating timestamps with the failures observed in the Greenplum logs, we can see I/O errors reported by the kernel, such as `blk_update_request: I/O error, dev sdb, sector 2537346352 op 0x1:(WRITE) flags 0x4800 phys_seg 128 prio class 0`.
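To pull the kernel messages around the failure time, you can use the commands below. This is a minimal sketch, assuming a systemd-based guest where `/var/log/messages` exists; the time window is illustrative and should match the timestamps from the Greenplum log.

```
# On the segment VM (as root): show kernel I/O errors in /var/log/messages
grep -E "blk_update_request|I/O error" /var/log/messages

# Narrow to the window around the Greenplum PANIC (example window, adjust)
journalctl -k --since "2023-03-16 15:15:00" --until "2023-03-16 15:25:00" | grep -i "I/O error"
```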
Third, within the available vSphere and ESXi logs, or within a support bundle exported from vCenter, the timestamps of the failures can be correlated to observe the issue at the vSphere level.
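One simple way to do the correlation, assuming the support bundles have been extracted and the failure timestamp is known from the Greenplum and guest OS logs (the bundle naming and timestamp below are placeholders):

```
# Pull all vmkernel entries from the minute of the Greenplum PANIC across every
# exported host bundle (directory pattern and timestamp are placeholders).
for bundle in esx-*-2023-01-19; do
  echo "== ${bundle} =="
  grep "2023-01-18T21:27:" "${bundle}/tdlog/logs/vmkernel.all"
done
```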
In this case, the `fsync` failure is caused by the hypervisor's handling of the `VSCSI` driver, which services all IO requests from the virtual machines. Under the high memory pressure generated by the Greenplum workload, the hypervisor does not have enough memory to complete the IO requests.
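To see whether the host is under memory pressure while the workload runs, you can capture `esxtop` in batch mode on the ESXi host and review its memory counters afterwards. A minimal sketch; the sample count and output file are arbitrary choices.

```
# On the ESXi host: capture 60 samples (default 5-second interval) of all
# esxtop counters in batch (CSV) mode, then review the memory columns
# (e.g. free memory and memory state) against the failure window.
esxtop -b -n 60 > /tmp/esxtop-greenplum.csv
```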
This issue has only been confirmed and observed in ESXi versions 7.0U3 through 7.0U3e. The fsync issue is fixed in 7.0U3f, so 7.0U3e is still affected.
The issue has not been observed in other ESXi versions. Therefore, the current resolution is to upgrade to at least ESXi 7.0U3f.
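To confirm which release and build a host is running before and after the upgrade, you can check the version directly on the ESXi host:

```
# Either command reports the ESXi version and build number.
vmware -vl
esxcli system version get
```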
Those newer versions contain many vSphere fixes that improve the reliability of IO processing under high IO pressure.
Check the Product Interoperability Matrix (vmware.com) for VMware Greenplum and its supported ESXi versions.