This article provides the data collection steps required to investigate the root cause of an NE VM kernel crash.
An HCX Network Extension (NE) appliance VM may experience a kernel crash, after which the NE VM remains offline until it is manually recovered in vCenter by a power off/on or reset.
In the vCenter UI, there is a message that the NE VM was disabled and requires a power off/on or reset.
On the source host where the NE VM resides, hostd.log shows the same message for the NE VM: "The CPU has been disabled by the guest operating system. Power off or reset the virtual machine."
Log location (on the ESXi host where the NE VM resides): /var/run/log/hostd.log
2023-08-18T02:36:50.226Z verbose hostd[2099988] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-e4434b77f338/###-ServiceMesh-NE-I2-Nqs-Redeploying/###-ServiceMesh-NE-I2-Nqs-Redeploying.vmx opID=lro-#########-########-01-01-3e-a860] Handling vmx message 9423161: The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.
2023-08-18T02:36:50.226Z warning hostd[2099988] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-e4434b77f338/XXX-ServiceMesh-NE-I2-Nqs-Redeploying/XXX-ServiceMesh-NE-I2-Nqs-Redeploying.vmx opID=lro-#########-########-01-01-3e-a860] Failed to find activation record, event user unknown.
2023-08-18T02:36:50.227Z info hostd[2099988] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=lro-#########-########-01-01-3e-a860] Event 3429 : Message on XXX-ServiceMesh-NE-I2 on sv136284.XXX.com in ha-datacenter: The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.
On the source host where the NE VM resides, the vmware.log associated with the NE VM shows a guest kernel panic / crash relating to 'skbuff: skb_under_panic' and 'kernel BUG at net/core/skbuff.c:104!'.
Log location (on the ESXi host where the NE VM resides): /vmfs/volumes/<Datastore_name>/###-ServiceMesh-NE-I2-Nqs-Redeploying/vmware.log
2023-08-18T02:36:50.218Z In(05) vcpu-4 - Guest: <0>[1453356.452382] skbuff: skb_under_panic: text:00000000717a3dbe len:1434 put:8 head:00000000536df6e8 data:0000000080e65197 tail:0x594 end:0x6c0 dev:ipip_te_0
2023-08-18T02:36:50.218Z In(05) vcpu-4 - Guest: <4>[1453356.452593] ------------[ cut here ]------------
2023-08-18T02:36:50.218Z In(05) vcpu-4 - Guest: <2>[1453356.452594] kernel BUG at net/core/skbuff.c:104!
2023-08-18T02:36:50.218Z In(05) vcpu-4 - Guest: <4>[1453356.452688] invalid opcode: 0000 [#1] SMP NOPTI
2023-08-18T02:36:50.218Z In(05) vcpu-4 - Guest: <4>[1453356.452748] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G OE 4.19.245-1.ph3-esx #1-photon
2023-08-18T02:36:50.218Z In(05) vcpu-4 - Guest: <4>[1453356.452816] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
2023-08-18T02:36:50.218Z In(05) vcpu-4 - Guest: <4>[1453356.452901] RIP: 0010:skb_panic+0x4a/0x50
2023-08-18T02:36:50.218Z In(05) vcpu-4 - Guest: <4>[1453356.452938] Code: 00 00 50 8b 87 d0 00 00 00 50 8b 87 cc 00 00 00 50 ff b7 e0 00 00 00 4c 8b 8f d8 00 00 00 48 c7 c7 e8 e4 9a a3 e8 72 4b 11 00 <0f> 0b 0f 1f 40 00 48 8b 97 e0 00 00 00 89 f0 01 b7 80 00 00 00 48
2023-08-18T02:36:50.218Z In(05) vcpu-4 - Guest: <4>[1453356.453068] RSP: 0000:ffffbe9f0017c410 EFLAGS: 00010282
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453112] RAX: 000000000000008c RBX: ffff9e5629003300 RCX: 0000000000000000
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453167] RDX: ffff9e567cb21c60 RSI: ffff9e567cb1b088 RDI: ffff9e567cb1b088
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453222] RBP: ffffbe9f0017c430 R08: 0000000000000000 R09: 000000000000059d
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453280] R10: 000000000f4a0b8c R11: 642030633678303a R12: ffffbe9f0017c4fc
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453337] R13: ffff9e55eb14181c R14: ffffbe9f0017c500 R15: 0000000000009411
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453414] FS: 0000000000000000(0000) GS:ffff9e567cb00000(0000) knlGS:0000000000000000
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453491] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453541] CR2: 0000000000000000 CR3: 0000000097a0a001 CR4: 00000000001606a0
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453619] Call Trace:
2023-08-18T02:36:50.219Z In(05) vcpu-4 - Guest: <4>[1453356.453649] <IRQ> </IRQ>
2023-08-18T02:36:50.223Z In(05) vcpu-4 - Guest: <4>[1453356.456140] RIP: 0010:native_safe_halt+0x17/0x20
2023-08-18T02:36:50.223Z In(05) vcpu-4 - Guest: <4>[1453356.456183] Code: 48 8b 00 a8 08 0f 84 76 ff ff ff eb bd 90 90 90 90 90 90 8b 05 3a 08 57 00 55 48 89 e5 85 c0 7e 07 0f 00 2d db a5 1c 00 fb f4 <5d> c3 0f 1f 80 00 00 00 00 8b 05 1a 08 57 00 55 48 89 e5 85 c0 7e
2023-08-18T02:36:50.223Z In(05) vcpu-4 - Guest: <4>[1453356.456323] RSP: 0000:ffffbe9f000a7ea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
2023-08-18T02:36:50.223Z In(05) vcpu-4 - Guest: <4>[1453356.456388] RAX: 0000000000000000 RBX: 0000000000000004 RCX: ffff9e567cb1f100
2023-08-18T02:36:50.223Z In(05) vcpu-4 - Guest: <4>[1453356.456450] RDX: ffffffffa3a2eff8 RSI: ffff9e567cb1f100 RDI: 000529d1f020e627
2023-08-18T02:36:50.223Z In(05) vcpu-4 - Guest: <4>[1453356.456511] RBP: ffffbe9f000a7ea0 R08: 0000000000000000 R09: ffff9e567cb24200
2023-08-18T02:36:50.223Z In(05) vcpu-4 - Guest: <4>[1453356.456574] R10: ffffbe9f000a7e88 R11: 0000000000000000 R12: ffffffffa3a877c0
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.456636] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.456699] default_idle+0x10/0x20
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.456738] arch_cpu_idle+0x10/0x20
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.456774] default_idle_call+0x1e/0x30
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.456814] do_idle+0x1c9/0x1f0
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.456849] cpu_startup_entry+0x5f/0x70
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.456888] start_secondary+0x19d/0x1e0
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.456927] secondary_startup_64_no_verify+0xca/0xcb
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.456972] Modules linked in: drbg ansi_cprng seqiv esp4(E) xfrm6_mode_tunnel(E) xfrm4_mode_tunnel(E) xt_u32(E) xt_nat(E) xt_cpu(E) xt_multiport(E) xt_connmark(E)xt_mark(E) ebt_arp(E) ebt_dnat(E) ebtable_nat(E) ebtable_filter(E) ebtables(E) nf_log_ipv4(E) nf_log_common(E) xt_limit(E) iptable_raw(E) arptable_filter(E) ip6table_mangle(E) ip6table_nat(E) iptable_mangle(E) iptable_nat(E) nf_conntrack_netlink(E) nfnetlink(E) xt_LOG(E) dummy(E) openvswitch(E) nsh(E) nf_nat_ipv6(E) nf_nat_ipv4(E) nf_conncount(E) nf_nat(E) xt_policy(E) xt_state(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) mousedev(E) psmouse(E) evdev(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) br_netfilter(E) ip_gre(E) fou(E) ip6_udp_tunnel(E) udp_tunnel(E) vxlan_trunk(E) bridge(E) stp(E) arp_tables(E)
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.457529] llc(E) ipip(E) tunnel4(E) ip_tunnel(E) rdrand_rng(E) rng_core(E) aesni_intel aes_x86_64 crypto_simd cryptd glue_helper sr_mod(E) cdrom(E) floppy(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) ipv6(E)
2023-08-18T02:36:50.224Z In(05) vcpu-4 - Guest: <4>[1453356.457715] ---[ end trace 0119df309df630a4 ]---
2023-08-18T02:36:50.225Z In(05) vcpu-0 - Vix: [vmxCommands.c:7182]: VMAutomation_HandleCLIHLTEvent. Do nothing.
2023-08-18T02:36:50.225Z In(05) vcpu-0 - MsgHint: msg.monitorevent.halt
2023-08-18T02:36:50.225Z In(05)+ vcpu-0 - The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.
2023-08-18T02:36:50.225Z In(05)+ vcpu-0 - ---------------------------------------
2023-08-18T02:36:50.227Z In(05) vcpu-0 - VigorTransportProcessClientPayload: opID=lro-#########-########-01-01-3e-a860 seq=1087320: Receiving Bootstrap.MessageReply request.
2023-08-18T02:36:50.227Z In(05) vcpu-0 - VigorTransport_ServerSendResponse opID=lro-#########-########-01-01-3e-a860 seq=1087320: Completed Bootstrap request.
2023-08-18T02:36:50.227Z In(05) vcpu-4 - Guest: <5>[ 0.000000] Linux version 4.19.245-1.ph3-esx (root@photon) (gcc version 7.3.0 (GCC)) #1-photon SMP Thu Nov 10 19:21:49 UTC 2022
2023-08-18T02:36:50.227Z In(05) vcpu-4 - Guest: <6>[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.19.245-1.ph3-esx root=/dev/sda2 init=/lib/systemd/systemd ro loglevel=3 quiet no-vmw-sta loadpin.enabled=0 slub_debug=- page_poison=off slab_nomerge cgroup.memory=nokmem pti=off l1tf=off mds=off isolcpus=1,2,3,4,5,6,7 net.ifnames=0 plymouth.enable=0 systemd.legacy_systemd_cgroup_controller=yes fips=0 audit=0
2023-08-18T02:36:50.227Z In(05) vcpu-4 - Guest: <6>[ 0.000000] Disabled fast string operations
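When several NE VMs or hosts need to be checked, the three log signatures quoted in this article can be searched for with a short script. The following is a minimal Python sketch, not an official Broadcom tool: the function names `find_crash_signatures` and `scan_log` and the dictionary labels are hypothetical, and it assumes the hostd.log and vmware.log files have been copied off the ESXi host to a workstation where Python 3 is available.

```python
from typing import Dict, Iterable, List

# Signature strings taken verbatim from the log excerpts in this article.
# The dictionary keys are illustrative labels, not VMware terminology.
SIGNATURES = {
    "cpu_disabled": "The CPU has been disabled by the guest operating system",
    "skb_under_panic": "skbuff: skb_under_panic",
    "kernel_bug": "kernel BUG at net/core/skbuff.c:104!",
}


def find_crash_signatures(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return, for each signature label, the log lines that contain it."""
    hits: Dict[str, List[str]] = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_log(path: str) -> Dict[str, List[str]]:
    """Scan a copied hostd.log or vmware.log file for the signatures."""
    # errors="replace" avoids failing on stray non-UTF-8 bytes in the log.
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return find_crash_signatures(fh)
```

For example, `scan_log("vmware.log")` returns a dictionary whose `skb_under_panic` and `kernel_bug` lists are non-empty when the copied log contains the crash described here. A plain substring match is used rather than a regular expression so that the check stays tied to the exact strings shown in this article.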
VMware HCX
To determine the root cause of the NE VM kernel panic / crash, the NE VM must be suspended in vCenter while it is in the problem state so that a memory dump can be collected for analysis by the engineering team.
NOTE: The ability to suspend an NE VM in vCenter is disabled by default. The vCenter configuration change that allows the NE VM to be suspended requires a power off/on of the NE VM, at which point the current memory state is lost. This means that once the procedure to allow NE VM suspension has been completed, a new occurrence of the kernel panic / crash must take place on that NE VM before it can be suspended and the memory dump file collected.
If you believe you have experienced this issue, please provide the below information and reference this KB article when opening a support case with Broadcom.
The NE VM remains offline until it is manually recovered in vCenter by a power off/on or reset.