Following an upgrade to NSX-T 3.2.0 or later, IKED service crashes seen with VPN tunnels going down

Article ID: 368833

Updated On:

Products

VMware NSX

Issue/Introduction

  • You are running NSX-T 3.2.0 or later.
  • Multiple IKED core dumps are seen after the upgrade, and the IKED service does not recover.
  • VPN traffic is disrupted until the workaround is applied manually.
  • A Bypass Policy is configured in the IPSec service with either "Local Networks" or "Remote Networks" left empty.
  • Running get ipsecvpn session summary on the Edge Node returns output similar to the following:

    Thu May 30 2024 EDT 04:25:27.853
    Version  SID  Compliance Suite Type    Auth  Status        Local IP         Peer IP          Down Reason
    ----------------------------------------------------------------------------------------------------------------------------
    IKEv2    0    NONE             Policy  PSK   Down          <Local IP>       <Peer IP>        IKED waiting for RSS queue info from DP
    IKEv1    0    NONE             Policy  PSK   Down          <Local IP>       <Peer IP>        IKED waiting for RSS queue info from DP
    IKEv2    0    NONE             Policy  PSK   Down          <Local IP>       <Peer IP>        IKED waiting for RSS queue info from DP
    IKEv2    0    NONE             Policy  PSK   Down          <Local IP>       <Peer IP>        IKED waiting for RSS queue info from DP
    ----------------------------------------------------------------------------------------------------------------------------
  • IKED core dumps will be visible in /var/dump:

    -rw-r--r--  1 root root 2.0M May 30 01:03 core.iked.1714527648.18883.150.11.gz

  • Entries similar to the following are visible in /var/log/kern.log:

    2024-05-30T05:03:47.852Z edge01 kernel - - - [ 6277.247167] grsec: Segmentation fault occurred at 0000000000000038 in /opt/vmware/nsx-edge/bin/iked[iked:28091] uid/euid:150/150 gid/egid:150/150, parent /opt/vmware/edge/ike/entrypoint.sh[entrypoint.sh:27973] uid/euid:150/150 gid/egid:150/150
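
The symptoms above can be checked programmatically once the session summary output and kern.log content have been collected as text. The following is a minimal sketch, assuming the output format shown in this article; the function names are illustrative and are not part of any NSX tooling.

```python
import re

# Down Reason reported by `get ipsecvpn session summary` in this scenario.
RSS_WAIT_REASON = "IKED waiting for RSS queue info from DP"

# Matches the grsec segmentation-fault entry for the iked binary in kern.log.
IKED_SEGFAULT = re.compile(
    r"grsec: Segmentation fault occurred at \S+ in /opt/vmware/nsx-edge/bin/iked\["
)


def count_rss_wait_sessions(summary_text):
    """Count session rows whose Down Reason matches this issue."""
    return sum(1 for line in summary_text.splitlines() if RSS_WAIT_REASON in line)


def find_iked_segfaults(kern_log_text):
    """Return the kern.log lines that record an iked segmentation fault."""
    return [line for line in kern_log_text.splitlines() if IKED_SEGFAULT.search(line)]
```

A non-zero session count together with at least one matching kern.log line is consistent with the issue described here, but it does not by itself confirm the root cause.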

Environment

VMware NSX-T Data Center
VMware NSX

Cause

Prior to NSX-T 3.2.0, any empty field in the Bypass Policy was mapped to the IP address 0.0.0.0.

In NSX-T 3.2.0 and later, this mapping is missing, so the nestdb attributes for the local/remote networks are stored with NULL values. IKED expects a value to be present and crashes when it processes the NULL attribute.
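
The effect of the missing mapping can be illustrated with a toy model. This is not NSX or IKED source code: both functions below are hypothetical, and a Python AttributeError stands in for the segmentation fault seen in IKED.

```python
from typing import Optional

WILDCARD = "0.0.0.0"  # default described for releases before NSX-T 3.2.0


def stored_value(field: Optional[str], apply_default: bool) -> Optional[str]:
    """Model how an empty Bypass Policy field ends up in the datastore."""
    if field:
        return field
    # With the default, an empty field becomes the wildcard; without it, NULL.
    return WILDCARD if apply_default else None


def first_octet(network: Optional[str]) -> int:
    """Hypothetical consumer that assumes a value is always present."""
    return int(network.split(".")[0])  # raises AttributeError when network is None
```

With apply_default=True the consumer receives "0.0.0.0" and succeeds. With apply_default=False it receives None and fails, which mirrors the reason the workaround sets the value explicitly.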

Resolution

This issue is resolved in VMware NSX 4.2.0.

Additional Information

Workaround

Before Upgrade: If the Bypass Policy is configured without local/remote networks (empty), update the configuration with the wildcard IP address of 0.0.0.0/0.

or

After Upgrade: If NSX is already upgraded and this issue is encountered, change the empty values to 0.0.0.0/0 and then start the IKED service by running the following command from the Edge root shell:

# docker start service_iked
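
The "Before Upgrade" check can be sketched as a small script run against an exported copy of the VPN rules. The dictionary keys used here (name, action, local_networks, remote_networks) and the "BYPASS" value are assumptions chosen for illustration, not the NSX API schema, so they would need to be adapted to the actual export format.

```python
WILDCARD_CIDR = "0.0.0.0/0"
NETWORK_KEYS = ("local_networks", "remote_networks")  # hypothetical field names


def rules_needing_fix(rules):
    """Names of Bypass rules that have an empty local or remote network list."""
    return [
        rule.get("name", "<unnamed>")
        for rule in rules
        if rule.get("action") == "BYPASS"
        and any(not rule.get(key) for key in NETWORK_KEYS)
    ]


def fill_empty_networks(rule):
    """Return a copy of the rule with empty network lists set to 0.0.0.0/0."""
    fixed = dict(rule)
    for key in NETWORK_KEYS:
        if not fixed.get(key):
            fixed[key] = [WILDCARD_CIDR]
    return fixed
```

Any rule reported by rules_needing_fix is a candidate for the wildcard update before the upgrade. The script only inspects and transforms local data; applying the change to NSX remains a manual step.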