Tuning NCP parameters for increased PKS Scaling in VMware Enterprise PKS environment

Article ID: 316827

Products

VMware Cloud PKS

Issue/Introduction


This article explains how to tune NCP limits for existing and new clusters when scale-related issues cause NSX-T Manager to be overwhelmed by the number of active NCP instances. That number is directly proportional to the number of Kubernetes clusters managed by a single NSX-T instance. The article documents how to use this new capability to update the NCP parameters; it does not suggest a value or a range. Any values you set must be approved by engineering (the NCP and NSX-T teams) and will vary with the customer's scale and environment.

Starting with VMware Enterprise PKS 1.4.1, you can manually tune the NCP heartbeat and rate-limit parameters to help scale your PKS deployments. These changes apply to NCP 2.4.x only. The following parameters in the ncp.ini file are tunable:
  • master_timeout
  • heartbeat_period
  • update_timeout
In NCP 2.4.1, the recommended values for the above parameters are:
  • master_timeout = 18
  • heartbeat_period = 6
  • update_timeout = master_timeout - heartbeat_period = 12
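
The relationship between the three recommended values can be checked with a small shell helper. This is a minimal sketch: the function name derive_update_timeout is hypothetical, and the check that heartbeat_period is smaller than master_timeout is an assumption that follows from the subtraction above, not a documented NCP rule.

```shell
# Hypothetical helper: derive update_timeout from the other two values,
# using update_timeout = master_timeout - heartbeat_period.
# Arguments: $1 = master_timeout, $2 = heartbeat_period (integers).
derive_update_timeout() {
  # Assumed sanity check: the result must be positive.
  if [ "$2" -ge "$1" ]; then
    echo "heartbeat_period must be smaller than master_timeout" >&2
    return 1
  fi
  echo $(( $1 - $2 ))
}

derive_update_timeout 18 6   # prints 12
```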

The description of each parameter is available in the "Configmap for ncp.ini in ncp-rc.yml" section of the NSX Container Plug-in for Kubernetes and Cloud Foundry documentation. That documentation lists all parameters in the ncp.ini file; however, only the three above are configurable with PKS v1.4.1.

Environment

VMware PKS 1.x

Resolution

To perform the updates, use the Ops Manager CLI. For more information, see the "Updating a simple property" section of the Ops Manager API documentation.

Instructions to update these parameters in the PKS Tile:

  1. Get the PKS product GUID using the Ops Manager CLI:
    om -k -t <opsman_IP> -u <opsman_username> -p "<opsman_password>" curl -s -p '/api/v0/staged/products' | jq -cr '.[] | select( .type=="pivotal-container-service" ) | .guid'
    Sample Output: pivotal-container-service-0d2397b6f4b0ca35be13

    Export it as PKS_GUID:
    export PKS_GUID=pivotal-container-service-0d2397b6f4b0ca35be13

    Or

    export PKS_GUID=$(om -k -t <opsman_IP> -u <opsman_username> -p "<opsman_password>" curl -s -p '/api/v0/staged/products' | jq -cr '.[] | select( .type=="pivotal-container-service" ) | .guid')

  2. Update the values of the following properties:
    ".properties.network_selector.nsx.ncp-ha-master-timeout"
    ".properties.network_selector.nsx.ncp-ha-heartbeat-period"
    ".properties.network_selector.nsx.ncp-ha-update-timeout"


    For example, the following command sets master timeout, heartbeat period, and update timeout to 18, 6, and 12, respectively:

    om -k -t <opsman_IP> -u <opsman_username> -p '<opsman_password>' curl -x PUT -p "/api/v0/staged/products/${PKS_GUID}/properties" -d '{ "properties": { ".properties.network_selector.nsx.ncp-ha-master-timeout": { "value": 18}, ".properties.network_selector.nsx.ncp-ha-heartbeat-period": { "value": 6}, ".properties.network_selector.nsx.ncp-ha-update-timeout": { "value": 12} }}'

  3. After this change, apply changes on the PKS tile for the new values to take effect:

    • New clusters will be deployed with the new values.

    • If the upgrade-all-instances errand is enabled, all existing clusters are updated to the new values when the changes are applied.
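
The JSON body used in step 2 can also be generated from the three values instead of being typed by hand. The sketch below is illustrative only: build_ncp_payload is a hypothetical helper name, and the integer check is an added assumption. The property names are the ones listed in step 2.

```shell
# Hypothetical helper: print the JSON body for the PUT request in step 2.
# Arguments: $1 = master timeout, $2 = heartbeat period, $3 = update timeout.
build_ncp_payload() {
  # Assumed input check: accept only non-negative integers.
  for v in "$1" "$2" "$3"; do
    case "$v" in
      ''|*[!0-9]*) echo "values must be non-negative integers" >&2; return 1 ;;
    esac
  done
  printf '{ "properties": { "%s": { "value": %d }, "%s": { "value": %d }, "%s": { "value": %d } } }\n' \
    ".properties.network_selector.nsx.ncp-ha-master-timeout" "$1" \
    ".properties.network_selector.nsx.ncp-ha-heartbeat-period" "$2" \
    ".properties.network_selector.nsx.ncp-ha-update-timeout" "$3"
}

build_ncp_payload 18 6 12
```

The output could then be supplied to the request from step 2 in place of the literal JSON, for example with -d "$(build_ncp_payload 18 6 12)".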