Rotate bosh-dns leaf certificates using maestro

Article ID: 335087

Products

VMware Tanzu Kubernetes Grid Integrated (TKGi)
VMware Tanzu Kubernetes Grid Integrated Edition
VMware Tanzu Kubernetes Grid Integrated Edition 1.x
VMware Tanzu Kubernetes Grid Integrated Edition (Core)

Issue/Introduction

  • This procedure rotates the bosh-dns leaf certificates, all of which are signed by the CA /opsmgr/bosh_dns/tls_ca.
  • These certificates are non-configurable leaf certificates.
  • The procedure also shows how to verify that these certificates were rotated.
  • More details on the certificate types can be found here.
  • To rotate the CA /opsmgr/bosh_dns/tls_ca that signs the bosh-dns leaf certificates, refer to the "Rotate a Single CA and Its Leaf Certificates" documentation.

 

Environment

TKGI 1.9+

Cause

The bosh-dns leaf certificates have not been rotated in a timely manner.

Resolution

Prerequisite: If certificate rotation (tkgi rotate-certificates) has failed on a TKGI cluster, first finish the rotation on that cluster by re-running the tkgi rotate-certificates command until it completes successfully. Then proceed with the bosh-dns certificate rotation.

  • Set up credentials for CredHub and maestro access (replace the values in angle brackets with environment-specific values):

    export BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=<secret>
    export BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=<IP>
    export CREDHUB_SERVER="$BOSH_ENVIRONMENT:8844" CREDHUB_CLIENT="$BOSH_CLIENT"

    export CREDHUB_SECRET="$BOSH_CLIENT_SECRET" CREDHUB_CA_CERT="$BOSH_CA_CERT"
    credhub api https://$BOSH_ENVIRONMENT:8844 --ca-cert=/var/tempest/workspaces/default/root_ca_certificate
    credhub login
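Before moving on, it can help to confirm that every variable exported above is actually set. The following is a minimal sketch, assuming bash (it relies on ${!var} indirect expansion). check_bosh_env is a hypothetical helper name and is not part of maestro, CredHub, or Ops Manager.

```shell
# bash: report any of the variables exported above that are unset or empty.
# check_bosh_env is a hypothetical helper, not part of any Ops Manager tooling.
check_bosh_env() {
  local missing=0 var
  for var in BOSH_CLIENT BOSH_CLIENT_SECRET BOSH_CA_CERT BOSH_ENVIRONMENT \
             CREDHUB_SERVER CREDHUB_CLIENT CREDHUB_SECRET CREDHUB_CA_CERT; do
    # ${!var:-} expands to the value of the variable whose name is stored in $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Define the function in the same shell and run check_bosh_env. A non-zero exit status means that at least one variable still needs a value.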

  • Use the following command to list the certificates signed by /opsmgr/bosh_dns/tls_ca:

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq '.topology[].signs[].name'

    "/bosh_dns_health_client_tls"
    "/bosh_dns_health_server_tls"
    "/dns_api_client_tls"
    "/dns_api_server_tls"


  • Check the current validity of the certificates

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq '.topology[].signs[] | "\(.name) \(.versions[].valid_until)"'

    "/bosh_dns_health_client_tls 2021-12-18T02:30:06Z"
    "/bosh_dns_health_server_tls 2021-12-18T02:30:06Z"
    "/dns_api_client_tls 2021-12-18T02:30:07Z"
    "/dns_api_server_tls 2021-12-18T02:30:07Z"

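The validity check above can be turned into a simple expiry alert. The following is a minimal sketch, assuming GNU coreutils date (which parses ISO 8601 timestamps such as 2021-12-18T02:30:06Z). It also assumes jq is run with -r, so that each line is a bare name followed by a timestamp without surrounding quotes. flag_expiring is a hypothetical helper. Its optional second argument, the current time in epoch seconds, exists only to make the function testable.

```shell
# bash: read "<name> <valid_until>" lines on stdin and print the certificates
# that expire within the given number of days. Assumes GNU date.
# Usage: flag_expiring <days> [now_epoch_seconds]   (hypothetical helper)
flag_expiring() {
  local days="$1" now="${2:-$(date -u +%s)}"
  local limit=$(( now + days * 86400 ))
  local name valid_until expiry
  while read -r name valid_until; do
    [ -z "$name" ] && continue
    # Convert the timestamp to epoch seconds; skip lines date cannot parse.
    expiry=$(date -u -d "$valid_until" +%s) || continue
    if [ "$expiry" -le "$limit" ]; then
      echo "$name expires $valid_until"
    fi
  done
}
```

For example, to list the certificates that expire within 30 days:

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq -r '.topology[].signs[] | "\(.name) \(.versions[].valid_until)"' | flag_expiring 30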

  • How to check which deployments are using these certificates

    The bosh-dns certificates are used by almost every deployment. The example below shows only two deployments because this environment has only TKGI installed, along with one Kubernetes cluster.

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq '.topology[].signs[] | "\(.name) \(.versions[].deployment_names)"'

    "/bosh_dns_health_client_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"]"
    "/bosh_dns_health_server_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"]"
    "/dns_api_client_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"]"
    "/dns_api_server_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"]"


  • Certificate Rotation Procedure

    The certificates can be rotated with the maestro utility. Because all of these certificates are signed by the same CA, they can all be regenerated at once. First, confirm with a dry run which certificates will be regenerated. The output below shows that only the four certificates listed earlier will be regenerated.

    maestro regenerate leaf --signed-by /opsmgr/bosh_dns/tls_ca --dry-run

    to_be_regenerated:
    - name: /bosh_dns_health_client_tls
      certificate_id: 0b402fa6-04a3-492f-849d-cde95f5cff88
    - name: /bosh_dns_health_server_tls
      certificate_id: b6e7b8f2-edb4-4c2e-a1a5-332cd9f28c37
    - name: /dns_api_client_tls
      certificate_id: 0ae32163-a76a-4d14-8fa8-79e5402b9511
    - name: /dns_api_server_tls
      certificate_id: 5ac68fdb-01ef-4d50-a9f3-92281e57a74c

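The dry-run output can also be checked mechanically before the real regeneration is run. The following is a minimal sketch. It assumes that maestro writes the dry-run YAML shown above to stdout. verify_dry_run is a hypothetical helper, and it inspects only the "- name:" lines rather than parsing YAML in general.

```shell
# bash: read the dry-run YAML on stdin and confirm that it lists exactly the
# four bosh-dns leaf certificates (hypothetical helper).
verify_dry_run() {
  local expected="/bosh_dns_health_client_tls
/bosh_dns_health_server_tls
/dns_api_client_tls
/dns_api_server_tls"
  local actual
  # Keep only the values after "- name:" and sort them for a stable comparison.
  actual=$(sed -n 's/^[[:space:]]*- name:[[:space:]]*//p' | LC_ALL=C sort)
  if [ "$actual" = "$expected" ]; then
    echo "dry-run lists only the expected certificates"
  else
    echo "unexpected dry-run contents:" >&2
    echo "$actual" >&2
    return 1
  fi
}
```

For example:

    maestro regenerate leaf --signed-by /opsmgr/bosh_dns/tls_ca --dry-run | verify_dry_run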

  • Regenerating Certificates

    The following command regenerates the certificates. On successful regeneration, the output is similar to the following:

    maestro regenerate leaf --signed-by /opsmgr/bosh_dns/tls_ca

    regenerated:
    - name: /bosh_dns_health_client_tls
      certificate_id: 0b402fa6-04a3-492f-849d-cde95f5cff88
    - name: /bosh_dns_health_server_tls
      certificate_id: b6e7b8f2-edb4-4c2e-a1a5-332cd9f28c37
    - name: /dns_api_client_tls
      certificate_id: 0ae32163-a76a-4d14-8fa8-79e5402b9511
    - name: /dns_api_server_tls
      certificate_id: 5ac68fdb-01ef-4d50-a9f3-92281e57a74c

    After regeneration, querying these certificates with maestro shows two versions of each certificate, [OLD] and [NEW]. This also confirms that the new versions were added to CredHub. The old and new versions can be distinguished by their deployment_names and valid_until timestamps. In the output below, versions[1] is the old version and versions[0] is the new one.


    [OLD]

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq '.topology[].signs[] | "\(.name) \(.versions[1].deployment_names) \(.versions[1].valid_until)"'

    "/bosh_dns_health_client_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T02:30:06Z"
    "/bosh_dns_health_server_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T02:30:06Z"
    "/dns_api_client_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T02:30:07Z"
    "/dns_api_server_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T02:30:07Z"

    [NEW]

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq '.topology[].signs[] | "\(.name) \(.versions[0].deployment_names) \(.versions[0].valid_until)"'

    "/bosh_dns_health_client_tls [] 2021-12-18T06:18:35Z"
    "/bosh_dns_health_server_tls [] 2021-12-18T06:18:35Z"
    "/dns_api_client_tls [] 2021-12-18T06:18:35Z"
    "/dns_api_server_tls [] 2021-12-18T06:18:35Z"


    Although these certificates have been regenerated and added to CredHub, the new versions have not yet been deployed to the VMs. The empty deployment list ([]) shown for each new version confirms this.

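To list the certificates whose new version has not yet been deployed, jq can reduce each deployment_names array to a count, and the shell can filter the results. The following is a minimal sketch. It assumes that versions[0] is the newest version, as in the [NEW] output above. list_undeployed is a hypothetical helper.

```shell
# bash: read "<name> <deployment_count>" lines on stdin and print the names whose
# count is 0, i.e. versions present in CredHub but not deployed (hypothetical helper).
list_undeployed() {
  local name count
  while read -r name count; do
    [ -n "$name" ] && [ "${count:-0}" -eq 0 ] && echo "$name"
  done
  return 0
}
```

For example:

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq -r '.topology[].signs[] | "\(.name) \(.versions[0].deployment_names | length)"' | list_undeployed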

  • Deploying the certificates

    Note: Before proceeding with the steps below, be aware that services on the VMs are restarted during the cluster upgrades to complete the certificate deployment. The nodes go through a drain operation, so workloads are drained and rescheduled onto other nodes.


    • After the certificates have been regenerated, run Apply Changes from the Ops Manager UI to push them to the VMs:
    • Select Review Pending Changes.
    • Select the TKGI tile.
    • Select the Upgrade all clusters errand.
      • If you cannot rotate these certificates on all clusters at once and prefer to deploy to one cluster at a time, run tkgi upgrade-cluster <cluster-name> from the CLI instead.
    • Wait for the above operation(s) to complete.

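When certificates are deployed one cluster at a time, the per-cluster upgrades can be run in sequence, stopping at the first failure. The following is a minimal sketch. It assumes that tkgi upgrade-cluster returns only after the upgrade finishes and exits non-zero when the upgrade fails. If unsure, check the result with tkgi cluster <cluster-name>. upgrade_clusters_sequentially is a hypothetical helper, and tkgi may still prompt for confirmation for each cluster.

```shell
# bash: upgrade the given clusters one at a time, stopping at the first failure.
# upgrade_clusters_sequentially is a hypothetical helper.
upgrade_clusters_sequentially() {
  local cluster
  for cluster in "$@"; do
    echo "Upgrading $cluster ..."
    # Assumes tkgi upgrade-cluster blocks until done and exits non-zero on failure.
    if ! tkgi upgrade-cluster "$cluster"; then
      echo "Upgrade failed for $cluster; stopping so it can be investigated" >&2
      return 1
    fi
  done
  echo "All listed clusters upgraded"
}
```

For example, using cluster names taken from the output of tkgi clusters:

    upgrade_clusters_sequentially <cluster-1> <cluster-2>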

    • If the cluster upgrade fails to update worker nodes

      Note:
      Under certain conditions, worker nodes in a cluster can enter a failed state and prevent the cluster upgrade from proceeding. In these scenarios, use the following process to move the cluster upgrade forward. The steps below cause some downtime on non-High Availability (HA) single-node service instances.

      • Download the manifest for the on-demand service instance:

        bosh -d <service-instance-deployment> manifest > <service-instance-deployment>.yaml

      • Run a deploy operation to apply the manifest gathered in the previous step:

        bosh -d <service-instance-deployment> deploy <service-instance-deployment>.yaml --skip-drain --fix

      • If the deployment with the recreated manifest still fails, use the following command to stop monit from monitoring the services on all worker nodes:

        bosh -d <service-instance-deployment> ssh worker -c 'sudo /var/vcap/bosh/bin/monit unmonitor all'

      • Then run a deploy operation again:

        bosh -d <service-instance-deployment> deploy <service-instance-deployment>.yaml --skip-drain --fix
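The fallback steps above can be collected into one function for a single service-instance deployment. The following is a minimal sketch. redeploy_service_instance is a hypothetical helper, and each bosh deploy still asks for confirmation as usual. It runs monit directly through sudo at its usual BOSH location, /var/vcap/bosh/bin/monit, instead of opening an interactive sudo -i shell.

```shell
# bash: run the fallback steps for one service-instance deployment
# (hypothetical helper). --skip-drain causes downtime on non-HA single-node instances.
redeploy_service_instance() {
  local dep="$1"
  local manifest="${dep}.yaml"
  # Step 1: save the current manifest of the on-demand service instance.
  bosh -d "$dep" manifest > "$manifest" || return 1
  # Step 2: redeploy without draining and let BOSH fix unresponsive VMs.
  if bosh -d "$dep" deploy "$manifest" --skip-drain --fix; then
    return 0
  fi
  # Step 3: if that still fails, unmonitor all monit services on the worker VMs.
  bosh -d "$dep" ssh worker -c 'sudo /var/vcap/bosh/bin/monit unmonitor all' || return 1
  # Step 4: redeploy once more and return its exit status.
  bosh -d "$dep" deploy "$manifest" --skip-drain --fix
}
```

For example:

    redeploy_service_instance <service-instance-deployment>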

         
  • How to verify certificate rotation

    After the previous step finishes, the commands below show that the new certificates are now associated with the deployments and that the old ones are no longer in use.

    [OLD]

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq '.topology[].signs[] | "\(.name) \(.versions[1].deployment_names) \(.versions[1].valid_until)"'

    "/bosh_dns_health_client_tls [] 2021-12-18T02:30:06Z"
    "/bosh_dns_health_server_tls [] 2021-12-18T02:30:06Z"
    "/dns_api_client_tls [] 2021-12-18T02:30:07Z"
    "/dns_api_server_tls [] 2021-12-18T02:30:07Z"


    [NEW]

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq '.topology[].signs[] | "\(.name) \(.versions[0].deployment_names) \(.versions[0].valid_until)"'

    "/bosh_dns_health_client_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T06:18:35Z"
    "/bosh_dns_health_server_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T06:18:35Z"
    "/dns_api_client_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T06:18:35Z"
    "/dns_api_server_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T06:18:35Z"

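The before-and-after comparison above can be reduced to a single pass/fail check. The following is a minimal sketch. It assumes that versions[0] is the new version and versions[1] the old one, as in the commands above. assert_rotated is a hypothetical helper. Because jq's length returns 0 for null, a certificate that already has only one version reports 0 for the old count.

```shell
# bash: read "<name> <new_count> <old_count>" lines on stdin, where the counts are
# the number of deployments using versions[0] (new) and versions[1] (old).
# Succeeds only if every new version is deployed and no old version is (hypothetical helper).
assert_rotated() {
  local name new old rc=0
  while read -r name new old; do
    [ -z "$name" ] && continue
    if [ "${new:-0}" -eq 0 ] || [ "${old:-0}" -ne 0 ]; then
      echo "not rotated yet: $name (new=$new old=$old)" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example:

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq -r '.topology[].signs[] | "\(.name) \(.versions[0].deployment_names | length) \(.versions[1].deployment_names | length)"' | assert_rotated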

  • Cleaning up inactive certificates

    After a successful rotation, the older inactive certificate versions can be garbage collected with the command below.

    Note: The command below also accepts a --name option, which can be used to garbage collect a single certificate.

    maestro gc leaf --all

    deleted:
    - name: /dns_api_server_tls
      certificate_id: 5ac68fdb-01ef-4d50-a9f3-92281e57a74c
      version_ids:
      - 377142ec-578d-4df7-8abe-1db375cd380a
    - name: /bosh_dns_health_server_tls
      certificate_id: b6e7b8f2-edb4-4c2e-a1a5-332cd9f28c37
      version_ids:
      - 529bad95-1fd0-4469-9fb5-990b40453a13
    - name: /bosh_dns_health_client_tls
      certificate_id: 0b402fa6-04a3-492f-849d-cde95f5cff88
      version_ids:
      - 0c93c743-0dd2-477a-8771-9773d8065198
    - name: /dns_api_client_tls
      certificate_id: 0ae32163-a76a-4d14-8fa8-79e5402b9511
      version_ids:
      - 8c995186-2f79-4c0a-ba40-02dc2805b32f

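The confirmation step that follows prints only versions[0]. Counting the remaining versions shows more directly that the garbage collection worked. The following is a minimal sketch. assert_single_version is a hypothetical helper that expects exactly one version per certificate after maestro gc leaf --all.

```shell
# bash: read "<name> <version_count>" lines on stdin and succeed only if every
# certificate has exactly one remaining version (hypothetical helper).
assert_single_version() {
  local name count rc=0
  while read -r name count; do
    [ -z "$name" ] && continue
    if [ "$count" -ne 1 ]; then
      echo "$name still has $count versions" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example:

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq -r '.topology[].signs[] | "\(.name) \(.versions | length)"' | assert_single_version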

  • Confirm that the inactive certificates are deleted

    maestro --json topology --name /opsmgr/bosh_dns/tls_ca | jq '.topology[].signs[] | "\(.name) \(.versions[0].deployment_names) \(.versions[0].valid_until)"'

    "/bosh_dns_health_client_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T06:18:35Z"
    "/bosh_dns_health_server_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T06:18:35Z"
    "/dns_api_client_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T06:18:35Z"
    "/dns_api_server_tls [\"pivotal-container-service-3b9cfff74271c08d9e0d\",\"service-instance_16d9b7c4-c1fe-4be9-81c3-0aedda0ea6c0\"] 2021-12-18T06:18:35Z"