The challenge is in pending status with an error "error instantiating route53 challenge solver: unable to assume role"

Article ID: 297923

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

- The certificate/tap-default-tls was created, but it remains in READY=False status.
$ kubectl get certificates -n tanzu-system-ingress
NAME              READY   SECRET                AGE
tap-default-tls   False   tap-default-tls       2d15h

- Checking the related challenge shows that it is stuck in pending status.
$ kubectl get challenges -n tanzu-system-ingress
NAMESPACE              NAME                  STATE     DOMAIN    AGE
tanzu-system-ingress   tap-default-tls-xxx   pending   abc.com   2d15h

- The challenge status gives the reason for the failure as "error instantiating route53 challenge solver: unable to assume role":
apiVersion: v1
items:
- apiVersion: acme.cert-manager.io/v1
  kind: Challenge
...
status:
  presented: false
  processing: true
  reason: "error instantiating route53 challenge solver: unable to assume role:
    AccessDenied: User: arn:aws:sts::123456:assumed-role/<ROLE-NAME>/xxx
    is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::123456:role/<ROLE-NAME>\n\tstatus
    code: 403, request id: xyz123"
  state: pending
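In the message above, the caller (assumed-role/<ROLE-NAME>) and the target (role/<ROLE-NAME>) carry the same role name, which suggests the controller's role is trying to assume itself. As a minimal sketch, the hypothetical helper is_self_assume below parses a reason string of this shape and reports whether that is the case. It assumes the exact message format shown above and IAM roles without a path component; it is not part of cert-manager or any AWS SDK.

```python
import re

# Pulls the caller and target out of an STS AccessDenied message such as
# "User: arn:aws:sts::123456:assumed-role/<ROLE-NAME>/xxx is not authorized
#  to perform: sts:AssumeRole on resource: arn:aws:iam::123456:role/<ROLE-NAME>".
# Assumption: IAM roles without a path (role/<name>, not role/<path>/<name>).
ASSUME_ROLE_DENIED = re.compile(
    r"User:\s+arn:aws:sts::(?P<caller_acct>\d+):assumed-role/(?P<caller_role>[^/\s]+)/\S+"
    r"\s+is not authorized to perform:\s+sts:AssumeRole\s+on resource:\s+"
    r"arn:aws:iam::(?P<target_acct>\d+):role/(?P<target_role>[^\s\\]+)"
)


def is_self_assume(reason: str) -> bool:
    """Hypothetical helper: True if the message shows a role assuming itself."""
    match = ASSUME_ROLE_DENIED.search(reason)
    if match is None:
        return False
    return (match["caller_acct"] == match["target_acct"]
            and match["caller_role"] == match["target_role"])


# The reason string from the challenge status above, joined into one string.
reason = (
    "error instantiating route53 challenge solver: unable to assume role: "
    "AccessDenied: User: arn:aws:sts::123456:assumed-role/<ROLE-NAME>/xxx "
    "is not authorized to perform: sts:AssumeRole on resource: "
    "arn:aws:iam::123456:role/<ROLE-NAME>\n\tstatus code: 403, request id: xyz123"
)
print(is_self_assume(reason))  # True
```

If the target role differs from the caller, the error points at a missing trust relationship for a genuine cross-account setup rather than at the misconfiguration described in this article.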


Environment

Product Version: 1.6

Resolution

The cert-manager documentation, Creating-an-issuer-or-clusterissuer - Route53, includes an example ClusterIssuer configuration. That example sets spec.acme.solvers.dns01.route53.role, which makes the field look required:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    ...
    solvers:

    # example: cross-account zone management for example.com
    # this solver uses ambient credentials (i.e. inferred from the environment or EC2 Metadata Service)
    # to assume a role in a different account
    - selector:
        dnsZones:
          - "example.com"
      dns01:
        route53:
          region: us-east-1
          hostedZoneID: DIKER8JEXAMPLE # optional, see policy above
          role: arn:aws:iam::YYYYYYYYYYYY:role/dns-manager
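However, spec.acme.solvers.dns01.route53.role should be omitted if the IAM role is already attached to the cert-manager controller through IRSA (IAM Roles for Service Accounts), for example if you followed eks-iam-role-for-service-accounts-irsa and annotated the ServiceAccount created by cert-manager. In that case the controller already runs with the role's credentials. If role points at that same IAM role, the solver calls sts:AssumeRole on a role the controller already holds, which fails with AccessDenied unless the role's trust policy explicitly allows it.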
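apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXX:role/cert-manager

The fsGroup setting belongs to the cert-manager Deployment, not the ServiceAccount; it allows the controller to read the projected ServiceAccount token:
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1001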

To fix the pending challenge, remove the spec.acme.solvers.dns01.route53.role field from the ClusterIssuer configuration, for example with kubectl edit clusterissuer/<clusterissuer-name>.
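kubectl edit is the interactive route. For a scripted change, the sketch below performs the same edit on the ClusterIssuer held as a Python dict, for example one loaded from kubectl get clusterissuer <clusterissuer-name> -o json. strip_route53_role is a hypothetical helper, and writing the result back to the cluster is left out. It removes role from every route53 solver, which is only appropriate when all of them rely on the IRSA role.

```python
import copy
from typing import Any


def strip_route53_role(issuer: dict[str, Any]) -> tuple[dict[str, Any], int]:
    """Hypothetical helper: return a copy of the ClusterIssuer with route53.role
    removed from every DNS-01 solver, plus the number of solvers changed."""
    fixed = copy.deepcopy(issuer)  # leave the caller's object untouched
    removed = 0
    for solver in fixed.get("spec", {}).get("acme", {}).get("solvers", []):
        route53 = solver.get("dns01", {}).get("route53")
        if isinstance(route53, dict) and "role" in route53:
            del route53["role"]
            removed += 1
    return fixed, removed


# The example ClusterIssuer from the documentation, as a dict.
issuer = {
    "apiVersion": "cert-manager.io/v1",
    "kind": "ClusterIssuer",
    "metadata": {"name": "letsencrypt-prod"},
    "spec": {"acme": {"solvers": [{
        "selector": {"dnsZones": ["example.com"]},
        "dns01": {"route53": {
            "region": "us-east-1",
            "hostedZoneID": "DIKER8JEXAMPLE",
            "role": "arn:aws:iam::YYYYYYYYYYYY:role/dns-manager",
        }},
    }]}},
}
fixed, removed = strip_route53_role(issuer)
print(removed)  # 1
print(fixed["spec"]["acme"]["solvers"][0]["dns01"]["route53"])
# {'region': 'us-east-1', 'hostedZoneID': 'DIKER8JEXAMPLE'}
```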

Then manually delete the failed resources, including:
  • certificate/tap-default-tls
  • secret/tap-default-tls-SOME-PREFIX
  • order
  • challenge
Deleting the pending challenge may hang, because a challenge cannot be removed until its .metadata.finalizers list is empty. To clear it, either run kubectl edit challenge/<challenge-name> and set .metadata.finalizers to an empty list, or run kubectl patch challenge/<challenge-name> -p '{"metadata":{"finalizers":[]}}' --type=merge.
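The cleanup can also be scripted. The sketch below only assembles the kubectl invocations and prints them unless dry_run=False is passed; cleanup_commands and run_cleanup are hypothetical helpers, and the order, challenge, and secret names are placeholders to replace with the real names from kubectl get. Each delete uses --wait=false so it does not block on finalizers, and the finalizer patch runs last, once the challenge is already marked for deletion.

```python
import json
import shlex
import subprocess

NAMESPACE = "tanzu-system-ingress"  # namespace used throughout this article


def cleanup_commands(order: str, challenge: str, secret: str,
                     certificate: str = "tap-default-tls",
                     namespace: str = NAMESPACE) -> list[list[str]]:
    """Hypothetical helper: kubectl calls for the cleanup listed above."""
    clear_finalizers = json.dumps({"metadata": {"finalizers": []}})
    # --wait=false marks each object for deletion without blocking on finalizers.
    deletes = [
        ["kubectl", "delete", f"{kind}/{name}", "-n", namespace, "--wait=false"]
        for kind, name in [("certificate", certificate), ("secret", secret),
                           ("order", order), ("challenge", challenge)]
    ]
    # Clearing the finalizers lets the challenge that is being deleted go away.
    patch = ["kubectl", "patch", f"challenge/{challenge}", "-n", namespace,
             "-p", clear_finalizers, "--type=merge"]
    return deletes + [patch]


def run_cleanup(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or run them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # shell-quoted so it can be copied as-is
        else:
            subprocess.run(cmd, check=True)


# Placeholder names: look up the real ones with
# kubectl get orders,challenges,secrets -n tanzu-system-ingress
cmds = cleanup_commands(order="<order-name>", challenge="tap-default-tls-xxx",
                        secret="tap-default-tls-SOME-PREFIX")
run_cleanup(cmds)  # dry run: prints the five commands without running them
```

If the challenge has already disappeared by the time the patch runs, kubectl patch fails with NotFound and run_cleanup raises; in that case the finalizer step was simply not needed.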

After these resources are deleted, a new set of resources is created and the new certificate/tap-default-tls should become ready for use.
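To confirm the result without reading the kubectl get table by eye, the sketch below checks the Ready condition in a Certificate's status, as returned by kubectl get certificate tap-default-tls -n tanzu-system-ingress -o json. is_certificate_ready is a hypothetical helper, and the sample object is trimmed and illustrative, not real output.

```python
import json
from typing import Any


def is_certificate_ready(cert: dict[str, Any]) -> bool:
    """Hypothetical helper: True if the cert-manager Certificate has a Ready
    condition whose status is the string "True"."""
    for cond in cert.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported yet


# Trimmed, illustrative Certificate object (not captured from a real cluster).
sample = json.loads("""
{"kind": "Certificate",
 "metadata": {"name": "tap-default-tls", "namespace": "tanzu-system-ingress"},
 "status": {"conditions": [{"type": "Ready", "status": "True"}]}}
""")
print(is_certificate_ready(sample))  # True
```

From the shell, kubectl wait --for=condition=Ready certificate/tap-default-tls -n tanzu-system-ingress --timeout=5m performs the same check and blocks until the condition holds or the timeout expires.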