Failed Actions: Unable to start or stop instances due to AWS service error


Article ID: 283643


Updated On:

Products

CloudHealth

Issue/Introduction

A Policy Action to start or stop an AWS instance may fail with an "AWS Service error".  This is expected behavior when the API call made to AWS returns an error.  Any other instances in the same batch also fail to start or stop, which is likewise expected when AWS returns a failure response.

 

Details:

  • A Policy may be configured with an Action to issue a start or stop command to AWS instances when the policy conditions are met. 
  • An AWS start or stop Action works by grouping together any assets meeting the policy condition.  (This grouping is by AWS Account and Region.)  This group is then used as a parameter to an API batch call to AWS to perform the requested Action.  A group may contain one or more instances.
  • AWS receives the batch API call, and if any instance in the call cannot be processed, the call returns an error.  (Note: batch calls are not used for Azure or GCP, so this article does not apply to those clouds.)
  • None of the instances in a failed batch have the requested action executed, not just the instance(s) that caused the failure.  This is known, expected behavior of the AWS API.
  • We do not currently attempt to reprocess the call with a subset of instances; however, this is being considered as a future improvement.  Watch the latest Product Updates for any changes.
  • These errors are reported in the Notifications => Actions tab, in the "Failed Actions" column, as "Unable to start/stop instances" <list of all instances not stopped in the batch> "due to AWS Service error:" followed by the <instance information> that caused the failure.
  • Example error responses include "The Spot instance is not in a state from which it can be stopped", "You can only stop Spot instances associated with a persistent Spot instance requests", "You can't stop the Spot instance because it is associated with a one-time Spot instance request", "The instance is not in a state from which it can be stopped", and others.
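
The grouping and all-or-nothing behavior described above can be illustrated with a small simulation. This is only a sketch, not the actual CloudHealth implementation and not a real AWS call: the `Instance` class, its `stoppable` flag, `fake_stop_batch`, and `AwsServiceError` are hypothetical names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    account: str
    region: str
    stoppable: bool  # hypothetical flag standing in for the instance's AWS-side state


class AwsServiceError(Exception):
    """Hypothetical stand-in for the error returned for a rejected batch call."""


def group_by_account_and_region(instances):
    """Group instances into one batch per (account, region) pair."""
    groups = defaultdict(list)
    for inst in instances:
        groups[(inst.account, inst.region)].append(inst)
    return dict(groups)


def fake_stop_batch(batch):
    """Simulated batch call: the whole batch is rejected if any instance cannot be stopped."""
    blocked = [inst.instance_id for inst in batch if not inst.stoppable]
    if blocked:
        raise AwsServiceError("cannot stop: " + ", ".join(blocked))
    return [inst.instance_id for inst in batch]


def run_stop_action(instances):
    """Issue one simulated call per batch; a failed batch stops none of its instances."""
    stopped, failed = [], {}
    for key, batch in group_by_account_and_region(instances).items():
        try:
            stopped.extend(fake_stop_batch(batch))
        except AwsServiceError as err:
            # Every instance in the batch is reported as failed, not only the culprit.
            failed[key] = ([inst.instance_id for inst in batch], str(err))
    return stopped, failed


demo = [
    Instance("i-0a", "111111111111", "us-east-1", True),
    Instance("i-0b", "111111111111", "us-east-1", False),
    Instance("i-0c", "111111111111", "us-west-2", True),
]
demo_stopped, demo_failed = run_stop_action(demo)
print(demo_stopped)
print(demo_failed)
```

In this sketch, `i-0a` is not stopped even though it is healthy, because it shares an account and region (and therefore a batch) with `i-0b`. The instance in `us-west-2` is in a separate batch and is unaffected.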


Solutions and Best Practices:

  • Configure policy actions to include Verification and Notifications to alert you on failures.
  • Review the error message and consult the AWS documentation to understand the cause of the failure.
  • Modify the policy conditions or exclude the instances that cannot correctly be acted on.
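
As a rough illustration of the last point, the sketch below separates instances that a stop action should skip from those it can act on. It assumes, based on the example error messages in this article, that Spot instances tied to a one-time request cannot be stopped. The dictionary keys `lifecycle` and `spot_request_type` are hypothetical and would need to be mapped to whatever attributes or tags your policy conditions can actually filter on.

```python
def is_excluded(instance):
    """Return True for instances a stop action should skip.

    Assumption: a Spot instance can only be stopped when its Spot request
    is persistent. The keys used here are hypothetical.
    """
    return (
        instance.get("lifecycle") == "spot"
        and instance.get("spot_request_type") != "persistent"
    )


def split_for_stop(instances):
    """Split instances into (eligible, excluded) lists, preserving order."""
    eligible = [inst for inst in instances if not is_excluded(inst)]
    excluded = [inst for inst in instances if is_excluded(inst)]
    return eligible, excluded


candidates = [
    {"id": "i-0a"},  # no lifecycle key: treated as a regular on-demand instance
    {"id": "i-0b", "lifecycle": "spot", "spot_request_type": "one-time"},
    {"id": "i-0c", "lifecycle": "spot", "spot_request_type": "persistent"},
]
eligible, excluded = split_for_stop(candidates)
print([inst["id"] for inst in eligible])
print([inst["id"] for inst in excluded])
```

Excluding such instances up front keeps them from causing the rest of their batch to fail.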