Some jobs fail when running a large number of jobs against a salt master.

Article ID: 384282

Updated On:

Products

VMware Aria Suite

Issue/Introduction

If a large number of jobs are run on a salt-master, the following errors can appear and some jobs fail to complete:

  • "ERROR executing 'state.apply': File client timed out"
  • "ERROR executing 'state.apply': Nonce verification error"

Environment

salt-master 3006.5

Resolution

  • Run the jobs in smaller batches so as not to overwhelm the system.
  • Increase the number of worker threads (worker_threads) on the master to 24, and possibly up to 1.5 times the number of CPUs, while monitoring how much memory is consumed.
  • Increase the event return queue (event_return_queue) on the master to at least 1x, and possibly 2-3x, the number of minions.
  • Increase the auth timeout on the minion side (set auth_timeout to 240 seconds).
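
The sizing rules above can be sketched as a small Python helper. This is only an illustration, not part of Salt or Aria: the function names suggest_settings and batches are hypothetical, and treating 24 as the floor for the worker-thread upper bound is an assumption about how the guidance is meant to be read. The resulting values would still need to be placed in the master and minion configuration files by hand and validated against memory usage.

```python
import math


def suggest_settings(cpu_count, minion_count):
    """Return starting values derived from the rules of thumb in this article.

    Hypothetical helper: it only does the arithmetic and does not
    read or write any Salt configuration.
    """
    return {
        # Master: start at 24 worker threads.
        "worker_threads": 24,
        # Master: possible upper bound of 1.5 x CPUs (assumed never below 24).
        "worker_threads_upper": max(24, math.ceil(1.5 * cpu_count)),
        # Master: event return queue of at least 1x the number of minions.
        "event_return_queue": minion_count,
        # Master: possible upper bound of 3x the number of minions.
        "event_return_queue_upper": 3 * minion_count,
        # Minion: auth timeout in seconds.
        "auth_timeout": 240,
    }


def batches(minions, size):
    """Yield successive slices of at most `size` minion IDs."""
    for start in range(0, len(minions), size):
        yield minions[start:start + size]


if __name__ == "__main__":
    # Example figures only: a 32-CPU master with 500 minions.
    for key, value in suggest_settings(32, 500).items():
        print(f"{key}: {value}")
```

If jobs are launched from the salt command line rather than a custom script, the built-in -b / --batch-size option is one way to limit how many minions run a job at the same time.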