This guide provides details on tuning the SaltStack Enterprise configuration for performance.
max_processes: 8
num_processes: 0
background_workers:
  combined_process: true
  concurrency: 0
By default, SaltStack Enterprise (SSE) auto-calculates the number of web workers and background workers based on CPU cores, up to a maximum of 8. If you wish to go above that limit, you must raise the max_processes limit or configure each process count individually. num_processes configures the number of web workers that are started, and background_workers:concurrency configures how many background workers are started. When the load of web requests is high, you may need more web workers. If you run a lot of schedules, you may need more background workers. To determine whether you need more workers of either kind, monitor CPU usage and see how busy each set of processes is. Web workers are labeled with [Webserver] and background workers are labeled with [celeryd].
SSE typically starts up both of these process types automatically. To split out the two processes, configure background_workers:combined_process to false. When this is done, the raas command will only start web workers, and the raas worker command will only start background workers.
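For example, a configuration that raises the worker ceiling and splits web and background workers into separate processes could look like the following sketch. The counts are illustrative, not recommendations; size them to your own CPU and workload:

max_processes: 16           # raise the auto-calculated ceiling of 8
num_processes: 12           # web workers, started by the raas command
background_workers:
  combined_process: false   # run web and background workers as separate processes
  concurrency: 8            # background workers, started by the raas worker command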
webserver_max_memory: 0
webserver_max_time: 0
webserver_body_timeout:
webserver_max_body_size:
webserver_max_buffer_size:
background_workers:
  max_tasks: 100000
  max_memory: 0
Use webserver_max_memory and background_workers:max_memory to limit memory usage. This mitigates memory leaks sometimes present in library dependencies on unpatched systems. If the limit is reached, the worker is restarted. background_workers:max_tasks has a similar effect, restarting background workers after they have completed a certain number of jobs.
webserver_max_time, webserver_body_timeout, webserver_max_body_size, and webserver_max_buffer_size all limit the time or size allocated to a single web request. If a limit is exceeded, the request is dropped. This is useful if you experience a denial-of-service attack.
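As a sketch, limits like the following restart leaky workers and cap the resources a single request may consume. The numbers are placeholders, and their units should be checked against the configuration reference for your SSE version:

webserver_max_memory: 2000000   # restart a web worker that grows past this limit
webserver_max_time: 600         # cap how long a single web request may run
background_workers:
  max_memory: 2000000           # restart a background worker past this memory limit
  max_tasks: 50000              # restart a background worker after this many completed jobs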
background_workers:
  prefetch_multiplier: 4
  without_heartbeat: false
  without_mingle: false
  without_gossip: false
By default, background workers coordinate queue activities to better load balance tasks between them. However, this causes more noise on the "queue" (Redis). If you are processing a lot of background work or have a Redis instance that cannot be upgraded, you can set without_heartbeat, without_mingle, and without_gossip to true. This turns off worker coordination. The trade-off is that tasks will not load balance as well, but you get the advantage of less queue activity, which means jobs often start faster.
Additionally, you can set prefetch_multiplier higher so that each worker fetches more tasks from the queue at once. This again reduces how well tasks are load balanced, but it also further reduces queue usage.
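A sketch of a configuration that trades load balancing for less Redis traffic might look like this; the prefetch value is illustrative:

background_workers:
  without_heartbeat: true   # stop worker heartbeats on the queue
  without_mingle: true      # skip worker mingle at startup
  without_gossip: true      # stop gossip between workers
  prefetch_multiplier: 8    # each worker fetches more tasks per trip to the queue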
By tuning how often caching and cleanup are done, you can increase or decrease the performance of the system as a whole.
cache_cycle: 30
During a cache cycle, tasks like aggregating return data are performed. By increasing the cache cycle time, you can reduce duplicate work and decrease the load on the system. The drawback is that information is usually not presented in the UI as quickly.
Conversely, by decreasing this time, you will see results in the UI faster, but put a greater load on your background workers.
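For example, a system that can tolerate slower UI updates in exchange for less background load might raise the cycle; the value is illustrative:

cache_cycle: 60   # aggregate return data half as often as the default of 30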
clean_up_cycle: 900
job_unresponsive_check: 5
job_unresponsive_check_stop: 2880
master_unresponsive_check_limit: 2
The clean up cycle performs tasks such as checking for stuck jobs and cleaning them up, or checking in with the master to make sure a job is still running. Increasing clean_up_cycle decreases how often clean up cycles occur, which lowers the load on the system. However, you can also tune some other configurations to make the clean up cycle less intensive.
job_unresponsive_check is the amount of time SSE waits before considering a job to be stuck. If you know your jobs take longer, increase this time so that SSE does not check up on them as often. Conversely, if you know your jobs are shorter, decrease this time so that jobs get cleaned up faster. job_unresponsive_check_stop is the maximum allowed time for a job to run. Adjusting this to better fit your job profile can allow SSE to clean up jobs faster.
When a job is "stuck", SSE sends a find_job to the master to determine if the job is still running. If the master responds, SSE considers the job to still be in progress. However, if the master does not respond, SSE tries again on the next clean up cycle. SSE continues this pattern until the master_unresponsive_check_limit is reached. If you reduce this limit, SSE performs fewer check-ups. If you expect your jobs to complete within a clean up cycle, reducing this limit will clean up stuck jobs faster while also reducing load on the system.
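As an illustration, a deployment whose jobs normally finish quickly might tighten these checks so stuck jobs are detected and cleaned up sooner; a deployment with long-running jobs would move the values the other way. These numbers are examples, not recommendations:

clean_up_cycle: 900                  # keep the default cycle
job_unresponsive_check: 3            # treat jobs as stuck sooner
job_unresponsive_check_stop: 1440    # lower the maximum allowed run time for a job
master_unresponsive_check_limit: 1   # stop checking an unresponsive master after one attempt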
schedule_cycle: 10
scheduler_max_futures_per_cycle: 500
scheduler_max_futures_weeks_ahead: 12
The scheduler runs based on the schedule_cycle time. Increasing this time reduces the load on the system. Decreasing this time gives you finer-grained schedules, thus increasing the load. If you don't need 10-second granularity, you can increase the timing to reduce load on your system without any perceivable impact.
The scheduler also pre-calculates future schedules out to a certain limit based on scheduler_max_futures_weeks_ahead. You can decrease this limit to reduce the amount of work needed to calculate future schedules. However, this is a one-time calculation, so unless you change schedules a lot, you probably won't notice much of a change in load. During these calculations, work is chunked by the scheduler_max_futures_per_cycle configuration. If you want more schedules to be calculated per cycle, you can increase this number, or decrease it to reduce the load from each chunk of work. This configuration should be based on how much data your database can handle at once.
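For instance, an environment that does not need 10-second schedule granularity and wants lighter future-schedule calculations could use something like the following; the values are illustrative:

schedule_cycle: 30                      # coarser granularity, less frequent scheduler work
scheduler_max_futures_per_cycle: 250    # smaller chunks if the database is the bottleneck
scheduler_max_futures_weeks_ahead: 8    # pre-calculate fewer weeks of future schedules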
enable_grains_indexing: true
A considerable amount of background work is needed to produce the auto-complete indexing used by the UI. If you have a large number of minions or targets, you can turn this processing off to considerably reduce the computation needed.
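To turn the indexing off, set the option to false:

enable_grains_indexing: false   # skip building the auto-complete index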
minion_onboarding_throttle: 0
Adding minions to SSE is an expensive operation, especially when there are many minions. If you often add minions from multiple masters, you might need to throttle masters so they do not push minions to SSE all at once. minion_onboarding_throttle is the time SSE is locked before allowing another master to add more minions. If the master adding minions finishes before this time, then SSE is unlocked and the next master is let in.
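For example, to make masters wait their turn when onboarding minions, set a nonzero value; the figure here is illustrative only:

minion_onboarding_throttle: 60   # illustrative lock time before the next master may add minions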
websocket_debounce: 5
websocket_debounce is the smallest interval at which a websocket subscription receives updated data. Increase this time to decrease load on the system. However, you will receive updated data less often.
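For example, you could raise the interval to lower load at the cost of less frequent updates; the value is illustrative:

websocket_debounce: 15   # wait longer between updates pushed to websocket subscribers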