MapReduce YARN Memory Parameters

Article ID: 295026


Products

Services Suite

Issue/Introduction

This article describes the Hadoop parameters used to manage memory allocations for MapReduce jobs executed in the YARN framework.

Environment


Resolution

What is a container?

A container is a YARN JVM process. In MapReduce, the application master service, mapper tasks, and reducer tasks are all containers that execute inside the YARN framework. You can view running container stats in the ResourceManager web interface: http://<resource_manager_host>:8088/cluster/scheduler


The Basics

The YARN ResourceManager (RM) allocates resources to the application through logical queues, which include memory, CPU, and disk resources. By default, the RM will allow up to 8192MB (yarn.scheduler.maximum-allocation-mb) per Application Master (AM) container allocation request. The default minimum allocation is 1024MB (yarn.scheduler.minimum-allocation-mb). The AM can only request resources from the RM in increments of yarn.scheduler.minimum-allocation-mb, and it cannot exceed yarn.scheduler.maximum-allocation-mb. The AM is responsible for rounding mapreduce.map.memory.mb and mapreduce.reduce.memory.mb up to a value divisible by yarn.scheduler.minimum-allocation-mb. In the example that follows, the RM will deny an allocation greater than 8192MB or a value not divisible by 1024MB.
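The rounding behavior described above can be sketched as a small Python helper. This is illustrative only; the actual normalization logic lives inside the YARN scheduler, and the function name here is hypothetical.

```python
def normalize_request(requested_mb, minimum_mb=1024, maximum_mb=8192):
    """Round a container memory request up to the nearest multiple of
    yarn.scheduler.minimum-allocation-mb, refusing requests that exceed
    yarn.scheduler.maximum-allocation-mb (defaults shown above)."""
    if requested_mb > maximum_mb:
        raise ValueError(
            f"Request of {requested_mb}MB exceeds maximum allocation of {maximum_mb}MB"
        )
    # Ceiling division: round up to the next multiple of the minimum allocation.
    increments = -(-requested_mb // minimum_mb)
    return increments * minimum_mb

# A 1536MB mapreduce.map.memory.mb request is rounded up to 2048MB.
print(normalize_request(1536))  # 2048
```

With the defaults above, a 1536MB request becomes a 2048MB container, while a request of 9000MB would be denied outright.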


The Params

  • YARN
    • yarn.scheduler.minimum-allocation-mb
    • yarn.scheduler.maximum-allocation-mb
    • yarn.nodemanager.vmem-pmem-ratio
    • yarn.nodemanager.resource.memory-mb
  • MapReduce
    • Map Memory
      • mapreduce.map.java.opts
      • mapreduce.map.memory.mb
    • Reduce Memory
      • mapreduce.reduce.java.opts
      • mapreduce.reduce.memory.mb


The diagram above shows an example of a map, reduce, and application master container (AM JVM). The JVM rectangle represents the server process. The Max Heap and Max Virtual rectangles represent the maximum logical memory constraints for the JVM processes, which are enforced by the NodeManager.


In this example, the map container memory allocation mapreduce.map.memory.mb is set to 1536MB. The AM will request 2048MB from the RM for a map container because the minimum allocation, yarn.scheduler.minimum-allocation-mb, is set to 1024MB and requests are rounded up to the next 1024MB increment. This allocation is a logical allocation used by the NodeManager to monitor the process's memory usage: if the map task's memory usage exceeds 2048MB, the NodeManager will kill the task. The JVM heap size is set to 1024MB (mapreduce.map.java.opts=-Xmx1024m), which fits well inside the logical allocation of 2048MB. The same is true for the reduce container, where mapreduce.reduce.memory.mb is set to 3072MB.


When a MapReduce job completes, several counters are dumped at the end of the job. The three memory counters below show how much physical memory was allocated versus how much virtual memory, along with total committed heap usage:

Physical memory (bytes) snapshot=21850116096
Virtual memory (bytes) snapshot=40047247360
Total committed heap usage (bytes)=22630105088

Virtual Memory

By default, yarn.nodemanager.vmem-pmem-ratio is set to 2.1. This means that a map or reduce container can allocate up to 2.1 times its mapreduce.map.memory.mb or mapreduce.reduce.memory.mb of virtual memory before the NodeManager (NM) will kill the container. If mapreduce.map.memory.mb is set to 1536MB, then the total allowed virtual memory is 2.1 * 1536 = 3225.6MB.
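The virtual memory ceiling can be computed directly. This is a minimal sketch of the arithmetic, assuming the default ratio of 2.1; the function name is hypothetical.

```python
def vmem_limit_mb(container_mb, vmem_pmem_ratio=2.1):
    """Maximum virtual memory (MB) the NodeManager allows a container:
    the container's logical allocation times yarn.nodemanager.vmem-pmem-ratio."""
    return container_mb * vmem_pmem_ratio

# A 1536MB map container may use up to about 3225.6MB of virtual memory
# before the NodeManager kills it.
print(vmem_limit_mb(1536))
```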


When the NM kills a container due to memory oversubscription, the log messages look similar to the example below.

Current usage: 2.1gb of 2.0gb physical memory used; 1.6gb of 3.15gb virtual memory used. Killing container.