Amazon Elastic MapReduce (EMR) Monitoring
Amazon EMR is a web service that enables users to run Big Data frameworks to process large volumes of data. Site24x7 monitors EMR to ensure uninterrupted data analysis and notifies users about the status changes in the associated AWS services, such as EC2 instances in the EMR cluster.
Setup and configuration
- If you haven't already, enable access to your AWS resources by creating an IAM user for Site24x7 or by creating a cross-account IAM role between your account and Site24x7's AWS account. Learn more.
- Next, on the Integrate AWS Account page, make sure the EMR checkbox is selected in the Services to be discovered field. Learn more.
Policies and permissions
Make sure the following read-level actions are present in the IAM policy assigned to the Site24x7 entity. Learn more.
- "elasticmapreduce:ListSecurityConfigurations",
- "elasticmapreduce:DescribeCluster",
- "elasticmapreduce:ListClusters",
- "elasticmapreduce:ListBootstrapActions",
- "elasticmapreduce:ListSteps",
- "elasticmapreduce:ListInstanceFleets",
- "elasticmapreduce:ListInstanceGroups",
- "elasticmapreduce:ListInstances"
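The actions above can be assembled into an IAM policy document. The following is a minimal sketch, not Site24x7's official policy: the statement ID `Site24x7EmrReadOnly` and the wildcard resource are assumptions for illustration, and the policy covers only the EMR actions listed here (any other permissions the integration needs are not shown).

```python
import json

# The read-level EMR actions listed above.
EMR_READ_ACTIONS = [
    "elasticmapreduce:ListSecurityConfigurations",
    "elasticmapreduce:DescribeCluster",
    "elasticmapreduce:ListClusters",
    "elasticmapreduce:ListBootstrapActions",
    "elasticmapreduce:ListSteps",
    "elasticmapreduce:ListInstanceFleets",
    "elasticmapreduce:ListInstanceGroups",
    "elasticmapreduce:ListInstances",
]


def build_emr_read_policy(resource="*"):
    """Return an IAM policy document (as a dict) allowing the EMR read actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7EmrReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(EMR_READ_ACTIONS),
                "Resource": resource,  # assumption: all resources
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM policy editor.
    print(json.dumps(build_emr_read_policy(), indent=2))
```

The printed JSON can be merged into the policy already attached to the IAM user or role created for Site24x7.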
Polling frequency
Site24x7 queries the AWS service-level APIs and CloudWatch APIs at the configured polling frequency (from one minute to one day) to collect performance metrics. Learn more.
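To illustrate what a CloudWatch query for one poll might look like, here is a rough sketch that builds the parameters for a `GetMetricStatistics` request. It is not Site24x7's implementation. The helper name `build_metric_request` and the cluster ID are hypothetical; the namespace `AWS/ElasticMapReduce`, the `JobFlowId` dimension, and the `IsIdle` metric name are the ones CloudWatch uses for EMR.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(cluster_id, metric_name, statistic, poll_minutes, now=None):
    """Build GetMetricStatistics parameters covering one polling interval.

    Hypothetical helper for illustration only.
    """
    if poll_minutes < 1:
        raise ValueError("poll_minutes must be at least 1")
    now = now or datetime.now(timezone.utc)
    period = poll_minutes * 60  # CloudWatch periods are given in seconds
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": now - timedelta(seconds=period),
        "EndTime": now,
        "Period": period,
        "Statistics": [statistic],
    }


if __name__ == "__main__":
    # "j-EXAMPLE12345" is a placeholder cluster ID.
    params = build_metric_request("j-EXAMPLE12345", "IsIdle", "Maximum", 5)
    print(params)
    # With boto3 installed and credentials configured, the request could be sent as:
    #   boto3.client("cloudwatch").get_metric_statistics(**params)
```

The statistic passed in should match the Statistic column for the metric being queried, for example Maximum for Cluster Idle Status.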
Supported Metrics
Attribute | Description | Data type | Statistic |
Core Nodes Pending | The number of core nodes waiting to be assigned. This metric is reported only if a core node exists. | Count | Maximum |
Core Nodes Running | The number of core nodes working. This metric is reported only if a core node exists. | Count | Maximum |
Task Nodes Pending | The number of task nodes waiting to be assigned. This metric is reported only if a Task Node exists. | Count | Maximum |
Task Nodes Running | The number of task nodes working. This metric is reported only if a Task Node exists. | Count | Maximum |
Capacity Remaining | The amount of remaining HDFS disk capacity. | GB | Minimum |
Corrupt Blocks | The number of blocks that HDFS reports as corrupted. | Count | Maximum |
DFS Pending Replication Blocks | The status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests. | Count | Maximum |
HDFS Bytes Read | The number of bytes read from HDFS. | MB | Sum |
HDFS Bytes Written | The number of bytes written to HDFS. | MB | Sum |
HDFS Utilization | The percentage of HDFS storage currently used. | Percentage | Average |
Cluster Idle Status | Indicates whether the cluster is idle: 1 when the cluster is in an idle state, otherwise 0. | Count | Maximum |
Live Data Nodes | The percentage of data nodes that are receiving work from Hadoop. | Percentage | Average |
Missing Blocks | The number of blocks for which HDFS has no replicas. | Count | Maximum |
Pending Deletion Blocks | The number of blocks marked for deletion. | Count | Maximum |
S3 Bytes Read | The number of bytes read from Amazon S3. | MB | Sum |
Live Task Trackers | The percentage of task trackers that are functional. | Percentage | Average |
Map Slots Open | The unused map task capacity in Hadoop version 1. | Count | Maximum |
Blacklisted Task Trackers | The number of task trackers that are blacklisted in Hadoop version 1. | Count | Maximum |
Graylisted Task Trackers | The number of task trackers that are graylisted in Hadoop version 1. | Count | Maximum |
Reduce Slots Open | The unused reduce task capacity in Hadoop version 1. | Count | Maximum |
Remaining Map Tasks | The number of remaining map tasks for each job in Hadoop version 1. | Count | Maximum |
Remaining Map Tasks per Slot | The ratio of the total map tasks remaining to the total map slots available in the cluster in Hadoop version 1. | Count | Maximum |
Remaining Reduce Tasks | The number of remaining reduce tasks for each job in Hadoop version 1. | Count | Maximum |
Running Map Tasks | The number of running map tasks for each job in Hadoop version 1. | Count | Maximum |
Running Reduce Tasks | The number of running reduce tasks for each job in Hadoop version 1. | Count | Maximum |
Apps Completed | The number of applications submitted to YARN that have completed in Hadoop version 2. | Count | Maximum |
Apps Failed | The number of applications submitted to YARN that have failed to complete in Hadoop version 2. | Count | Maximum |
Apps Killed | The number of applications submitted to YARN that have been killed in Hadoop version 2. | Count | Maximum |
Apps Pending | The number of applications submitted to YARN that are in a pending state in Hadoop version 2. | Count | Maximum |
Apps Running | The number of applications submitted to YARN that are running in Hadoop version 2. | Count | Maximum |
Apps Submitted | The number of applications submitted to YARN in Hadoop version 2. | Count | Maximum |
Container Allocated | The number of resource containers allocated by the ResourceManager for Hadoop version 2. | Count | Maximum |
Container Pending | The number of containers in the queue that have not yet been allocated in Hadoop version 2. | Count | Maximum |
Container Reserved | The number of containers reserved in Hadoop version 2. | Count | Maximum |
Memory Reserved | The amount of memory reserved in Hadoop version 2. | MB | Maximum |
Memory Allocated | The amount of memory allocated to the cluster in Hadoop version 2. | MB | Maximum |
Memory Available | The amount of memory available to be allocated in Hadoop version 2. | MB | Minimum |
Memory Total | The total amount of memory in the cluster in Hadoop version 2. | MB | Maximum |
MR Active Nodes | The number of nodes presently running MapReduce tasks or jobs in Hadoop version 2. | Count | Minimum |
MR Decommissioned Nodes | The number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state in Hadoop version 2. | Count | Maximum |
MR Lost Nodes | The number of nodes allocated to MapReduce that have been marked in a LOST state in Hadoop version 2. | Count | Maximum |
MR Rebooted Nodes | The number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state in Hadoop version 2. | Count | Maximum |
MR Total Nodes | The number of nodes presently available to MapReduce jobs in Hadoop version 2. | Count | Maximum |
MR Unhealthy Nodes | The number of nodes available to MapReduce jobs marked in an UNHEALTHY state in Hadoop version 2. | Count | Maximum |
Container Pending Ratio | The ratio of pending containers to containers allocated in Hadoop version 2. | Count | Maximum |
YARN Memory Available | The percentage of remaining memory available to YARN in Hadoop version 2. | Percentage | Average |
HBase Backup Failed | Status of the previous backup. It is set to 1 if the backup attempt had failed. This metric is collected only if HBase is present. | Count | Maximum |
Most Recent Backup | The amount of time it took the previous backup to complete. This metric is collected only if HBase is present. | Minutes | Average |
Time Since Last Successful Backup | The number of minutes elapsed since the last successful HBase backup started on your cluster. This metric is collected only if HBase is present. | Minutes | Average |
Multimaster Instancegroup Nodes Running | The number of running master nodes. This metric is collected only with Hadoop version 2 and if MultiMaster exists. | Count | Maximum |
Multimaster Instancegroup Nodes Running Percentage | The percentage of master nodes that are running over the requested master node instance count. This metric is collected only with Hadoop version 2 and if MultiMaster exists. | Percentage | Average |
Multimaster Instancegroup Nodes Requested | The number of requested master nodes. This metric is collected only with Hadoop version 2 and if MultiMaster exists. | Count | Maximum |
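Ratio metrics in the table above, such as Container Pending Ratio, are derived from two raw counts. The sketch below shows the arithmetic under one assumption: when no containers are allocated, the ratio falls back to the pending count, which is how we understand CloudWatch reports this metric. The function name is hypothetical.

```python
def container_pending_ratio(pending, allocated):
    """Return pending containers divided by allocated containers.

    Assumption: when nothing is allocated, the ratio is reported as the
    pending count itself, which avoids a division by zero.
    """
    if pending < 0 or allocated < 0:
        raise ValueError("container counts cannot be negative")
    if allocated == 0:
        return float(pending)
    return pending / allocated


if __name__ == "__main__":
    # Six containers waiting while three are allocated gives a ratio of 2.0.
    print(container_pending_ratio(6, 3))
```

A ratio that stays well above zero suggests that YARN is queuing more work than the cluster can currently place, which may be a useful signal when setting thresholds.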
Forecast
Estimate future values of the following performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.
- Capacity Remaining
- HDFS Bytes Read
- HDFS Bytes Written
- HDFS Utilization
- S3 Bytes Read
- S3 Bytes Written
- Total Load
Site24x7's EMR Monitoring Interface
Summary
Receive an overview of all your important EMR metrics including HDFS, YARN, node, and memory metrics as time series charts.
Monitored Resources
If you're monitoring your EC2 instances or S3 buckets with Site24x7, the statuses of these services will be listed in the Monitored Resources tab. You can click on any of the services to view their detailed metrics. You can also set thresholds and be notified when any of these services fail by clicking the pencil icon under Action.
Configurations
This tab displays additional configuration classifications for each instance group in a cluster. If the configurations for an instance group are modified, the new configurations will be reflected here.
Steps
The actions that are to be executed by the cluster are listed as steps.
Bootstrap Actions
Bootstrap actions can be used to install additional software or customize the configuration of cluster instances. The custom bootstrap actions are listed under this tab.
Security Configuration
A security configuration specifies data encryption, Kerberos authentication, and Amazon S3 authorization for the EMR File System (EMRFS). The settings defined for the user role or account are displayed in JSON format under this tab.
Cluster Summary
The inventory details of the EMR cluster are displayed here, including the cluster status, the applications associated with it, the EC2 instances deployed, the subnet ID, and similar details.
Attribute | Description |
Additional Security Group for Master | The extra security group added by the user for the master node. |
Release Label | Amazon EMR release version. |
Availability Zone | The Availability Zone in which the EMR cluster is hosted. |
Instance Group Type | The instance group with which the EC2 instances are associated. |
Auto-termination | State of auto-termination: true or false. |
Applications | The open-source applications that Amazon EMR installed while creating the cluster. |
Master Public DNS | Public DNS name of the master node. |
Cluster Status | State of the cluster: active or terminated. |
State Change Message | The status of the EMR cluster after a change in state. |
Log URI | The path of the logs stored in Amazon S3. |
Creation Time | The time when the EMR cluster was created. |
Elapsed Time | Total run time of the cluster. |
Cluster Ready Time | The time when the cluster became ready to run steps. |
Visible to all Users | Indicates whether the cluster is visible to all IAM users in the account. |
Key Name | The key provided by the user to access the EC2 instance. |
Subnet ID | The subnet ID in the VPC where the NAT gateway is present. |
Security Group for Master | The name of the managed security group when a cluster is created. |
Security Group for Core and Task | The name of the security group for core and task. |
EC2 Instance Profile | The name of the EC2 instance profile. |
EMR Role | The IAM service role attached to the EMR cluster. |
Requested Subnet ID | Extra subnets attached by the user. |
Autoscaling Role | The IAM role associated with the autoscaling instance. |
Scaledown Behavior | One of two behaviors: terminate at the instance-hour boundary or terminate at task completion. |
EBS Rootvolume Size | The size of the EBS root volume. |
Additional Security Group for Core and Task | The extra security group added by the user for the core and task nodes. |
Requested Availability Zone | The extra regions added by the user. |
Security Configuration | User role or account permissions of EMR. |
Realm | The Kerberos realm name. |
Custom AMI ID | Displays the custom Amazon Linux AMI created by the user. |
Running AMI Version | The current version of the AMI release. |