AWS Batch Monitoring Integration
AWS Batch is a fully managed service that helps you build and run batch computing workloads on the AWS Cloud. Batch processing is a cost-effective way to run many programs, called jobs, quickly and efficiently.
Site24x7's integration with AWS Batch enables you to monitor and analyze your batch processing, including submitted, pending, failed, and succeeded jobs.
Use case
Suppose an AWS Batch monitor integrated with Site24x7 has jobs that have stayed in the pending or running status for a long time and are consuming your AWS resources. In this case, you can select multiple jobs at once and cancel or terminate them using IT automation. You can also receive alerts when a threshold is breached for your integrated monitor.
Benefits of the integration between Site24x7 and AWS Batch
By integrating Site24x7 with AWS Batch, you can:
- Set thresholds for metrics and receive alerts for threshold breaches so that you can identify and troubleshoot issues with your AWS Batch jobs.
- Schedule IT automation to cancel or terminate your job at any time.
- Obtain a detailed overview of the job definition.
- View CloudWatch logs to find specific error codes or patterns for failed jobs.
Setup and configuration
- If you haven't done so already, enable access to your AWS resources by creating a cross-account IAM role between your account and Site24x7's AWS account. Learn more.
- On the Integrate AWS Account page, ensure that AWS Batch is selected in the Services to be discovered field.
Permissions
Ensure that the IAM role grants Site24x7 the following permissions so that it can monitor the batch jobs of your AWS resources:
- "batch:DescribeJobDefinitions"
- "batch:DescribeJobQueues"
- "batch:DescribeJobs"
- "batch:ListJobs"
- "batch:TerminateJob"
- "batch:CancelJob"
- "batch:DescribeComputeEnvironments"
- "batch:ListTagsForResource"
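As an illustration, the actions listed above could be bundled into a single IAM policy statement attached to the cross-account role. The following Python sketch only builds and prints the policy JSON. The helper name `build_batch_policy` and the `"*"` resource scope are assumptions made for this example, not Site24x7 requirements, and you may prefer a narrower scope in practice.

```python
import json

# Actions taken from the permissions list above. Every action is written with
# the "batch:" prefix here; for any entry that appears without a prefix in the
# list, that mapping is an assumption.
BATCH_ACTIONS = [
    "batch:DescribeJobDefinitions",
    "batch:DescribeJobQueues",
    "batch:DescribeJobs",
    "batch:ListJobs",
    "batch:TerminateJob",
    "batch:CancelJob",
    "batch:DescribeComputeEnvironments",
    "batch:ListTagsForResource",
]


def build_batch_policy(actions=BATCH_ACTIONS, resource="*"):
    """Return an IAM policy document (as a dict) allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # De-duplicate and sort so the output is stable.
                "Action": sorted(set(actions)),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_batch_policy(), indent=2))
```

The printed JSON can be pasted into the policy editor of the IAM role used for the integration.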
Polling frequency
Site24x7 queries AWS service-level APIs at the configured polling frequency (from one minute to one day) to collect metrics from AWS Batch.
Supported metrics for compute environment
Metric name | Description | Statistic | Unit |
---|---|---|---|
Total Submitted Jobs | The total number of submitted jobs in the queues attached to the compute environment. | Average | Count |
Total Pending Jobs | The total number of pending jobs in the queues attached to the compute environment. | Average | Count |
Total Runnable Jobs | The total number of runnable jobs in the queues attached to the compute environment. | Average | Count |
Total Starting Jobs | The total number of starting jobs in the queues attached to the compute environment. | Average | Count |
Total Running Jobs | The total number of running jobs in the queues attached to the compute environment. | Average | Count |
Total Succeeded Jobs | The total number of succeeded jobs in the queues attached to the compute environment. | Average | Count |
Total Failed Jobs | The total number of failed jobs in the queues attached to the compute environment. | Average | Count |
Total Queue Count | The total number of queues attached to the compute environment. | Average | Count |
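To make the relationship between queue-level and compute-environment-level metrics concrete, the sketch below sums per-queue job counts over the queues attached to one compute environment. How Site24x7 derives these values internally is not described here, so simple summation is an assumption, and the function name, queue names, and data layout are hypothetical.

```python
from collections import Counter


def compute_environment_totals(queue_counts, attached_queues):
    """Sum per-status job counts over the queues attached to one environment.

    queue_counts: {queue_name: {status: count}}  (hypothetical layout)
    attached_queues: names of the queues attached to the compute environment
    """
    totals = Counter()
    for queue in attached_queues:
        # Queues with no recorded counts contribute nothing.
        totals.update(queue_counts.get(queue, {}))

    result = {f"Total {status.title()} Jobs": n for status, n in totals.items()}
    result["Total Queue Count"] = len(attached_queues)
    return result


if __name__ == "__main__":
    counts = {
        "queue-a": {"SUBMITTED": 2, "FAILED": 1},
        "queue-b": {"SUBMITTED": 3},
        "queue-c": {"SUBMITTED": 10},  # not attached, so ignored below
    }
    print(compute_environment_totals(counts, ["queue-a", "queue-b"]))
```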
Supported metrics for job queue
A job queue stores your submitted jobs until the AWS Batch scheduler runs them on a resource in your compute environment.
Metric name | Description | Statistic | Unit |
---|---|---|---|
Submitted Jobs | The number of submitted jobs in the queue. | Average | Count |
Pending Jobs | The number of pending jobs in the queue. | Average | Count |
Runnable Jobs | The number of runnable jobs in the queue. | Average | Count |
Starting Jobs | The number of starting jobs in the queue. | Average | Count |
Running Jobs | The number of running jobs in the queue. | Average | Count |
Succeeded Jobs | The number of succeeded jobs in the queue. | Average | Count |
Failed Jobs | The number of failed jobs in the queue. | Average | Count |
Total Compute Environment Attached | The total number of compute environments attached to the queue. | Average | Count |
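The queue metrics above correspond to the job statuses that AWS Batch reports. As a rough sketch of how such counts could be gathered directly from the `ListJobs` API, the function below pages through the results for each status. It is not Site24x7's collector. It accepts any client object that exposes `list_jobs` with the same parameters as the boto3 Batch client, and the queue name in the usage comment is hypothetical.

```python
# Job statuses reported by AWS Batch, in lifecycle order.
JOB_STATUSES = (
    "SUBMITTED", "PENDING", "RUNNABLE", "STARTING",
    "RUNNING", "SUCCEEDED", "FAILED",
)


def count_jobs_by_status(batch_client, job_queue):
    """Return {status: number of jobs} for one job queue.

    batch_client is expected to behave like boto3.client("batch").
    """
    counts = {}
    for status in JOB_STATUSES:
        total = 0
        kwargs = {"jobQueue": job_queue, "jobStatus": status}
        while True:
            page = batch_client.list_jobs(**kwargs)
            total += len(page.get("jobSummaryList", []))
            token = page.get("nextToken")
            if not token:
                break
            # Request the next page of results.
            kwargs["nextToken"] = token
        counts[status] = total
    return counts


# Usage (requires boto3 and AWS credentials; "my-queue" is a placeholder):
#   import boto3
#   print(count_jobs_by_status(boto3.client("batch"), "my-queue"))
```

Passing the client in as an argument keeps the function easy to test with a stub and avoids hard-coding a Region or credentials.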
Licensing
Every AWS Batch monitor is considered a basic monitor.
IT Automation
You can add automations to perform AWS Batch actions. Go to Admin > IT Automation Templates (+) > Add Automation Templates. Once automations are added, you can schedule them to be executed one after the other.
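For comparison, the cancel and terminate actions map onto the AWS Batch `CancelJob` and `TerminateJob` APIs. The sketch below is not Site24x7's automation. It illustrates one way to decide which call applies to each long-lived job, assuming job summaries shaped like `ListJobs` results (`jobId`, `status`, `createdAt` in epoch milliseconds). The age threshold and the function names are hypothetical.

```python
import time

# Jobs that have not started yet can be cancelled; jobs that are starting or
# running have to be terminated instead.
CANCELLABLE = {"SUBMITTED", "PENDING", "RUNNABLE"}
TERMINABLE = {"STARTING", "RUNNING"}


def plan_actions(job_summaries, max_age_seconds, now_ms=None):
    """Return [(job_id, "cancel" | "terminate")] for jobs older than the limit."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    plan = []
    for job in job_summaries:
        age_seconds = (now_ms - job["createdAt"]) / 1000
        if age_seconds < max_age_seconds:
            continue  # still within the allowed age
        if job["status"] in CANCELLABLE:
            plan.append((job["jobId"], "cancel"))
        elif job["status"] in TERMINABLE:
            plan.append((job["jobId"], "terminate"))
        # SUCCEEDED and FAILED jobs are already finished and are skipped.
    return plan


def apply_plan(batch_client, plan, reason):
    """Execute the plan with a client that behaves like boto3.client("batch")."""
    for job_id, action in plan:
        if action == "cancel":
            batch_client.cancel_job(jobId=job_id, reason=reason)
        else:
            batch_client.terminate_job(jobId=job_id, reason=reason)
```

Separating planning from execution lets you review the list of affected jobs before any call that needs the `batch:CancelJob` or `batch:TerminateJob` permission is made.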
Viewing AWS Batch
To view batch jobs of your AWS resources, log in to your Site24x7 account and navigate to Cloud > AWS > AWS Batch.
Site24x7's integration with AWS Batch also includes the AWS Batch Queue monitor. An AWS Batch monitor can have multiple queues attached, and the AWS Batch Queue monitor provides the job details of each queue.
AWS Batch data
You can view the AWS Batch monitor data in the following tabs:
Summary
The Summary tab provides an overview of the AWS Batch metrics in the form of charts. These enable you to view details such as Total Submitted Jobs, Total Pending Jobs, and Total Running Jobs.
Batch Job Details
The Batch Job Details tab displays the job details related to the queues. You can filter and view the jobs based on the job status.
Monitored Resource
The Monitored Resource tab shows all the resources associated with the AWS Batch monitor that are also monitored by Site24x7. You can also view the resource status, resource type, resource ID, and configuration details.
Configuration
The Configuration tab provides the configuration details of the monitored resource, such as Region, Job Name, and Queue Status.
Outages
The Outages tab displays the status history of your resource, such as Down, Trouble, Critical, or Under Maintenance. You can also view the start time, end time, and duration of each outage, along with any comments.