Amazon FSx Monitoring Integration

Amazon FSx is a fully managed service from AWS that provides scalable and high-performance file storage in the cloud. It allows you to launch file systems that support popular file system types, making it easy to run traditional file-based applications in the cloud with the same performance, security, and features you're accustomed to with on-premises storage systems.

Amazon FSx offers the following file system types to cater to different workloads: Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for NetApp ONTAP, and Amazon FSx for OpenZFS.

Overview

Site24x7 provides deep insights and proactive monitoring for your FSx file systems, helping you detect performance issues, optimize usage, and manage operational efficiency. You can also track details such as data repository tasks, backups, storage, and volumes.

In addition to the FSx monitor, the integration offers the following monitors for the FSx file systems hosted in your AWS infrastructure:

  • FSx Storage Virtual Machine: Site24x7 provides comprehensive monitoring for storage virtual machines (SVMs) in Amazon FSx for NetApp ONTAP file systems, enabling you to track and optimize the performance, availability, and health of the virtualized storage infrastructure. 
  • FSx Volume: By monitoring volumes in NetApp ONTAP and OpenZFS file systems, Site24x7 enables you to maintain optimal storage performance, manage capacity efficiently, and protect critical data.

Use case

Consider an organization that uses Amazon FSx for NetApp ONTAP, integrated with Site24x7, to manage its shared file system storage. When file system IOPS start spiking because of a seasonal traffic surge, Site24x7 sends an alert before the spike affects user experience, enabling the team to scale up storage or rebalance the load.

Additionally, Site24x7’s monitoring of Amazon FSx provides the organization with critical insights into performance, capacity, and data protection, resulting in improved reliability, efficiency, and cost control for their cloud storage environment.

Benefits of Site24x7's Amazon FSx integration

Integrating your Amazon FSx environment with Site24x7 offers the following benefits:

  • Obtain a unified monitoring solution for your diverse file systems, monitoring all your FSx environments in one place.
  • Monitor SVMs associated with your ONTAP file systems and volumes associated with your ONTAP and OpenZFS file systems.
  • Set thresholds for key metrics and receive alerts when they are breached.

Setup and configuration

1. If you haven't already, connect your AWS account with Site24x7's AWS account by either:

  • Creating Site24x7 as an IAM user.
  • Creating a cross-account IAM role. Learn more

2. On the Integrate AWS Account page, check the appropriate box for Amazon FSx. Learn more

Policy and permissions

Site24x7 uses various Amazon FSx APIs to collect information about your FSx file systems. Assign the AWS managed policy ReadOnlyAccess to the Site24x7 entity (IAM user or IAM role) so that Site24x7 can collect metrics and metadata. If you prefer to assign a custom policy, make sure the following read-level actions are present in the policy JSON. Learn more

  • "fsx:ListTagsForResource",
  • "fsx:DescribeBackups",
  • "fsx:DescribeDataRepositoryTasks",
  • "fsx:DescribeFileSystems",
  • "fsx:DescribeVolumes",
  • "fsx:DescribeStorageVirtualMachines"
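
As an illustration, the custom policy could be assembled as follows. This is a minimal sketch, not an official Site24x7 policy: the statement ID and the wildcard resource are assumptions, and it covers only the FSx actions listed above, so your setup may need further permissions.

```python
import json

# Read-level FSx actions listed in this section.
FSX_READ_ACTIONS = [
    "fsx:ListTagsForResource",
    "fsx:DescribeBackups",
    "fsx:DescribeDataRepositoryTasks",
    "fsx:DescribeFileSystems",
    "fsx:DescribeVolumes",
    "fsx:DescribeStorageVirtualMachines",
]


def build_fsx_read_policy(sid="Site24x7FsxReadOnly"):
    """Return an IAM policy document (as a dict) allowing the FSx read actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,  # hypothetical statement ID
                "Effect": "Allow",
                "Action": FSX_READ_ACTIONS,
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


policy = build_fsx_read_policy()
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the IAM user or role created for Site24x7.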

Polling frequency

Site24x7 queries AWS to collect Amazon FSx performance metrics according to the configured polling frequency. The polling interval is one hour by default. Learn more.
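
To make the effect of the polling interval concrete, the sketch below builds the parameters for a single CloudWatch query that covers one polling window. It assumes the metrics are read from the AWS/FSx CloudWatch namespace; this page does not describe how Site24x7 retrieves them internally, and the function name and file system ID are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(file_system_id, metric_name, statistic, end_time, poll_minutes=60):
    """Build keyword arguments for a CloudWatch GetMetricStatistics call
    that returns one datapoint covering a single polling interval."""
    return {
        "Namespace": "AWS/FSx",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end_time - timedelta(minutes=poll_minutes),
        "EndTime": end_time,
        "Period": poll_minutes * 60,  # seconds; one datapoint per poll
        "Statistics": [statistic],
    }


# Hypothetical file system ID and a fixed end time for reproducibility.
end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
query = build_metric_query("fs-0123456789abcdef0", "DataReadBytes", "Sum", end)
print(query["Period"], query["StartTime"].isoformat())
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").get_metric_statistics(**query)`.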

Supported metrics

The metrics supported for Amazon FSx monitoring are given below.

Performance metrics for file systems

Metric name | Description | Supported for file system type | Statistic | Unit
Data Read Bytes | Number of bytes for file system read operations. | All | Sum | MB
Data Write Bytes | Number of bytes for file system write operations. | All | Sum | MB
Data Write Operations | Number of write operations. | All | Sum | Count
Data Read Operations | Number of read operations. | All | Sum | Count
Metadata Operations | Number of metadata operations. | All | Sum | Count
Free Storage Capacity | Amount or percentage of available storage capacity. | All | Average | GB/Percentage
Total Throughput | Total throughput of the file system. | All | Average | MB/sec
Read Throughput | Read throughput of the file system. | All | Average | MB/sec
Write Throughput | Write throughput of the file system. | All | Average | MB/sec
Total IOPS | Total number of I/O operations per second. | All | Average | Count/sec
Read IOPS | Total number of read I/O operations per second. | All | Average | Count/sec
Write IOPS | Total number of write I/O operations per second. | All | Average | Count/sec
Metadata IOPS | Total number of metadata I/O operations per second. | All | Average | Count/sec
Client Connections | The number of active connections between clients and the file server. | Windows and OpenZFS | Sum | Count
Network Throughput Utilization | The percent utilization of network throughput for the file system. | All file system types except Lustre | Average | Percentage
CPU Utilization | The percentage utilization of your file server’s CPU resources. | All file system types except Lustre | Average | Percentage
Memory Utilization | The percentage utilization of your file server’s memory resources. | Windows and OpenZFS | Average | Percentage
File Server Disk Throughput Utilization | The disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by throughput capacity. | All file system types except Lustre | Average | Percentage
File Server Disk Throughput Balance | The percentage of available burst credits for disk throughput between your file server and its storage volumes. Valid for file systems provisioned with a throughput capacity of 256 MBps or less. | All file system types except Lustre | Average | Percentage
File Server DiskIops Utilization | The disk IOPS between your file server and storage volumes, as a percentage of the provisioned limit determined by throughput capacity. | All file system types except Lustre | Average | Percentage
File Server DiskIops Balance | The percentage of available burst credits for disk IOPS between your file server and its storage volumes. Valid for file systems provisioned with a throughput capacity of 256 MBps or less. | All file system types except Lustre | Average | Percentage
Disk Read Bytes | The number of bytes for read operations that access storage volumes. | All file system types except Lustre | Sum | Bytes
Disk Write Bytes | The number of bytes for write operations that access storage volumes. | All file system types except Lustre | Sum | Bytes
Disk Read Operations | The number of read operations for the file server accessing storage volumes. | All file system types except Lustre | Sum | Count
Disk Write Operations | The number of write operations for the file server accessing storage volumes. | All file system types except Lustre | Sum | Count
Disk Throughput Utilization (HDD only) | The disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by the storage volumes. | Windows | Average | Percentage
Disk Throughput Balance (HDD only) | The percentage of available burst credits for disk throughput and disk IOPS for the storage volumes. | Windows and OpenZFS | Average | Percentage
Disk IOPS Utilization (SSD only) | The disk IOPS between your file server and storage volumes, as a percentage of the provisioned IOPS limit determined by the storage volumes. | All file system types except Lustre | Average | Percentage
Deduplication Saved Storage | The amount of storage space saved by data deduplication, if it is enabled. | Windows | Sum | Bytes
Logical Disk Usage | The amount of logical data stored (uncompressed). | Lustre | Sum | Bytes
Physical Disk Usage | The amount of storage physically occupied by file system data (compressed). | Lustre | Sum | Bytes
File Create Operations | The total number of file create operations. | Lustre | Sum | Count
File Open Operations | The total number of file open operations. | Lustre | Sum | Count
File Delete Operations | The total number of file delete operations. | Lustre | Sum | Count
Stat Operations | The total number of stat operations. | Lustre | Sum | Count
Rename Operations | The total number of directory renames, whether in-place directory renames or cross-directory renames. | Lustre | Sum | Count
Directory Delete Operations | The total number of directory delete operations. | Lustre | Sum | Count
Directory Create Operations | The total number of directory create operations. | Lustre | Sum | Count
NFS Bad Calls | The number of calls rejected by the NFS server remote procedure call (RPC) mechanism. | OpenZFS | Sum | Count
File Server Cache Hit Ratio | For OpenZFS: The percentage of cache hits. For Single-AZ 2 (non-HA and HA) file systems, this metric reports the cache hit ratio for both the in-memory (ARC) and NVMe (L2ARC) caches. For Single-AZ 1 (non-HA and HA) file systems, this metric reports only the cache hit ratio for the ARC cache. For ONTAP: The percentage of all read requests that are served by data in the file system's RAM and NVMe caches. A higher percentage means that more reads are served by the file system's read caches. | OpenZFS and ONTAP | Average | Percentage
Compression Ratio | The ratio of compressed storage usage to uncompressed storage usage. | OpenZFS | Average | Ratio
Storage Efficiency Savings | The bytes saved from storage efficiency features (compression, deduplication, and compaction). | ONTAP | Sum | Bytes
Logical Data Stored | The total amount of logical data stored on the file system, considering both the SSD tier and the capacity pool tier. This metric includes the total logical size of snapshots and FlexClones but does not include storage efficiency savings achieved through compression, compaction, and deduplication. | ONTAP | Sum | Bytes
Network Sent Bytes | The number of bytes (network I/O) sent by the file system. | ONTAP | Sum | Bytes
Network Received Bytes | The number of bytes (network I/O) received by the file system. | ONTAP | Sum | Bytes
Data Read Operation Time | The sum of total time spent within the file system for read operations (network I/O) from clients accessing data in the file system. | ONTAP | Sum | Seconds
Data Write Operation Time | The sum of total time spent within the file system for fulfilling write operations (network I/O) from clients accessing data in the file system. | ONTAP | Sum | Seconds
Capacity Pool Read Bytes | The number of bytes read (network I/O) from the file system's capacity pool tier. | ONTAP | Sum | Bytes
Capacity Pool Write Bytes | The number of bytes written (network I/O) to the file system's capacity pool tier. | ONTAP | Sum | Bytes
Capacity Pool Read Operations | The number of read operations (network I/O) from the file system's capacity pool tier. This translates to a capacity pool read request. | ONTAP | Sum | Count
Capacity Pool Write Operations | The number of write operations (network I/O) to the file system from the capacity pool tier. This translates to a write request. | ONTAP | Sum | Count
Storage Capacity Utilization | The percent utilization of storage capacity for the file system. | All | Average | Percentage
Storage Used | The total storage capacity used for the file system in GB. | All | Sum | Bytes
Read Operations | The average data read operation time per data read operation. | ONTAP | Average | Seconds
Write Operations | The average data write operation time per data write operation. | ONTAP | Average | Seconds
Metadata Operations | The average time taken per metadata operation. | ONTAP | Average | Seconds
Capacity Pool Tier | The used physical storage capacity in bytes, specific to the storage tier. This value includes savings from storage-efficiency features, such as data compression and deduplication. With StorageTier as StandardCapacityPool. | ONTAP | Average | Bytes
Primary Tier Capacity | The storage capacity for all data types with storage tier as SSD. | ONTAP | Average | Bytes
Primary Tier Used | The used physical storage capacity in bytes, specific to the storage tier. This value includes savings from storage-efficiency features, such as data compression and deduplication. With StorageTier as SSD, this metric measures the logical space usage for this volume for your SSD. | ONTAP | Average | Bytes
Primary Tier Avail | The available or unused physical storage capacity in bytes, specific to the storage tier. | ONTAP | Average | Bytes
Metadata Operation Time | The total time taken for metadata operations. | ONTAP | Sum | Seconds
Available Volumes | The number of available volumes. | OpenZFS and ONTAP | Sum | Count
Failed Volumes | The number of failed volumes. | OpenZFS and ONTAP | Sum | Count
Misconfigured Volumes | The number of misconfigured volumes. | OpenZFS and ONTAP | Sum | Count
Created Volumes | The number of created volumes. | OpenZFS and ONTAP | Sum | Count
Available SVM | The number of available storage virtual machines (SVMs). | ONTAP | Sum | Count
Failed SVM | The number of failed SVMs. | ONTAP | Sum | Count
Misconfigured SVM | The number of misconfigured SVMs. | ONTAP | Sum | Count
Total Volumes | The total number of volumes in the file system. | OpenZFS and ONTAP | Sum | Count
Total SVM | The total number of storage virtual machines in the file system. | ONTAP | Sum | Count
No Data Compression OpenZFS Volume | The method used to compress the data on the volume can be NONE, ZSTD, or LZ4. This metric shows the number of volumes that use no compression method. | OpenZFS | Sum | Count
Zstandard (ZSTD) Compression OpenZFS Volume | The number of volumes that use the Zstandard (ZSTD) compression algorithm to compress the data on the volume. | OpenZFS | Sum | Count
LZ4 Compression OpenZFS Volume | The number of volumes that use the LZ4 compression algorithm to compress the data on the volume. | OpenZFS | Sum | Count
Clone Volume | The number of volumes that reference the data in the origin snapshot, that is, volumes that use the clone strategy when copying data from the snapshot to the new volume. | OpenZFS | Sum | Count
Full Copy Volume | The number of volumes that copy all data from the snapshot to the new volume, that is, volumes that use the full-copy strategy when copying data from the snapshot to the new volume. | OpenZFS | Sum | Count
Incremental Copy OpenZFS Volume | The number of volumes that use an incremental copy strategy when copying data from the snapshot to the new volume. This option is only for updating an existing volume by using a snapshot from another FSx for OpenZFS file system. | OpenZFS | Sum | Count

Performance metrics for data repository tasks

Attribute | Description | Statistic | Data type
Succeeded Count | Number of files successfully exported. | Sum | Count
Failed Count | Number of files that failed to export. | Sum | Count
Total Count | Total number of files to export. | Sum | Count
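
The three counts above can be combined into a quick progress summary. The sketch below is illustrative only: the key names mirror the status fields of an FSx data repository task as an assumption, and the sample values are made up.

```python
def summarize_task(status):
    """Derive pending files, completion, and failure rate from the task counts."""
    total = status.get("TotalCount", 0)
    succeeded = status.get("SucceededCount", 0)
    failed = status.get("FailedCount", 0)
    processed = succeeded + failed
    return {
        "pending": max(total - processed, 0),
        "percent_complete": round(100.0 * processed / total, 2) if total else 0.0,
        "failure_rate": round(100.0 * failed / processed, 2) if processed else 0.0,
    }


# Hypothetical counts for one export task.
sample = {"TotalCount": 200, "SucceededCount": 150, "FailedCount": 10}
summary = summarize_task(sample)
print(summary)
```

A rising failure rate while files are still pending is the kind of condition worth alerting on before the task finishes.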

Performance metrics for FSx Storage Virtual Machine

Metric name | Description | Statistic | Unit
Total Volumes | Total number of volumes in the SVM. | Sum | Count
Available Volumes | Number of available volumes. | Sum | Count
Created Volume | Number of created volumes. | Sum | Count
Failed Volumes | Number of failed volumes. | Sum | Count
Misconfigured Volumes | Number of misconfigured volumes. | Sum | Count
FlexVol Volume | Number of FlexVol style volumes. | Sum | Count
FlexGroup Volume | Number of FlexGroup style volumes. | Sum | Count
Unix Volume | Number of UNIX security style volumes. The security style for the volume can be UNIX, NTFS, or MIXED. | Sum | Count
Ntfs Volume | Number of NTFS security style volumes. | Sum | Count
Mixed Volume | Number of MIXED security style volumes. | Sum | Count
RW (Read/Write) Ontap Volume | Number of RW type ONTAP volumes. | Sum | Count
DP (Data-Protection) Ontap Volume | Number of DP type ONTAP volumes. | Sum | Count
LS (Load-Sharing) Ontap Volume | Number of LS type ONTAP volumes. | Sum | Count
No FlexCache Volume | The FlexCache endpoint type of the volume can be NONE, ORIGIN, or CACHE. This metric indicates the number of volumes with the NONE FlexCache endpoint type. | Sum | Count
Origin FlexCache Volume | Number of volumes with the ORIGIN FlexCache endpoint type. | Sum | Count
FlexCache Volume | Number of volumes with the CACHE FlexCache endpoint type. | Sum | Count
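
The SVM counts above amount to grouping an SVM's volumes by their attributes. The following sketch shows one way such tallies could be produced from volume records; the record layout follows the shape of a DescribeVolumes response as an assumption, and the sample data and function name are hypothetical.

```python
from collections import Counter


def tally_svm_volumes(volumes, svm_id):
    """Count one SVM's ONTAP volumes by lifecycle, style, security style, and type."""
    tallies = {
        "Lifecycle": Counter(),
        "VolumeStyle": Counter(),
        "SecurityStyle": Counter(),
        "OntapVolumeType": Counter(),
    }
    total = 0
    for volume in volumes:
        ontap = volume.get("OntapConfiguration", {})
        if ontap.get("StorageVirtualMachineId") != svm_id:
            continue  # volume belongs to another SVM
        total += 1
        tallies["Lifecycle"][volume.get("Lifecycle", "UNKNOWN")] += 1
        for key in ("VolumeStyle", "SecurityStyle", "OntapVolumeType"):
            tallies[key][ontap.get(key, "UNKNOWN")] += 1
    return total, tallies


# Hypothetical records: two volumes on svm-1 and one on svm-2.
sample_volumes = [
    {"Lifecycle": "AVAILABLE", "OntapConfiguration": {
        "StorageVirtualMachineId": "svm-1", "VolumeStyle": "FLEXVOL",
        "SecurityStyle": "UNIX", "OntapVolumeType": "RW"}},
    {"Lifecycle": "FAILED", "OntapConfiguration": {
        "StorageVirtualMachineId": "svm-1", "VolumeStyle": "FLEXGROUP",
        "SecurityStyle": "NTFS", "OntapVolumeType": "DP"}},
    {"Lifecycle": "AVAILABLE", "OntapConfiguration": {
        "StorageVirtualMachineId": "svm-2", "VolumeStyle": "FLEXVOL",
        "SecurityStyle": "MIXED", "OntapVolumeType": "RW"}},
]
total, tallies = tally_svm_volumes(sample_volumes, "svm-1")
print(total, dict(tallies["Lifecycle"]))
```

Using `Counter` means a category with no volumes simply reads as zero, which matches how a count metric would be reported.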

Performance metrics for FSx Volume

Metric name | Description | Statistic | Unit
Data Read Bytes | The number of bytes (network I/O) read from the volume by clients. | Sum | Bytes
Data Write Bytes | The number of bytes (network I/O) written to the volume by clients. | Sum | Bytes
Data Read Operations | The number of read operations (network I/O) on the volume by clients. | Sum | Count
Data Write Operations | The number of write operations (network I/O) on the volume by clients. | Sum | Count
Metadata Operations | The number of I/O operations (network I/O) from metadata activities on the volume by clients. | Sum | Count
Data Read Operation Time | The sum of total time spent within the volume for read operations (network I/O) from clients accessing data in the volume. | Sum | Seconds
Data Write Operation Time | The sum of total time spent within the volume for fulfilling write operations (network I/O) from clients accessing data in the volume. | Sum | Seconds
Metadata Operation Time | The sum of total time spent within the volume for fulfilling metadata operations (network I/O) from clients that are accessing data in the volume. | Sum | Seconds
Capacity Pool Read Bytes | The number of bytes read (network I/O) from the volume's capacity pool tier. | Sum | Bytes
Capacity Pool Write Bytes | The number of bytes written (network I/O) to the volume's capacity pool tier. | Sum | Bytes
Capacity Pool Read Operations | The number of read operations (network I/O) from the volume's capacity pool tier. This translates to a capacity pool read request. | Sum | Count
Capacity Pool Write Operations | The number of write operations (network I/O) to the volume from the capacity pool tier. This translates to a write request. | Sum | Count
Storage Used | The used logical storage capacity of the volume. | Maximum | Bytes
Storage Capacity | The size of the volume in bytes. | Maximum | Bytes
Storage Capacity Utilization | The storage capacity utilization of the volume. | Average | Percent
Files Used | The used files (number of files or inodes) on the volume. | Maximum | Count
Files Capacity | The total number of inodes that can be created on the volume. | Maximum | Count
Free Storage Space | The unused or free logical storage capacity of the volume. | Sum | Bytes
Free Storage % | The percentage of unused logical storage capacity of the volume. | Average | Percent
Total Throughput | The total throughput of data read and data write bytes. | Average | MB/sec
Read Throughput | The total throughput of data read bytes. | Average | MB/sec
Write Throughput | The total throughput of data write bytes. | Average | MB/sec
Total IOPS | The number of operations (network I/O) on the volume by clients per second, including data read, data write, and metadata operations. | Average | Count/sec
Read IOPS | The number of read operations (network I/O) on the volume by clients per second. | Average | Count/sec
Write IOPS | The number of write operations (network I/O) on the volume by clients per second. | Average | Count/sec
Metadata IOPS | The number of metadata operations (network I/O) on the volume by clients per second. | Average | Count/sec
User Data | The amount of logical space used, in bytes. This metric measures different types of space consumption depending on the dimensions used with this metric. Here it includes dimensions StorageTier as All and DataType as User. | Average | Bytes
Snapshot Data | The amount of logical space used, in bytes. This metric measures different types of space consumption depending on the dimensions used with this metric. Here it includes dimensions StorageTier as All and DataType as Snapshot. | Average | Bytes
Other Data | The amount of logical space used for all StorageTier with DataType as Other, in bytes. | Average | Bytes
Read Latency | The time taken per Data Read Operation. | Average | Seconds
Write Latency | The time taken per Data Write Operation. | Average | Seconds
Metadata Latency | The time taken per Metadata Operation. | Average | Seconds
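
The throughput, IOPS, and latency rows above are rates derived from the summed byte, operation, and time metrics. The sketch below shows the arithmetic under stated assumptions: the exact formulas Site24x7 uses are not given on this page, MB is taken as 2**20 bytes, and the field names and sample values are hypothetical.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: MB here means 2**20 bytes


def per_operation(total_time_s, operations):
    """Average time per operation; 0.0 when there were no operations."""
    return total_time_s / operations if operations else 0.0


def derive_volume_metrics(sums, period_s):
    """Turn per-period sums into throughput (MB/sec), IOPS, and latency (seconds)."""
    data_bytes = sums["read_bytes"] + sums["write_bytes"]
    all_ops = sums["read_ops"] + sums["write_ops"] + sums["metadata_ops"]
    return {
        "total_throughput_mb_s": data_bytes / BYTES_PER_MB / period_s,
        "read_throughput_mb_s": sums["read_bytes"] / BYTES_PER_MB / period_s,
        "write_throughput_mb_s": sums["write_bytes"] / BYTES_PER_MB / period_s,
        "total_iops": all_ops / period_s,
        "read_iops": sums["read_ops"] / period_s,
        "write_iops": sums["write_ops"] / period_s,
        "metadata_iops": sums["metadata_ops"] / period_s,
        "read_latency_s": per_operation(sums["read_time_s"], sums["read_ops"]),
        "write_latency_s": per_operation(sums["write_time_s"], sums["write_ops"]),
        "metadata_latency_s": per_operation(sums["metadata_time_s"], sums["metadata_ops"]),
    }


# Hypothetical sums collected over a 300-second period.
sample_sums = {
    "read_bytes": 300 * BYTES_PER_MB, "write_bytes": 150 * BYTES_PER_MB,
    "read_ops": 3000, "write_ops": 1500, "metadata_ops": 600,
    "read_time_s": 6.0, "write_time_s": 4.5, "metadata_time_s": 0.6,
}
derived = derive_volume_metrics(sample_sums, period_s=300)
print(derived["total_throughput_mb_s"], derived["total_iops"], derived["read_latency_s"])
```

Guarding the latency division against zero operations keeps idle volumes from raising errors during a quiet polling interval.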

Threshold configuration

To configure thresholds for your Amazon FSx monitor:

  1. Log in to Site24x7 and navigate to Admin > Configuration Profiles > Threshold and Availability.
  2. Click Add Threshold Profile.
  3. Select the applicable monitor type from the Monitor Type drop-down menu and provide an appropriate name in the Display Name field. The applicable monitor types are FSx File System, FSx Storage Virtual Machine, and FSx Volume.
  4. The supported metrics are displayed in the Threshold Configuration section. You can set threshold values for all the metrics mentioned above.
  5. Click Save.
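
Conceptually, a threshold profile maps each polled value to a status. The sketch below illustrates that idea only and is not Site24x7's evaluation logic: the status name "Up", the two-level thresholds, and the consecutive-poll rule are all assumptions made for the example.

```python
def classify(value, trouble_at, critical_at):
    """Map a single polled value to a status using two upper thresholds."""
    if value >= critical_at:
        return "Critical"
    if value >= trouble_at:
        return "Trouble"
    return "Up"


def status_for_series(values, trouble_at, critical_at, consecutive=2):
    """Report a breach only if the last `consecutive` polls all exceed a threshold."""
    recent = values[-consecutive:]
    if len(recent) < consecutive:
        return "Up"  # not enough polls yet to confirm a breach
    levels = [classify(v, trouble_at, critical_at) for v in recent]
    if all(level == "Critical" for level in levels):
        return "Critical"
    if all(level != "Up" for level in levels):
        return "Trouble"
    return "Up"


# Hypothetical Storage Capacity Utilization readings (percent) from three polls.
print(status_for_series([50, 85, 96], trouble_at=80, critical_at=95))
```

Requiring consecutive breaches is one way to avoid alerting on a single noisy reading.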

Forecast

Estimate future values of the following performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.

  • Data Read Bytes
  • Data Write Bytes
  • Data Write Operations
  • Data Read Operations
  • Metadata Operations

IT Automations

You can add automations for the AWS services supported by Site24x7. Log in to Site24x7 and go to Admin > IT Automation Templates (+) > Add Automation Templates. Once automations are added, you can schedule them to be executed one after the other.

You can now create a data repository task or a backup for the file system using Amazon FSx automations.

Licensing

  • FSx File System: Each FSx File System monitor is considered a basic monitor.
  • FSx Storage Virtual Machine: For the FSx Storage Virtual Machine monitor, five monitors utilize one basic monitor license.
  • FSx Volume: Each FSx Volume monitor is considered a basic monitor.

Viewing Amazon FSx monitor data

To view your Amazon FSx monitor data, log in to Site24x7 and navigate to Cloud > AWS > Amazon FSx.

Site24x7's Amazon FSx monitoring interface

Amazon FSx

Summary

Gain an overview of the different events occurring within each FSx file system with time series charts. This section provides you with operational information on data read operations, data write operations, metadata operations, throughput, read or write bytes, IOPS usage, and more.

Data Repository Tasks

All the metadata related to repository tasks is listed here. This includes information like the task ID, status of the task, life cycle state, failure reason (if any), and time stamps of task creation, start time, and end time. The Action column lets you set up alerts or add an automation in case the data repository task is down.

Backup Details

Details of the backups taken for an FSx file system are listed here. This includes information about the backup, like the time, type, ID, state of the backup life cycle, KMS key ARN, and Active Directory ID. If you want to delete the monitoring setup for a particular backup, click the delete option next to that backup.

Outages

The Outages tab shows the history of your file systems’ various states, like down, trouble, critical, or maintenance. It also provides details on the start and end time of an outage, its duration, and comments (if any). You can also manually add an outage and edit or delete the comments in this same section.

Log Report

Here you can view the audit log data for an FSx file system, along with details on the timestamp, status, data read bytes, data write bytes, and data read/write operations.

FSx Storage Virtual Machine (SVM)

In the Amazon FSx monitor, both Storage Virtual Machines and Volumes tabs will be displayed for the NetApp ONTAP file system type. 

Navigate to the Storage Virtual Machines tab and click the desired monitor name to obtain the following FSx Storage Virtual Machine monitor details. 

Summary

The Summary tab provides an overview of the events timeline and metrics in the form of charts.

Volumes

The Volumes tab displays the list of Volume monitors associated with the SVMs along with their status and monitor types. You can configure thresholds by clicking the edit button in the Action column of the preferred monitor. Click the monitor name to obtain the FSx Volume monitor details.

Configuration

The Configuration tab displays the configuration details of the FSx Storage Virtual Machine monitor, such as Storage Virtual Machine Name, Storage Virtual Machine ID, File System ID, and Storage Virtual Machine ARN.

Outages

The Outages tab provides details on an outage's start time, end time, duration, and comments (if any).

Inventory

The Inventory tab displays details like the Storage Virtual Machine ID, Region, and Monitor Licensing Category. The Threshold and Availability Profile and the Notification Profile can be set according to the user and viewed from this tab.

Log Report

The Log Report tab offers a consolidated report of each FSx SVM's log status, which can be downloaded as a CSV file.

FSx Volume

In the Amazon FSx monitor, the Volumes tab will be displayed for NetApp ONTAP and OpenZFS file system types. Navigate to the Volumes tab and click the desired monitor name to obtain the following FSx Volume monitor details.

Summary

The Summary tab provides an overview of the events timeline and metrics in the form of charts.

Configuration

The Configuration tab displays the configuration details of the FSx Volume monitor, such as Volume Name, File System ID, and Volume ARN.

Backup Details

The Backup Details tab displays details of the backups taken for the FSx Volume monitor. This includes information about the backup, such as the Time, ID, State of the Backup Life Cycle, and Active Directory ID. If you wish to delete the monitoring setup for a particular backup, click the delete option next to that backup.

Zia Forecast

The Zia Forecast tab displays the forecast data for the FSx Volume monitor in the form of charts based on historical time series data. 

Outages

The Outages tab provides details on an outage's start time, end time, duration, and comments, if any.

Inventory

The Inventory tab displays details like the Storage Virtual Machine ID, Region, and Monitor Licensing Category. The Threshold and Availability Profile and the Notification Profile can be set according to the user and viewed from this tab.

Log Report

The Log Report tab offers a consolidated report of each FSx Volume monitor's log status, which can be downloaded as a CSV file.
