Amazon FSx Monitoring Integration
Amazon FSx is a fully managed service from AWS that provides scalable and high-performance file storage in the cloud. It allows you to launch file systems that support popular file system types, making it easy to run traditional file-based applications in the cloud with the same performance, security, and features you're accustomed to with on-premises storage systems.
Amazon FSx offers the following file system types to cater to different workloads: Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for NetApp ONTAP, and Amazon FSx for OpenZFS.
Overview
Site24x7 provides deep insights and proactive monitoring for your FSx file systems, helping you detect performance issues, optimize usage, and manage operational efficiency. You can also track details such as data repository tasks, backups, storage, and volumes.
In addition to providing the FSx monitor, the integration also offers the following monitors so you can effectively monitor your FSx file systems hosted in your AWS infrastructure.
- FSx Storage Virtual Machine: Site24x7 provides comprehensive monitoring for storage virtual machines (SVMs) in Amazon FSx for NetApp ONTAP file systems, enabling you to track and optimize the performance, availability, and health of the virtualized storage infrastructure.
- FSx Volume: By monitoring volumes in NetApp ONTAP and OpenZFS file systems, Site24x7 enables you to maintain optimal storage performance, manage capacity efficiently, and protect critical data.
Use case
Consider an organization using Amazon FSx for NetApp ONTAP integrated with Site24x7 for managing its shared file system storage. When the file system IOPS starts spiking due to a seasonal traffic surge, Site24x7 sends an alert before it impacts user experience, enabling the team to scale up storage or take action to balance the load.
Additionally, Site24x7’s monitoring of Amazon FSx provides the organization with critical insights into performance, capacity, and data protection, resulting in improved reliability, efficiency, and cost control for their cloud storage environment.
Benefits of Site24x7's Amazon FSx integration
Integrating your Amazon FSx environment with Site24x7 offers the following benefits:
- Obtain a unified monitoring solution for your diverse file systems, monitoring all your FSx environments in one place.
- Monitor SVMs associated with your ONTAP file systems and volumes associated with your ONTAP and OpenZFS file systems.
- Set thresholds for key metrics and receive alerts when they are breached.
Setup and configuration
1. If you haven't already, connect your AWS account with Site24x7's AWS account by either:
- Creating Site24x7 as an IAM user.
- Creating a cross-account IAM role. Learn more
2. On the Integrate AWS Account page, check the appropriate box for Amazon FSx. Learn more
Policy and permissions
Site24x7 uses various Amazon FSx APIs to collect information about your file systems. Assign the AWS managed policy ReadOnlyAccess to the Site24x7 entity (IAM user or IAM role) to help Site24x7 collect metrics and metadata. If you want to assign a custom policy instead, make sure the following read-level actions are present in the policy JSON. Learn more
- "fsx:ListTagsForResource"
- "fsx:DescribeBackups"
- "fsx:DescribeDataRepositoryTasks"
- "fsx:DescribeFileSystems"
- "fsx:DescribeVolumes"
- "fsx:DescribeStorageVirtualMachines"
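As a rough illustration, the actions listed above could be assembled into a custom policy document as follows. This is a minimal sketch: the statement ID and the `"Resource": "*"` scope are assumptions, and the output covers only the FSx actions named in this section, not any other permissions your setup may require.

```python
import json

# Read-level FSx actions listed in this section.
FSX_READ_ACTIONS = [
    "fsx:ListTagsForResource",
    "fsx:DescribeBackups",
    "fsx:DescribeDataRepositoryTasks",
    "fsx:DescribeFileSystems",
    "fsx:DescribeVolumes",
    "fsx:DescribeStorageVirtualMachines",
]


def build_fsx_read_policy(actions=FSX_READ_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7FSxReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


# Serialize the policy so it can be pasted into the IAM policy editor.
policy_json = json.dumps(build_fsx_read_policy(), indent=2)
print(policy_json)
```

The printed JSON can be pasted into the JSON tab of the IAM policy editor and attached to the Site24x7 IAM user or role.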
Polling Frequency
Site24x7 queries AWS to collect Amazon FSx performance metrics according to the configured polling frequency. The polling interval is one hour by default. Learn more.
Supported metrics
The metrics supported for Amazon FSx monitoring are given below.
Performance metrics for file systems
Metric name | Description | Supported for file system type | Statistic | Unit |
---|---|---|---|---|
Data Read Bytes | Number of bytes for file system read operations. | All | Sum | MB |
Data Write Bytes | Number of bytes for file system write operations. | All | Sum | MB |
Data Write Operations | Number of write operations. | All | Sum | Count |
Data Read Operations | Number of read operations. | All | Sum | Count |
Metadata Operations | Number of metadata operations. | All | Sum | Count |
Free Storage Capacity | Amount or percentage of available storage capacity. | All | Average | GB/Percentage |
Total Throughput | Total throughput of the file system. | All | Average | MB/sec |
Read Throughput | Read throughput of the file system. | All | Average | MB/sec |
Write Throughput | Write throughput of the file system. | All | Average | MB/sec |
Total IOPS | Total number of I/O operations per second. | All | Average | Count/sec |
Read IOPS | Total number of read I/O operations per second. | All | Average | Count/sec |
Write IOPS | Total number of write I/O operations per second. | All | Average | Count/sec |
Metadata IOPS | Total number of metadata I/O operations per second. | All | Average | Count/sec |
Client Connections | The number of active connections between clients and the file server. | Windows and OpenZFS | Sum | Count |
Network Throughput Utilization | The percent utilization of network throughput for the file system. | All file system types except Lustre | Average | Percentage |
CPU Utilization | The percentage utilization of your file server’s CPU resources. | All file system types except Lustre | Average | Percentage |
Memory Utilization | The percentage utilization of your file server’s memory resources. | Windows and OpenZFS | Average | Percentage |
File Server Disk Throughput Utilization | The disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by throughput capacity. | All file system types except Lustre | Average | Percentage |
File Server Disk Throughput Balance | The percentage of available burst credits for disk throughput between your file server and its storage volumes. Valid for file systems provisioned with a throughput capacity of 256 MBps or less. | All file system types except Lustre | Average | Percentage |
File Server DiskIops Utilization | The disk IOPS between your file server and storage volumes, as a percentage of the provisioned limit determined by throughput capacity. | All file system types except Lustre | Average | Percentage |
File Server DiskIops Balance | The percentage of available burst credits for disk IOPS between your file server and its storage volumes. Valid for file systems provisioned with a throughput capacity of 256 MBps or less. | All file system types except Lustre | Average | Percentage |
Disk Read Bytes | The number of bytes for read operations that access storage volumes. | All file system types except Lustre | Sum | Bytes |
Disk Write Bytes | The number of bytes for write operations that access storage volumes. | All file system types except Lustre | Sum | Bytes |
Disk Read Operations | The number of read operations for the file server accessing storage volumes. | All file system types except Lustre | Sum | Count |
Disk Write Operations | The number of write operations for the file server accessing storage volumes. | All file system types except Lustre | Sum | Count |
Disk Throughput Utilization | (HDD only) The disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by the storage volumes. | Windows | Average | Percentage |
Disk Throughput Balance | (HDD only) The percentage of available burst credits for disk throughput and disk IOPS for the storage volumes. | Windows and OpenZFS | Average | Percentage |
Disk IOPS Utilization | (SSD only) The disk IOPS between your file server and storage volumes, as a percentage of the provisioned IOPS limit determined by the storage volumes. | All file system types except Lustre | Average | Percentage |
Deduplication Saved Storage | The amount of storage space saved by data deduplication, if it is enabled. | Windows | Sum | Bytes |
Logical Disk Usage | The amount of logical data stored (uncompressed). | Lustre | Sum | Bytes |
Physical Disk Usage | The amount of storage physically occupied by file system data (compressed). | Lustre | Sum | Bytes |
File Create Operations | The total number of file create operations. | Lustre | Sum | Count |
File Open Operations | The total number of file open operations. | Lustre | Sum | Count |
File Delete Operations | The total number of file delete operations. | Lustre | Sum | Count |
Stat Operations | The total number of stat operations. | Lustre | Sum | Count |
Rename Operations | The total number of directory renames, whether in-place directory renames or cross directory renames. | Lustre | Sum | Count |
Directory Delete Operations | The total number of directory delete operations. | Lustre | Sum | Count |
Directory Create Operations | The total number of directory create operations. | Lustre | Sum | Count |
NFS Bad Calls | The number of calls rejected by the NFS server remote procedure call (RPC) mechanism. | OpenZFS | Sum | Count |
File Server Cache Hit Ratio | For OpenZFS: The percentage of cache hits. For Single-AZ 2 (non-HA and HA) file systems, this metric reports the cache hit ratio for both the in-memory (ARC) and NVMe (L2ARC) caches. For Single-AZ 1 (non-HA and HA) file systems, this metric reports only the cache hit ratio for the ARC cache. For ONTAP: The percentage of all read requests that are served by data in the file system's RAM and NVMe caches. A higher percentage means that more reads are served by the file system's read caches. | OpenZFS and ONTAP | Average | Percentage |
Compression Ratio | The ratio of compressed storage usage to uncompressed storage usage. | OpenZFS | Average | Ratio |
Storage Efficiency Savings | The bytes saved from storage efficiency features (compression, deduplication, and compaction). | ONTAP | Sum | Bytes |
Logical Data Stored | The total amount of logical data stored on the file system, considering both the SSD tier and the capacity pool tier. This metric includes the total logical size of snapshots and FlexClones but does not include storage efficiency savings achieved through compression, compaction, and deduplication. | ONTAP | Sum | Bytes |
Network Sent Bytes | The number of bytes (network I/O) sent by the file system. | ONTAP | Sum | Bytes |
Network Received Bytes | The number of bytes (network I/O) received by the file system. | ONTAP | Sum | Bytes |
Data Read Operation Time | The sum of total time spent within the file system for read operations (network I/O) from clients accessing data in the file system. | ONTAP | Sum | Seconds |
Data Write Operation Time | The sum of total time spent within the file system for fulfilling write operations (network I/O) from clients accessing data in the file system. | ONTAP | Sum | Seconds |
Capacity Pool Read Bytes | The number of bytes read (network I/O) from the file system's capacity pool tier. | ONTAP | Sum | Bytes |
Capacity Pool Write Bytes | The number of bytes written (network I/O) to the file system's capacity pool tier. | ONTAP | Sum | Bytes |
Capacity Pool Read Operations | The number of read operations (network I/O) from the file system's capacity pool tier. This translates to a capacity pool read request. | ONTAP | Sum | Count |
Capacity Pool Write Operations | The number of write operations (network I/O) to the file system from the capacity pool tier. This translates to a write request. | ONTAP | Sum | Count |
Storage Capacity Utilization | The percent utilization of storage capacity for the file system. | All | Average | Percentage |
Storage Used | The total storage capacity used for the file system. | All | Sum | Bytes |
Read Operations | The average data read operation time per data read operation. | ONTAP | Average | Seconds |
Write Operations | The average data write operation time per data write operation. | ONTAP | Average | Seconds |
Metadata Operations | The average time taken per metadata operation. | ONTAP | Average | Seconds |
Capacity Pool Tier | The used physical storage capacity in bytes, specific to the storage tier. This value includes savings from storage-efficiency features, such as data compression and deduplication. Reported with the StorageTier dimension set to StandardCapacityPool. | ONTAP | Average | Bytes |
Primary Tier Capacity | The storage capacity for all data types with storage tier as SSD. | ONTAP | Average | Bytes |
Primary Tier Used | The used physical storage capacity in bytes, specific to the storage tier. This value includes savings from storage-efficiency features, such as data compression and deduplication. With StorageTier as SSD, this metric measures the logical space usage for this volume for your SSD. | ONTAP | Average | Bytes |
Primary Tier Avail | The available or unused physical storage capacity in bytes, specific to the storage tier. | ONTAP | Average | Bytes |
Metadata Operation Time | The total time spent on metadata operations. | ONTAP | Sum | Seconds |
Available Volumes | The number of available volumes. | OpenZFS and ONTAP | Sum | Count |
Failed Volumes | The number of failed volumes. | OpenZFS and ONTAP | Sum | Count |
Misconfigured Volumes | The number of misconfigured volumes. | OpenZFS and ONTAP | Sum | Count |
Created Volumes | The number of created volumes. | OpenZFS and ONTAP | Sum | Count |
Available SVM | The number of available storage virtual machines (SVMs). | ONTAP | Sum | Count |
Failed SVM | The number of failed SVMs. | ONTAP | Sum | Count |
Misconfigured SVM | The number of misconfigured SVMs. | ONTAP | Sum | Count |
Total Volumes | The total number of volumes in the file system. | OpenZFS and ONTAP | Sum | Count |
Total SVM | The total number of storage virtual machines in the file system. | ONTAP | Sum | Count |
No Data Compression OpenZFS Volume | The method used to compress the data on the volume can be NONE, ZSTD, or LZ4. This metric shows the number of volumes that use no compression method. | OpenZFS | Sum | Count |
Zstandard (ZSTD) Compression OpenZFS Volume | The number of volumes that use the Zstandard (ZSTD) compression algorithm to compress the data on the volume. | OpenZFS | Sum | Count |
LZ4 Compression OpenZFS Volume | The number of volumes that use the LZ4 compression algorithm to compress the data on the volume. | OpenZFS | Sum | Count |
Clone Volume | The number of volumes that reference the data in the origin snapshot, i.e., that use the clone strategy when copying data from the snapshot to the new volume. | OpenZFS | Sum | Count |
Full Copy Volume | The number of volumes that copy all data from the snapshot to the new volume, i.e., that use the full-copy strategy when copying data from the snapshot to the new volume. | OpenZFS | Sum | Count |
Incremental Copy OpenZFS Volume | The number of volumes that use an incremental copy strategy when copying data from the snapshot to the new volume. This option is only for updating an existing volume by using a snapshot from another FSx for OpenZFS file system. | OpenZFS | Sum | Count |
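Many of the file system metrics above correspond to statistics that AWS publishes in Amazon CloudWatch under the `AWS/FSx` namespace. The sketch below shows how one such statistic (the Sum of `DataReadBytes` over the last hour) could be requested with boto3. It is only an illustration and does not describe how Site24x7 itself collects data; the file system ID is a placeholder, and the one-hour window is chosen to mirror the default polling interval.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(file_system_id, metric_name="DataReadBytes",
                       statistic="Sum", period_seconds=3600, end_time=None):
    """Build keyword arguments for CloudWatch's GetMetricStatistics call."""
    end = end_time or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/FSx",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end - timedelta(seconds=period_seconds),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


def fetch_datapoints(file_system_id, **kwargs):
    """Run the query with boto3 (needs the boto3 package and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(
        **build_metric_query(file_system_id, **kwargs)
    )
    return response["Datapoints"]


# "fs-0123456789abcdef0" is a placeholder ID, not a real file system.
query = build_metric_query(
    "fs-0123456789abcdef0",
    end_time=datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
)
```

Calling `fetch_datapoints("fs-...")` with valid credentials returns the raw datapoints; `build_metric_query` can be inspected on its own without any AWS access.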
Performance metrics for data repository tasks
Attribute | Description | Statistic | Unit |
---|---|---|---|
Succeeded Count | Number of files successfully exported. | Sum | Count |
Failed Count | Number of files that failed to export. | Sum | Count |
Total Count | Total number of files to export. | Sum | Count |
Performance metrics for FSx Storage Virtual Machine
Metric name | Description | Statistic | Unit |
---|---|---|---|
Total Volumes | Total number of volumes in the SVM. | Sum | Count |
Available Volumes | Number of available volumes. | Sum | Count |
Created Volume | Number of created volumes. | Sum | Count |
Failed Volumes | Number of failed volumes. | Sum | Count |
Misconfigured Volumes | Number of misconfigured volumes. | Sum | Count |
FlexVol Volume | Number of FlexVol style volumes. | Sum | Count |
FlexGroup Volume | Number of FlexGroup style volumes. | Sum | Count |
Unix Volume | Number of UNIX type security style volumes. The security style for the volume can be UNIX, NTFS, or MIXED. | Sum | Count |
Ntfs Volume | Number of NTFS type security style volumes. | Sum | Count |
Mixed Volume | Number of MIXED security style volumes. | Sum | Count |
RW (Read/Write) Ontap Volume | Number of volumes of the RW ONTAP volume type. | Sum | Count |
DP (Data-Protection) Ontap Volume | Number of volumes of the DP ONTAP volume type. | Sum | Count |
LS (Load-Sharing) Ontap Volume | Number of volumes of the LS ONTAP volume type. | Sum | Count |
No FlexCache Volume | FlexCache endpoint type of the volume can be NONE, ORIGIN, or CACHE. This metric indicates the number of None FlexCache Endpoint type volumes. | Sum | Count |
Origin FlexCache Volume | Number of Origin FlexCache Endpoint type volumes. | Sum | Count |
FlexCache Volume | Number of Cache FlexCache Endpoint type volumes. | Sum | Count |
Performance metrics for FSx Volume
Metric name | Description | Statistic | Unit |
---|---|---|---|
Data Read Bytes | The number of bytes (network I/O) read from the volume by clients. | Sum | Bytes |
Data Write Bytes | The number of bytes (network I/O) written to the volume by clients. | Sum | Bytes |
Data Read Operations | The number of read operations (network I/O) on the volume by clients. | Sum | Count |
Data Write Operations | The number of write operations (network I/O) on the volume by clients. | Sum | Count |
Metadata Operations | The number of I/O operations (network I/O) from metadata activities on the volume by clients. | Sum | Count |
Data Read Operation Time | The sum of total time spent within the volume for read operations (network I/O) from clients accessing data in the volume. | Sum | Seconds |
Data Write Operation Time | The sum of total time spent within the volume for fulfilling write operations (network I/O) from clients accessing data in the volume. | Sum | Seconds |
Metadata Operation Time | The sum of total time spent within the volume for fulfilling metadata operations (network I/O) from clients that are accessing data in the volume. | Sum | Seconds |
Capacity Pool Read Bytes | The number of bytes read (network I/O) from the volume's capacity pool tier. | Sum | Bytes |
Capacity Pool Write Bytes | The number of bytes written (network I/O) to the volume's capacity pool tier. | Sum | Bytes |
Capacity Pool Read Operations | The number of read operations (network I/O) from the volume's capacity pool tier. This translates to a capacity pool read request. | Sum | Count |
Capacity Pool Write Operations | The number of write operations (network I/O) to the volume from the capacity pool tier. This translates to a write request. | Sum | Count |
Storage Used | The used logical storage capacity of the volume. | Maximum | Bytes |
Storage Capacity | The size of the volume in bytes. | Maximum | Bytes |
Storage Capacity Utilization | The storage capacity utilization of the volume. | Average | Percent |
Files Used | The used files (number of files or inodes) on the volume. | Maximum | Count |
Files Capacity | The total number of inodes that can be created on the volume. | Maximum | Count |
Free Storage Space | The unused or free logical storage capacity of the volume. | Sum | Bytes |
Free Storage % | The percentage of unused logical storage capacity of the volume. | Average | Percent |
Total Throughput | The total throughput of data read and data write bytes. | Average | MB/sec |
Read Throughput | The total throughput of data read bytes. | Average | MB/sec |
Write Throughput | The total throughput of data write bytes. | Average | MB/sec |
Total IOPS | The number of operations (network I/O) on the volume by clients per second, including data read, data write, and metadata operations. | Average | Count/sec |
Read IOPS | The number of read operations (network I/O) on the volume by clients per second. | Average | Count/sec |
Write IOPS | The number of write operations (network I/O) on the volume by clients per second. | Average | Count/sec |
Metadata IOPS | The number of metadata operations (network I/O) on the volume by clients per second. | Average | Count/sec |
User Data | The amount of logical space used, in bytes. This metric measures different types of space consumption depending on the dimensions used with this metric. Here it includes dimensions StorageTier as All and DataType as User. | Average | Bytes |
Snapshot Data | The amount of logical space used, in bytes. This metric measures different types of space consumption depending on the dimensions used with this metric. Here it includes dimensions StorageTier as All and DataType as Snapshot. | Average | Bytes |
Other Data | The amount of logical space used for all StorageTier with DataType as Other, in bytes. | Average | Bytes |
Read Latency | The time taken per Data Read Operation. | Average | Seconds |
Write Latency | The time taken per Data Write Operation. | Average | Seconds |
Metadata Latency | The time taken per Metadata Operation. | Average | Seconds |
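The throughput, IOPS, and latency rows in the table above can be read as values derived from the Sum metrics collected over one period. The sketch below shows that arithmetic under stated assumptions: throughput is bytes divided by the period, IOPS is operations divided by the period, latency is operation time divided by operation count, and one MB is taken as 1024 × 1024 bytes. The function and field names are illustrative, and the source does not confirm that Site24x7 computes these values exactly this way.

```python
def derive_volume_rates(read_bytes, write_bytes, read_ops, write_ops,
                        metadata_ops, read_time_s, write_time_s,
                        metadata_time_s, period_s):
    """Derive per-second rates and per-operation latencies from period sums."""
    mb = 1024 * 1024  # assumption: binary megabytes

    def per_op(total_time, ops):
        # Avoid dividing by zero when no operations occurred in the period.
        return total_time / ops if ops else 0.0

    return {
        "read_throughput_mb_s": read_bytes / mb / period_s,
        "write_throughput_mb_s": write_bytes / mb / period_s,
        "total_throughput_mb_s": (read_bytes + write_bytes) / mb / period_s,
        "read_iops": read_ops / period_s,
        "write_iops": write_ops / period_s,
        "metadata_iops": metadata_ops / period_s,
        "total_iops": (read_ops + write_ops + metadata_ops) / period_s,
        "read_latency_s": per_op(read_time_s, read_ops),
        "write_latency_s": per_op(write_time_s, write_ops),
        "metadata_latency_s": per_op(metadata_time_s, metadata_ops),
    }


# Hypothetical sums for a single 60-second period.
rates = derive_volume_rates(
    read_bytes=600 * 1024 * 1024, write_bytes=300 * 1024 * 1024,
    read_ops=6000, write_ops=3000, metadata_ops=600,
    read_time_s=12.0, write_time_s=9.0, metadata_time_s=0.6,
    period_s=60,
)
```

With these sample inputs, read throughput works out to 10 MB/sec, total IOPS to 160, and read latency to 2 milliseconds per operation.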
Threshold configuration
To configure thresholds for your Amazon FSx monitor:
- Log in to Site24x7 and navigate to Admin > Configuration Profiles > Threshold and Availability.
- Click Add Threshold Profile.
- Select the applicable monitor type from the Monitor Type drop-down menu and provide an appropriate name in the Display Name field. The applicable monitor types are FSx File System, FSx Storage Virtual Machine, and FSx Volume.
- The supported metrics are displayed in the Threshold Configuration section. You can set threshold values for all the metrics mentioned above.
- Click Save.
Forecast
Estimate future values of the following performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.
- Data Read Bytes
- Data Write Bytes
- Data Write Operations
- Data Read Operations
- Metadata Operations
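To give a feel for what forecasting a metric involves, the sketch below fits a straight line to evenly spaced historical samples and extrapolates it forward. This is a deliberately simple linear trend used only for illustration; the forecasting model Site24x7 actually uses is not described here, and the sample values are hypothetical.

```python
def linear_forecast(values, steps_ahead):
    """Fit a least-squares line to evenly spaced samples and extrapolate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Predict the next `steps_ahead` points after the last sample.
    return [intercept + slope * (n - 1 + k) for k in range(1, steps_ahead + 1)]


# Hypothetical hourly Data Write Bytes samples (in MB), growing by 20 MB per hour.
history = [100, 120, 140, 160, 180]
forecast = linear_forecast(history, steps_ahead=3)
```

For this perfectly linear history, the next three predicted values continue the same 20 MB per hour growth. Real workloads with seasonality or bursts need a more capable model than a straight line.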
IT Automations
You can add automations for the AWS services supported by Site24x7. Log in to Site24x7 and go to Admin > IT Automation Templates (+) > Add Automation Templates. Once automations are added, you can schedule them to be executed one after the other.
You can now create a data repository task or a backup for the file system using Amazon FSx automations.
Licensing
- FSx File System: Each FSx File System monitor is considered a basic monitor.
- FSx Storage Virtual Machine: For the FSx Storage Virtual Machine monitor, five monitors utilize one basic monitor license.
- FSx Volume: Each FSx Volume monitor is considered a basic monitor.
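The licensing rules above can be turned into a quick estimate of how many basic monitor licenses an FSx deployment consumes. In this sketch, the function name is illustrative, and rounding a partial group of five SVM monitors up to a full license is an assumption; check your subscription details for the exact behavior.

```python
import math


def basic_monitor_licenses(file_systems, svms, volumes):
    """Estimate basic monitor licenses consumed by FSx monitors."""
    # Assumption: a partial group of five SVM monitors still uses a whole license.
    svm_licenses = math.ceil(svms / 5)
    # File system and volume monitors each count as one basic monitor.
    return file_systems + svm_licenses + volumes


# Example: 2 file systems, 7 SVMs, and 10 volumes.
total = basic_monitor_licenses(file_systems=2, svms=7, volumes=10)
```

Here the 7 SVM monitors use 2 licenses, so the example deployment consumes 2 + 2 + 10 = 14 basic monitor licenses.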
Viewing Amazon FSx monitor data
To view your Amazon FSx monitor data, log in to Site24x7 and navigate to Cloud > AWS > Amazon FSx.
Site24x7's Amazon FSx monitoring interface
Amazon FSx
Summary
Gain an overview of the different events occurring within each FSx file system with time series charts. This section provides you with operational information on data read operations, data write operations, metadata operations, throughput, read or write bytes, IOPS usage, and more.
Data Repository Tasks
All the metadata related to data repository tasks is listed here. This includes information like the task ID, task status, lifecycle state, failure reason (if any), and timestamps for task creation, start, and end. The Action column lets you set up alerts or add an automation in case the data repository task is down.
Backup Details
Details of the backups taken for an FSx file system are listed here. This includes information about each backup, like the time, type, ID, backup lifecycle state, KMS key ARN, and Active Directory ID. To delete the monitoring setup for a particular backup, click the delete option next to that backup.
Outages
The Outages tab shows the history of your file systems’ various states, like down, trouble, critical, or maintenance. It also provides details on the start and end time of an outage, its duration, and comments (if any). You can also manually add an outage and edit or delete the comments in this same section.
Log Report
Here you can view the audit log data for an FSx file system, along with details on the timestamp, status, data read bytes, data write bytes, and data read/write operations.
FSx Storage Virtual Machine (SVM)
In the Amazon FSx monitor, both Storage Virtual Machines and Volumes tabs will be displayed for the NetApp ONTAP file system type.
Navigate to the Storage Virtual Machines tab and click the desired monitor name to obtain the following FSx Storage Virtual Machine monitor details.
Summary
The Summary tab provides an overview of the events timeline and metrics in the form of charts.
Volumes
The Volumes tab displays the list of Volume monitors associated with the SVMs along with their status and monitor types. You can configure thresholds by clicking the edit button in the Action column of the preferred monitor. Click the monitor name to obtain the FSx Volume monitor details.
Configuration
The Configuration tab displays the configuration details of the FSx Storage Virtual Machine monitor, such as Storage Virtual Machine Name, Storage Virtual Machine ID, File System ID, and Storage Virtual Machine ARN.
Outages
The Outages tab provides details on an outage's start time, end time, duration, and comments (if any).
Inventory
The Inventory tab displays details like the Storage Virtual Machine ID, Region, and Monitor Licensing Category. You can also view and set the Threshold and Availability Profile and the Notification Profile from this tab.
Log Report
The Log Report tab offers a consolidated report of each FSx SVM's log status, which can be downloaded as a CSV file.
FSx Volume
In the Amazon FSx monitor, the Volumes tab will be displayed for NetApp ONTAP and OpenZFS file system types. Navigate to the Volumes tab and click the desired monitor name to obtain the following FSx Volume monitor details.
Summary
The Summary tab provides an overview of the events timeline and metrics in the form of charts.
Configuration
The Configuration tab displays the configuration details of the FSx Volume monitor, such as Volume Name, File System ID, and Volume ARN.
Backup Details
The Backup Details tab displays details of the backups taken for the FSx volume. This includes information about each backup, such as the Time, ID, State of the Backup Life Cycle, and Active Directory ID. To delete the monitoring setup for a particular backup, click the delete option next to that backup.
Zia Forecast
The Zia Forecast tab displays the forecast data for the FSx Volume monitor in the form of charts based on historical time series data.
Outages
The Outages tab provides details on an outage's start time, end time, duration, and comments, if any.
Inventory
The Inventory tab displays details like the Storage Virtual Machine ID, Region, and Monitor Licensing Category. You can also view and set the Threshold and Availability Profile and the Notification Profile from this tab.
Log Report
The Log Report tab offers a consolidated report of each FSx Volume monitor's log status, which can be downloaded as a CSV file.