Performance Metrics for Monitoring Linux Servers

Monitor and measure critical metrics such as CPU, memory, disk utilization, processes, and network traffic for your Linux servers from a unified dashboard. Once the Linux agent is installed, log in to the Site24x7 web client, navigate to Server > Server Monitor > Servers, and click the newly added monitor to view its performance metrics.

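The Linux agent collects metrics on the server using native utilities and interfaces such as top, free, iostat, and the proc file system, and sends the data to the Site24x7 data center. WMI queries apply only to Windows servers. Because the agent has to be downloaded and installed on your servers, learn more about how secure the agent is.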

Summary
Processes
CPU
Memory
Disks
Network
Plugins
Checks
Syslogs
Tools
Add custom tab
Root cause analysis (RCA) | Performance reports | Server inventory & health dashboards
Licensing

Summary

Get visibility into all the important parameters of your Linux server's performance in a single console. The heat map analysis gives you a quick summary of the status and performance of your server over the last seven days.

Click the icon to view the Performance Report for metrics including CPU, memory, disk utilization, and more. You can also view performance data for a specific time period by choosing it from the drop-down at the top right corner of the page. You can export the performance details as CSV or PDF, or send them via email.

Load Average

Load average is the average system load over a period of time. It indicates whether your physical CPUs are overutilized or underutilized. In case of overload, you can look for processes that are wasting resources, provide more hardware resources, or move some of the workload to another system.
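
As a rough illustration of how load average relates to CPU count, the sketch below normalizes a load value by the number of CPUs. The 1.0 and 0.3 cutoffs and the function names are assumptions made for this example; they are not values used by Site24x7.

```python
import os


def classify_load(load_average: float, cpu_count: int) -> str:
    """Classify a load average relative to the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    per_core = load_average / cpu_count
    if per_core > 1.0:  # more runnable work than CPUs (illustrative cutoff)
        return "overutilized"
    if per_core < 0.3:  # illustrative cutoff, not a Site24x7 value
        return "underutilized"
    return "normal"


def current_load_status() -> str:
    """Classify the 5-minute load average of the local machine (Unix only)."""
    _one, five, _fifteen = os.getloadavg()
    return classify_load(five, os.cpu_count() or 1)
```

For example, a load average of 8.0 on a four-CPU server would be classified as overutilized under these assumed cutoffs.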

CPU Utilization

Regular monitoring of CPU usage is critical to analyze the CPU load over a stipulated period of time and overcome performance regressions.

However, not all high CPU usage is critical. Viewing reports by time period helps you identify the CPU usage that is actually problematic and drill down to the cause of a CPU spike. Based on the analysis, you can upgrade the CPU hardware, add more CPUs, or shut down nonessential services that are hogging these critical resources.

CPU utilization is calculated using the 'top' command. Learn more.

CPU Utilization = 100 - idle time
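
The formula above can be sketched against a summary line from 'top'. This is a minimal example that assumes the common English-locale layout of the line; the sample line and the function name are hypothetical, and the agent's actual parsing is not documented here.

```python
import re

# Hypothetical sample of the CPU summary line printed by `top -bn1`.
SAMPLE_TOP_LINE = (
    "%Cpu(s):  3.1 us,  1.2 sy,  0.0 ni, 95.0 id,  0.5 wa,  0.0 hi,  0.2 si,  0.0 st"
)


def cpu_utilization_from_top(line: str) -> float:
    """Return CPU utilization as 100 minus the idle percentage."""
    # Accepts both "95.0 id" (newer top) and "95.0%id" (older top).
    match = re.search(r"([\d.]+)\s*%?\s*id", line)
    if match is None:
        raise ValueError("idle percentage not found in line")
    return round(100.0 - float(match.group(1)), 2)


print(cpu_utilization_from_top(SAMPLE_TOP_LINE))
```

Locales that print a decimal comma would need a different pattern.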

You can also view the top five processes consuming CPU at a given time by hovering over that point in the CPU Utilization graph. The top process data is collected using the Python module 'psutil'.

The performance report for CPU usage can be accessed by clicking the icon beside the title. The report includes:

CPU utilization by cores
Interrupts - Average number of hardware interrupts that the processor is receiving.
Context switches - Rate of switches from one thread to another. Thread switches can occur either inside of a single process or across processes.
CPU metrics - User space time, hardware interrupts time, idle time, software interrupts time, nice time, wait time, steal time.
Average CPU usage (%) per minute.

Note

Navigate to the CPU tab to view more metrics.

Memory Utilization

Monitoring memory usage helps you identify underused servers and redistribute loads effectively. It also helps you detect server overloads before they cause downtime or data loss.

Memory usage is calculated using the 'free' command. Learn more.

Memory Utilized = ( ( Total - Free ) / Total * 100 )
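
A minimal sketch of this formula, applied to sample output in the layout printed by 'free'. The sample figures and function name are made up for illustration, and the sketch follows the formula literally: it uses the "free" column and makes no assumption about how buffers or cache are treated by the agent.

```python
# Hypothetical sample output of the `free` command.
SAMPLE_FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:        8000000     3000000     2000000      100000     3000000     4500000
Swap:       2000000      500000     1500000
"""


def memory_utilized_percent(free_output: str) -> float:
    """Compute (Total - Free) / Total * 100 from the 'Mem:' row."""
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            fields = line.split()
            total, free = int(fields[1]), int(fields[3])
            return (total - free) / total * 100
    raise ValueError("no 'Mem:' line found")


print(memory_utilized_percent(SAMPLE_FREE_OUTPUT))
```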

You can also view the top five processes consuming memory at a given time by hovering over that point in the Memory Utilization graph. The top process data is collected using the Python module 'psutil'.

The performance report for memory usage can be accessed by clicking the icon beside the title. The report includes:

Swap memory utilization
Memory used
Memory break up for free physical and free swap memory.
Memory Pages In - Number of pages read from disk to resolve hard page faults.
Memory Pages Out - Number of pages written to disk to free up space in physical memory.
Memory Page Fault (per second) - A page fault occurs when a process requires code or data that is not in its working set (its space in physical memory). Know how this metric is calculated and shown in the Site24x7 web client.
Average memory usage (%) per minute.

Note

Navigate to the Memory tab to view more metrics.

Memory Breakup

Get a breakdown of the free physical memory and free swap memory available in the server. Less frequently used memory pages can be moved to swap space until they are needed, which frees RAM for active data. This helps you plan and allocate resources to avoid server overload and data loss.

Note

Navigate to the Memory tab to view more metrics.

Open File Descriptors

This metric shows the number of open file descriptors: the unique identifiers that the operating system assigns to open files and other active input/output resources. Tracking it helps ensure optimal resource utilization, because every open file consumes resources such as kernel data structures and memory.
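
On Linux, one place to cross-check system-wide file handle usage is /proc/sys/fs/file-nr, which holds three numbers: allocated handles, allocated but unused handles, and the system maximum. The sketch below parses that format; whether the agent reads this file is an assumption, and the function name and sample values are illustrative.

```python
def parse_file_nr(text: str) -> dict:
    """Parse the three counters found in /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(value) for value in text.split())
    in_use = allocated - unused
    return {
        "allocated": allocated,
        "in_use": in_use,
        "maximum": maximum,
        "percent_used": 100.0 * in_use / maximum,
    }


# Hypothetical contents of /proc/sys/fs/file-nr.
print(parse_file_nr("9344\t0\t1000000\n"))
```

On a live server, the same function could be fed the contents of the file read with open("/proc/sys/fs/file-nr").read().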

Disk Usage with Capacity Plan

Get an idea of what your disk usage will be after seven days, based on your current disk usage. If the disk usage or the predicted value is irregular or shows a sudden spike, something is consuming disk space abnormally and you need to act. Go to the Disks tab, check the disk utilization of each partition, and resolve the issue before it affects the overall performance of your server.
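
The forecasting method used by Site24x7 is not described here. As one possible illustration, the sketch below fits a straight line through daily usage samples with ordinary least squares and extrapolates seven days ahead; the function name and the sample data are hypothetical.

```python
def forecast_disk_usage(daily_usage_percent, days_ahead=7):
    """Extrapolate disk usage (%) with a least-squares straight line."""
    n = len(daily_usage_percent)
    if n < 2:
        raise ValueError("at least two samples are required")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage_percent) / n
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage_percent)
    ) / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    predicted = intercept + slope * (n - 1 + days_ahead)
    # Disk usage cannot fall outside 0-100 %.
    return max(0.0, min(100.0, predicted))


# Usage growing by two percentage points per day for a week.
print(forecast_disk_usage([50, 52, 54, 56, 58, 60, 62]))
```

A straight line is only a reasonable model for steady growth; a sudden spike in usage would make the projection jump as well, which is the kind of irregularity described above.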

Note

Navigate to the Disks tab to view more metrics.

Recent Events

Know the latest events in your server, categorized as Warning, Error, and Information. The data is refreshed after every poll, which helps you spot any abnormal increase in the number of error or warning events and take immediate action.

Top Process by CPU and Memory

View a list of top processes based on the CPU or memory usage in your server. Use the toggle button to choose between CPU and memory usage.

Application Details

Applications such as Docker, and any plugins that you have installed, are listed along with their respective monitor display names. Click on the monitor name to go to the respective monitor's Summary page.

Down/Trouble History

The complete history of your server's DOWN and TROUBLE status is listed with the duration of the DOWN/TROUBLE period, reason for the outage, and the root cause analysis (RCA) details.

Processes

Monitor the processes running on your Linux server. In case you are unable to find the process that is running on your Linux server, use the Discover Processes option to add them manually.

Find out more about the metrics for process monitoring and the management actions that can be performed. Individual thresholds for every process can be set by using the pencil icon under Action. Learn more.

CPU

Get complete data on the CPU utilization of your server.

Metric Name	Description
User Space Time	The percentage of CPU spent on user processes
Hardware Interrupts Time	The percentage of CPU servicing hardware interrupts
Idle Time	The percentage of CPU spent in idle state
Software Interrupts Time	The percentage of CPU servicing software interrupts
Nice Time	The percentage of CPU processing low priority processes
Wait Time	The percentage of CPU waiting on I/O operations
Steal Time	The percentage of CPU time taken by the hypervisor to serve other virtual machines
System Time	The percentage of CPU spent on system processes
Interrupts and Context Switches	Average number of hardware interrupts that the processor is receiving and the rate of switches from one thread to another
CPU Utilization by Cores	The CPU utilization for all your central processing units or cores.
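
The time categories in this table correspond to the counters on the aggregate "cpu" line of /proc/stat. The sketch below derives percentages from two samples of that line; it is an illustration based on the kernel's counter layout, not the agent's implementation, and the sample lines are made up.

```python
# Order of the first eight counters on the "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Map the aggregate 'cpu' line of /proc/stat to named counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def cpu_breakdown(before: str, after: str) -> dict:
    """Return the percentage of time spent in each state between two samples."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: second[name] - first[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical samples taken a few seconds apart.
BEFORE = "cpu  1000 50 300 8000 100 20 30 0 0 0"
AFTER = "cpu  1200 50 400 8650 120 25 35 20 0 0"
print(cpu_breakdown(BEFORE, AFTER))
```

Here "irq" and "softirq" correspond to hardware and software interrupts time, and "iowait" to wait time.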

Memory

Get complete data on the memory utilization of your server.

Metric Name	Description
Swap Memory Utilization	The percentage of the server's total swap space that is in use
Memory Used	Total memory used by the server (in bytes)
Memory Breakup	A breakdown of free physical and free swap memory
Memory Pages (In/Out/Fault)	Number of pages read from disk, number of pages written to disk, and number of page faults

The memory page fault per second metric is calculated from the proc file system using the following command:

cat /proc/vmstat

To cross-check the value shown in the Site24x7 web client, execute the following command in your terminal. It returns the cumulative page fault count since the server was booted; the Site24x7 web client shows this as a per-second value.

cat /proc/vmstat | grep -i 'pgpgin\|pgpgout\|pgfault'
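
Because these counters are cumulative since boot, a per-second value requires two samples and the time between them. The sketch below shows that arithmetic on made-up sample text; the function names and the 60-second interval are assumptions for illustration.

```python
WANTED = ("pgpgin", "pgpgout", "pgfault")


def parse_vmstat(text: str) -> dict:
    """Extract the paging counters from /proc/vmstat content."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name in WANTED:
            counters[name] = int(value)
    return counters


def per_second(before: dict, after: dict, interval_seconds: float) -> dict:
    """Convert cumulative counters into per-second rates."""
    return {
        name: (after[name] - before[name]) / interval_seconds for name in before
    }


# Hypothetical samples taken 60 seconds apart.
FIRST = "pgpgin 500000\npgpgout 900000\npgfault 1000000\n"
SECOND = "pgpgin 500600\npgpgout 901200\npgfault 1001500\n"
print(per_second(parse_vmstat(FIRST), parse_vmstat(SECOND), 60))
```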

Disks

Closely monitor the disk usage and have a regular check on the availability of disk space in your servers. Check out the server disk partition report to view the used and free disk space across servers in your account.

Metric Name	Description
Disk Partition Details & Usage Forecasting	A tabular view of the used and free disk space (in MB and percentage). Click on the values to go to a detailed performance report for each partition. Click on the pencil icon under Action to set thresholds for each of these partitions. You can also choose to Skip Alert for any partition using the pencil icon.
Average Disk Utilization (%)	The free and used disk space (in percentage) available in your server
Disk (I/O)	The read and write operations performed in the disk
Partition Disk I/O	The read and write operations performed in every partition
Overall Disk Utilization	The total disk usage and free space available in GB
Current Individual Disk Utilization (%)	The most recent (last polled) utilization of individual disk partitions
Disk Idle and Busy Percentage	Shows how much of the time your disks are busy or idle, which helps you avoid overloading them. A high busy percentage indicates that the disk is overloaded and that resources in your server are not allocated optimally. Note: The Linux monitoring agent uses the iostat utility to capture the disk idle and busy percentage. Ensure that iostat is installed on your server; if it is not, install it and restart the monitoring agent service.
Disk IOPS	Displays the total number of input and output operations performed by the disk per second. Please ensure the iostat utility is installed in your server. If not, install it in the server and restart the monitoring agent service.
Average Disk Queue Length	Represents the average number of input and output (I/O) requests waiting in the command queue for the disk device. This metric is used for assessing disk performance and responsiveness. Please ensure the iostat utility is installed in your server. If not, install it in the server and restart the monitoring agent service.
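
The agent relies on iostat for the last three metrics. For a rough cross-check, the same quantities can be derived from two samples of a device line in /proc/diskstats, which is where the kernel exposes these counters. The sketch below is an illustration under that assumption; the sample lines and function names are hypothetical.

```python
def parse_diskstats_line(line: str) -> dict:
    """Pick the counters needed for IOPS, busy time, and queue length."""
    parts = line.split()
    return {
        "name": parts[2],
        "reads": int(parts[3]),            # reads completed
        "writes": int(parts[7]),           # writes completed
        "io_ms": int(parts[12]),           # time spent doing I/O (ms)
        "weighted_io_ms": int(parts[13]),  # weighted time spent doing I/O (ms)
    }


def disk_metrics(before: str, after: str, interval_seconds: float) -> dict:
    """Derive IOPS, busy/idle percentage, and average queue length."""
    first, second = parse_diskstats_line(before), parse_diskstats_line(after)
    interval_ms = interval_seconds * 1000.0
    operations = (second["reads"] - first["reads"]) + (
        second["writes"] - first["writes"]
    )
    busy = min(100.0, 100.0 * (second["io_ms"] - first["io_ms"]) / interval_ms)
    queue = (second["weighted_io_ms"] - first["weighted_io_ms"]) / interval_ms
    return {
        "iops": operations / interval_seconds,
        "busy_percent": busy,
        "idle_percent": 100.0 - busy,
        "avg_queue_length": queue,
    }


# Hypothetical samples for device sda, taken 60 seconds apart.
BEFORE = "   8       0 sda 1000 0 80000 500 2000 0 160000 1500 0 1200 2000"
AFTER = "   8       0 sda 1600 0 90000 700 3200 0 200000 2100 0 7200 3500"
print(disk_metrics(BEFORE, AFTER, 60))
```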

Under Disk Partition Details & Usage Forecasting, click on the Rediscover button to discover disk partitions and add them for monitoring. Click on the Bulk Action button to set thresholds for multiple disk partitions in one go.

If you wish to specify a threshold value for a particular partition, click on the pencil icon beside the partition's name under Action. Multiple threshold values can be set for a single partition for the conditions >, <, =, >=, <= and in Bytes, KB, MB, GB, and TB. You can choose to get a Trouble or Critical alert when a breach is detected.
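
To make the threshold logic concrete, the sketch below evaluates several rules against one measured value. It assumes binary units (1 KB = 1024 bytes) and that Critical outranks Trouble when both are breached; neither detail is stated above, and the rule format and function names are hypothetical.

```python
import operator

CONDITIONS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
}
# Assumption: binary multiples. The product may use decimal units instead.
UNIT_BYTES = {"Bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def is_breached(value_bytes, condition, threshold, unit):
    """Return True when value_bytes satisfies the threshold condition."""
    return CONDITIONS[condition](value_bytes, threshold * UNIT_BYTES[unit])


def alert_level(value_bytes, rules):
    """Return 'Critical', 'Trouble', or None for (condition, threshold, unit, level) rules."""
    breached = {
        level
        for condition, threshold, unit, level in rules
        if is_breached(value_bytes, condition, threshold, unit)
    }
    if "Critical" in breached:
        return "Critical"
    if "Trouble" in breached:
        return "Trouble"
    return None


RULES = [(">", 80, "GB", "Trouble"), (">", 95, "GB", "Critical")]
print(alert_level(90 * 1024**3, RULES))
```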

Note

Learn more about how alerting works when disk utilization crosses its configured thresholds.

Network

In this tab, you can view the following network statistics:

A graphical representation of Packets Sent, Packets Received, Data Sent, and Data Received.
Overall network details based on connection type, including:
- Network Interface Name
- Maximum interface speed
- Status
- Data Sent
- Data Received
- Bandwidth usage
- Packets Sent
- Packets Received
- Error packets
Detailed information for individual network interfaces, including:
- Network Interface Name
- MAC Address
- IPv4 Address
- IPv6 Address

Click on an individual metric (like input or output traffic) under Network to get a graphical representation of the performance data. You can set individual threshold values to the network interfaces for the conditions >, <, =, >=, <= using the pencil icon under Action.

Set thresholds for multiple network interfaces in one go using the Bulk Action button. Click on the Rediscover button to discover network interfaces and add them for monitoring. If you want to have a consolidated report of critical network adapters across servers, check out the network adapter report.

Note

A network interface/adapter will be added for every unique MAC address. If more than one interface has the same MAC address, then only one interface will be added and the rest will be ignored.
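
The rule in this note can be sketched as a de-duplication by MAC address. Which of the duplicate interfaces is kept is not stated; the example assumes the first one discovered wins and compares addresses case-insensitively. The interface names and addresses are made up.

```python
def unique_interfaces(interfaces):
    """Keep one (name, mac) pair per unique MAC address."""
    seen = set()
    kept = []
    for name, mac in interfaces:
        key = mac.lower()
        if key in seen:
            continue  # same MAC already added; ignore this interface
        seen.add(key)
        kept.append((name, mac))
    return kept


DISCOVERED = [
    ("eth0", "AA:BB:CC:00:11:22"),
    ("bond0", "aa:bb:cc:00:11:22"),  # shares eth0's MAC address
    ("eth1", "AA:BB:CC:00:11:33"),
]
print(unique_interfaces(DISCOVERED))
```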

Plugins

Customize and monitor data specifically tailored to your needs using Site24x7's plugin integrations. Use our 50+ ready-to-install plugin integrations or write your own plugin using Python or Shell scripts.

Parameter	Description
Plugin Name	Name of the plugin monitor
Status	Tells you if the plugin is in the UP or DOWN state
Version	This is a mandatory field denoting the version number of the plugin. If you want to add, modify, or delete attributes, the plugin version needs to be changed to create a new template. Know under what conditions the plugin version needs to be changed.
Template Name	Name of the plugin template that has the list of attributes to be monitored. Know how to configure a template.
Attributes	The total number of attributes listed under that plugin
Performance Attribute	While setting up the plugin template, decide which attribute has to be listed in the main Summary page, in the log report and in the main plugin monitors listing page. Learn more
Action	You can edit or delete the plugin monitor

Based on the processes running on your server, the agent picks up relevant plugins and lists them under Recommended Plugins so that you can easily add a plugin monitor to your account.
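
A custom Python plugin is essentially a script that prints its attributes as JSON. The skeleton below is only a sketch: the key names "plugin_version", "heartbeat_required", and "units" are assumptions about the expected output format and should be checked against the plugin documentation, and the "queue_depth" attribute is entirely hypothetical.

```python
import json

# Assumed output keys; verify them against the plugin documentation.
PLUGIN_VERSION = "1"  # change when attributes are added, modified, or deleted
HEARTBEAT = "true"


def collect_metrics() -> dict:
    """Gather the attributes this plugin reports."""
    data = {
        "plugin_version": PLUGIN_VERSION,
        "heartbeat_required": HEARTBEAT,
    }
    # Hypothetical attribute; replace with a real measurement.
    data["queue_depth"] = 42
    data["units"] = {"queue_depth": "count"}
    return data


if __name__ == "__main__":
    print(json.dumps(collect_metrics()))
```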

Checks

Monitor internal resources like files, directories, URLs, ports, and syslogs on a Linux server. Click on Create/Edit Resource Check Profile to create/edit resource checks. You can also go to the Admin tab in the Site24x7 web client and click on Server Monitor > Resource Check Profile to add a resource for monitoring. The following internal resources are supported for monitoring:

File & Directory monitoring
- Access check
- Permissions check
- Size check
- Last modified check (only for files)
- Content check (only for files)
- Subdirectory availability (only for directories)
- File availability (only for directories)
URL and Port monitoring
Syslog monitoring

Learn more.
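
To show what the file checks listed above involve, the sketch below performs access, permissions, size, and last modified checks with the Python standard library. It is an illustration of the concepts, not the agent's implementation; the function name, parameters, and result keys are hypothetical.

```python
import os
import stat
import time


def check_file(path, max_size_bytes, max_age_seconds):
    """Run basic access, permission, size, and last-modified checks."""
    if not os.path.exists(path):
        return {"exists": False}
    info = os.stat(path)
    return {
        "exists": True,
        "readable": os.access(path, os.R_OK),             # access check
        "permissions": stat.filemode(info.st_mode),       # e.g. "-rw-r--r--"
        "size_bytes": info.st_size,
        "size_ok": info.st_size <= max_size_bytes,        # size check
        "recently_modified": (time.time() - info.st_mtime) <= max_age_seconds,
    }
```

For example, check_file("/var/log/syslog", 50 * 1024**2, 3600) would report whether that log is readable, smaller than 50 MB, and modified within the last hour, assuming the path exists on your server.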

Syslogs

Get ample data, laid out in a graphical format, detailing downtime, performance drops, and security infringements. Detailed metrics on logged program messages and their severity can be derived from the Syslogs graph.

You can also check for specific keywords and their occurrences in the syslogs. The logs can be filtered by ID and source so that you are notified instantly when unexpected behavior occurs.
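
A keyword check of this kind amounts to counting the log lines that contain each keyword. The sketch below does this case-insensitively on made-up syslog lines; the function name and the sample entries are hypothetical.

```python
from collections import Counter

# Hypothetical syslog entries.
SAMPLE_SYSLOG = [
    "Jan 12 10:01:02 web01 sshd[812]: Failed password for root from 10.0.0.5 port 4242 ssh2",
    "Jan 12 10:01:09 web01 kernel: Out of memory: Killed process 2211 (java)",
    "Jan 12 10:02:15 web01 sshd[840]: Failed password for admin from 10.0.0.5 port 4250 ssh2",
]


def count_keyword_lines(lines, keywords):
    """Count the lines containing each keyword, ignoring case."""
    counts = Counter({keyword: 0 for keyword in keywords})
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts


print(count_keyword_lines(SAMPLE_SYSLOG, ["Failed password", "out of memory", "segfault"]))
```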

Tools

Manage various actions and carry out tasks with ease, all in one place, using Server Tools. You can also access this page by going to Server > Server Monitor > Server Tools and selecting your Linux server from the drop-down.

I. Process Viewer

Get the complete list of all the active processes running on your Linux server with their CPU (%) usage, memory (%) usage, handle count, thread count, and instances. You can search for any particular process using the Search Bar at the top. You can add processes for monitoring by using the +Add option beside the process name.

Add Custom Tab

Create your own tab and monitor the performance metrics you need.

Steps to add a customized view:

Click on the Add Custom Tab button.
Provide a Display Name for identification purposes.
Select the metrics that you wish to view and monitor under this view.
Save your changes.
Click on More > click on the custom dashboard that you created.

Note

You can edit the display name or delete a custom view by going to Edit Custom View.

Root Cause Analysis (RCA)

Every time downtime is detected, a Root Cause Analysis (RCA) report is generated and sent to the user based on the alerting contact and medium. The RCA generated for a Linux server monitor provides the actual reason behind the downtime, along with the traceroute map to diagnose connectivity issues.

Performance Reports

Log in to Site24x7 and go to Reports > Server Monitor to access performance reports for Linux monitoring. In addition to the common reports available for all monitor types in Site24x7, server monitoring has some exclusive reports on disk usage, network adapter details, agent inventory, and top n reports for CPU, memory, and disk. Learn more.

Server Inventory & Health Dashboards

Get a complete view of your entire server environment with our intuitive dashboards.

Inventory Dashboard - Displays a count of all your servers, applications, resource checks, plugins and more.
Health Dashboard - Know the current count and status of all the servers, plugins, and apps in your account.

Licensing

Know what metrics you get for a single Linux server monitor. Learn more.

Related Articles

Add a Linux server monitor
Bulk installation: Chef | Puppet | SaltStack | Ansible | Remote installation using SSH
Service and process monitoring
50+ out-of-the-box plugin integrations
Server monitoring agent architecture
Other OS platforms supported: FreeBSD | Windows | OS X
Troubleshooting Tips
