
Health Check for Monitor Groups and Subgroups

Site24x7's Health Check feature assesses the availability and health of a Monitor Group or Subgroup and helps you identify issues or potential problems that could affect the operation or workload of the Monitor Group. You can also choose to receive alerts and stay updated on your Monitor Group's health status.

Use case

The Health Check feature monitors and tracks the overall health and availability of your Monitor Groups and Subgroups.

When you group multiple resources that serve one business application in a Monitor Group, monitoring each resource separately and compiling availability data is demanding. Site24x7's Health Check lets you view and analyze the status and availability of each monitor in one place.

Benefits of Health Check

You can leverage these benefits by configuring Health Check for your Monitor Groups or Subgroups:

  • View and track the status and outages of individual monitors in Monitor Groups.
  • Gain insights on monitors that frequently experience status changes and analyze the issue.
  • Configure thresholds and receive alerts whenever a monitor, or a set of monitors, has status changes.
  • Easily identify the monitor in Trouble, Down, or Critical status, and rectify the issue with the help of a detailed root cause analysis (RCA).
  • Track the availability of your resources and analyze the resource health.
  • Customize the status of the Monitor Group or Subgroup, based on the health check threshold profile.

Supported Health Check metrics 

Metric name | Description | Unit
Number of Available Monitors | The number of available monitors. | Count
Total Number of Monitors | The total number of monitors. | Count
Number of Monitors Added | The number of monitors added. | Count
Number of Monitors Removed | The number of monitors removed. | Count
Availability Percentage | The monitor availability percentage. | Percentage
Percentage of Down Monitors | The percentage of monitors in Down status. | Percentage
Percentage of Critical Monitors | The percentage of monitors in Critical status. | Percentage
Percentage of Trouble Monitors | The percentage of monitors in Trouble status. | Percentage
Percentage of Monitors Under Maintenance | The percentage of monitors under maintenance. | Percentage
Percentage of Suspended Monitors | The percentage of monitors in Suspended status. | Percentage
Percentage of Available Monitors | The percentage of monitors in Available status. | Percentage
Total Downtime | The total downtime of the monitor. | Minutes
Minimum Downtime | The minimum downtime of the monitor. | Minutes
Maximum Downtime | The maximum downtime of the monitor. | Minutes
Average Downtime | The average downtime of the monitor. | Minutes
Down Events | The number of down events. | Events
Trouble Events | The number of trouble events. | Events
Critical Events | The number of critical events. | Events
Maintenance Events | The number of maintenance events. | Events
Suspended Events | The number of suspended events. | Events
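
To make the percentage metrics concrete, here is a minimal Python sketch of how a per-status percentage could be derived from a list of monitor statuses. It is only an illustration: the function name status_percentages and the sample group are hypothetical, and the sketch assumes each percentage is the count of monitors in a status divided by the total number of monitors. It is not Site24x7's implementation.

```python
from collections import Counter


def status_percentages(statuses):
    """Return the percentage of monitors in each status, rounded to two decimals.

    Assumption (not from the product docs): percentage = count / total * 100.
    """
    total = len(statuses)
    if total == 0:
        return {}
    counts = Counter(statuses)
    return {status: round(100 * count / total, 2) for status, count in counts.items()}


# Hypothetical Monitor Group with four monitors.
group = ["Up", "Down", "Trouble", "Up"]
percentages = status_percentages(group)
print(percentages)
```

A status that no monitor is currently in does not appear in the result, so callers would treat a missing key as zero percent.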

Supported Subgroups metrics

Metric name | Description | Unit
Number of Down Subgroups | The number of subgroups in Down status. | Count
Number of Critical Subgroups | The number of subgroups in Critical status. | Count
Number of Trouble Subgroups | The number of subgroups in Trouble status. | Count
Number of Up Subgroups | The number of subgroups in Up status. | Count

Threshold configuration

To configure thresholds for your Monitor Groups:

  1. Log in to your Site24x7 account and navigate to Monitor Groups.
  2. Select the Monitor Group.
  3. Click Edit.
  4. Under the Health Check Configuration section, click the + icon next to the Health Check Profile field to add a threshold profile.
    Note
    • To edit the threshold profile, click the pencil icon next to the Health Check Profile field.
    • Mute Alert for Subgroups: When you toggle this option to Yes in the Edit Health Check Profile or Add Health Check Profile page, alerts for subgroups directly under the selected monitor group will be suppressed. For example, consider the Zylker monitor group, which has three subgroups: Zylker_sub1, Zylker_sub2, and Zylker_sub3. Additionally, Zylker_sub2 has two subgroups: Zylker_sub2A and Zylker_sub2B.

      If you mute alerts for the Zylker monitor group, alerts for Zylker_sub1, Zylker_sub2, and Zylker_sub3 subgroups will be suppressed because they fall directly under the Zylker monitor group. Zylker_sub2A and Zylker_sub2B will be unaffected because they don't fall directly under the Zylker monitor group.

      If you mute alerts only for Zylker_sub2, alerts for Zylker_sub2A and Zylker_sub2B subgroups will be suppressed. The rest of the Zylker monitor group, including Zylker_sub1 and Zylker_sub3 subgroups, will be unaffected.
  5. Select Health Check from the Monitor Type drop-down menu.
  6. Enter an appropriate name in the Display Name field.
  7. The supported metrics are displayed in the Threshold Configuration section. You can set threshold values for all the metrics mentioned above.
  8. Click Save.
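
The Mute Alert for Subgroups behavior described in the note under step 4 affects only direct subgroups. The following Python sketch models that rule with the Zylker example. The SUBGROUPS mapping and the muted_subgroups function are hypothetical names used for illustration; they are not part of any Site24x7 API.

```python
# Hypothetical hierarchy from the Zylker example: parent -> direct subgroups.
SUBGROUPS = {
    "Zylker": ["Zylker_sub1", "Zylker_sub2", "Zylker_sub3"],
    "Zylker_sub2": ["Zylker_sub2A", "Zylker_sub2B"],
}


def muted_subgroups(group, hierarchy=SUBGROUPS):
    """Return the subgroups whose alerts are suppressed when `group` is muted."""
    # Only direct children are affected; the mute does not cascade further down.
    return list(hierarchy.get(group, []))


muted_by_parent = muted_subgroups("Zylker")
muted_by_sub2 = muted_subgroups("Zylker_sub2")
print(muted_by_parent)
print(muted_by_sub2)
```

Muting Zylker leaves Zylker_sub2A and Zylker_sub2B untouched because the lookup is one level deep and never recurses into nested subgroups.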

Sync Monitor Group Status with Health Check Status

The Sync Monitor Group Status with Health Check Status option (in the Add Health Check Profile page or Edit Health Check Profile page) is set to Yes by default. When set to Yes, the Monitor Group status will display the same status as the Health Check status.

When the Sync Monitor Group Status with Health Check Status option is set to No, the Monitor Group status is updated based on the monitor count threshold that decides the Monitor Group or Subgroup status. However, the Health Check status (Healthy, Critical, Down, or Trouble) is still updated based on the Health Check configuration.

Notify for Count Based Thresholds

The Notify for Count Based Thresholds option (in the Add Health Check Profile page or Edit Health Check Profile page) is set to Yes by default to notify you about the Health Check and Monitor Group statuses. How this toggle behaves also depends on the Sync Monitor Group Status with Health Check Status option.

Note
  • For existing Monitor Groups, both the Sync Monitor Group Status with Health Check Status and Notify for Count Based Thresholds options will be set to No by default. If required, set them to Yes or create a new threshold profile.
  • For new Monitor Groups, both the Sync Monitor Group Status with Health Check Status and Notify for Count Based Thresholds options will be set to Yes by default, only if a new threshold profile is associated with the Monitor Group.

The following table lists the scenarios and their expected outcomes for each combination of the Sync Monitor Group Status with Health Check Status and Notify for Count Based Thresholds settings.

Scenario | When Sync Monitor Group Status with Health Check Status is toggled to | When Notify for Count Based Thresholds is toggled to | Result
1 | Yes | Yes/No | The Monitor Group status gets synced with the Health Check status.
2 | No | Yes | The Monitor Group status will not be synced with the Health Check status. However, the Health Check status will be updated based on the Monitor count threshold to decide the monitor group's status.
3 | No | No | Monitor Group status is updated based on the Monitor count threshold to decide the monitor group's status. The Health Check status is updated based on the Health Check threshold configuration.
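
The scenarios above boil down to one decision: which source drives the Monitor Group status. The Python sketch below captures only that part, under the assumption (taken from the scenarios) that the status source depends solely on the sync option. The function name and the returned labels are hypothetical, and notification behavior is deliberately left out.

```python
def monitor_group_status_source(sync_with_health_check: bool, notify_count_based: bool) -> str:
    """Return a label for what determines the Monitor Group status.

    `notify_count_based` is accepted to mirror the two toggles, but per the
    scenarios it does not change which source drives the status.
    """
    if sync_with_health_check:
        # Scenario 1: the group status mirrors the Health Check status.
        return "health_check_status"
    # Scenarios 2 and 3: the group status follows the monitor count threshold.
    return "monitor_count_threshold"


scenario_1 = monitor_group_status_source(True, False)
scenario_3 = monitor_group_status_source(False, False)
print(scenario_1, scenario_3)
```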

Licensing

Health Check is supported for all paid and evaluation accounts.

Polling frequency

Monitor Group

When a Monitor Group contains monitors in Down, Trouble, or Critical status, the shortest poll interval among those problematic monitors is set as the default polling frequency for Health Check.

Note
The default polling frequency will change only when the problematic monitor remains in Down, Trouble, or Critical status for an hour.

For instance, consider a Monitor Group with four monitors, zylker 1, zylker 2, zylker 3, and zylker 4, which are in Up, Down, Trouble, and Critical status respectively. Assume that their poll intervals are one minute, three minutes, five minutes, and ten minutes respectively. zylker 1 is Up, so its interval is not considered. Among the problematic monitors, zylker 2 has the shortest poll interval, three minutes. Therefore, the default polling frequency of the Monitor Group is set to three minutes.
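
The selection rule in this example can be sketched in Python as follows. This is an illustration only: the function name health_check_poll_interval and the tuple layout are hypothetical, and the sketch ignores the one-hour condition from the note above.

```python
PROBLEM_STATUSES = {"Down", "Trouble", "Critical"}


def health_check_poll_interval(monitors):
    """Return the shortest poll interval (minutes) among problematic monitors.

    `monitors` is a list of (name, status, poll_interval_minutes) tuples.
    Returns None when no monitor is in Down, Trouble, or Critical status.
    """
    intervals = [
        interval
        for _name, status, interval in monitors
        if status in PROBLEM_STATUSES
    ]
    return min(intervals) if intervals else None


# The four monitors from the example above.
monitors = [
    ("zylker 1", "Up", 1),
    ("zylker 2", "Down", 3),
    ("zylker 3", "Trouble", 5),
    ("zylker 4", "Critical", 10),
]
interval = health_check_poll_interval(monitors)
print(interval)  # 3
```

The one-minute interval of zylker 1 is skipped because the monitor is Up, which is why the result is three minutes rather than one.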

Subgroup

In the case of Subgroups, the shortest poll interval among all the problematic monitors in a Subgroup is set as the default polling frequency for Health Check.

Note
The default polling frequency will change only when the problematic monitor remains in Down, Trouble, or Critical status for an hour.

For instance, consider a Subgroup that consists of two monitors, zylker-subgroup 1 and zylker-subgroup 2, which are in Down and Critical status respectively. Assume that their poll intervals are five minutes and three minutes respectively. In this case, zylker-subgroup 2 has the shortest poll interval, three minutes. Therefore, the default polling frequency of the Subgroup is set to three minutes.

Understanding maintenance periods in Health Check

The maintenance period allows administrators to perform necessary maintenance activities on monitors or monitor groups without triggering unnecessary alerts. Maintenance periods can be scheduled in advance or manually triggered, and both types ensure that the monitors are marked as being in maintenance mode.

When you mark a monitor in a monitor group for maintenance, the monitor status gets reflected instantly at the monitor level. However, in Health Check, the monitor maintenance status will be reflected only during the next polling cycle.

Let's consider a monitor group named Zylker Monitor Group with four monitors: zylker 1, zylker 2, zylker 3, and zylker 4. Say zylker 1 experiences an outage from 9:30am - 1pm. 

At 10am, you mark monitor zylker 1 as under maintenance. This means that: 

  • At the zylker 1 level:
    • The whole outage time from 9:30am - 1pm will be marked as maintenance.
  • At the Health Check level: 
    • The outage time will be 9:30am - 10am.
    • 10am - 1pm will be considered a maintenance period. 

Thus, in Health Check, when you mark a monitor in a monitor group as under maintenance:

  • The monitor status is updated in the next poll.
  • The maintenance period will be displayed along with the outage details without altering the recorded outage duration (i.e., 9:30am - 10am in the above scenario).
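
The split between outage time and maintenance time in the scenario above can be sketched in Python. The function split_outage and the calendar date are hypothetical and used only for illustration. The sketch also assumes the maintenance mark takes effect exactly at 10am, ignoring the delay until the next poll.

```python
from datetime import datetime


def split_outage(outage_start, outage_end, maintenance_start):
    """Return (outage_window, maintenance_window) at the Health Check level."""
    # Clamp the maintenance start so it always falls inside the outage window.
    cut = min(max(maintenance_start, outage_start), outage_end)
    return (outage_start, cut), (cut, outage_end)


day = datetime(2024, 1, 1)  # arbitrary date, for illustration only
outage, maintenance = split_outage(
    outage_start=day.replace(hour=9, minute=30),
    outage_end=day.replace(hour=13),
    maintenance_start=day.replace(hour=10),
)
print(outage)       # 9:30am to 10am is recorded as outage
print(maintenance)  # 10am to 1pm is recorded as maintenance
```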

Viewing Health Check data for Monitor Groups

To view the Health Check data for Monitor Groups:

  1. Select Home > Monitor Groups from the left pane.
  2. Select the monitor group of your preference and navigate to the Health Check tab.

The top banner in the Health Check tab gives an overview of the Monitor Group. It displays the Monitor Group status, Availability percentage, total number of monitors in the Monitor Group, and the number of monitors in Up, Down, Critical, and Trouble status.

Viewing Health Check data for Subgroups

To view the Health Check data for Subgroups:

  1. Select Home > Monitor Groups from the left pane.
  2. Select the Show Subgroups check box at the top right corner.
  3. Select the subgroup of your preference and navigate to the Health Check tab.

The top banner in the Health Check tab displays the Availability, the total number of monitors in the Subgroup, and the number of monitors in Up, Down, Critical, and Trouble status.

Health Check data

Health Check is automatically enabled when you create a Monitor Group or Subgroup. You can view the Health Check data in the following tabs:

Availability

The Availability tab displays the availability of the Health Check monitors, events based on status, and Availability Status in Percentage/Count of the monitors attached. The various statuses in the section include Up, Down, Critical, Trouble, and Maintenance.

Status

The Status tab shows a detailed status of the monitor such as Total Downtime, Downtime Details, Total Trouble Time, and Trouble Time Details. You can use the status filter option to view the monitors in the Down/Critical/Trouble status.

Monitored Resources

The Monitored Resources tab lists all the monitors mapped under the respective Health Check monitor along with their Status, Outage Start Time, and the Reason for the outage.

You can set the thresholds for the monitors by clicking Threshold Configuration. If the status of an individual monitor changes, then the Monitor Group status is also updated based on the threshold configuration.

Subgroups

All the monitors under the parent Monitor Group along with the associated monitors in the Subgroups are displayed in the Subgroups tab. The Subgroups tab will be displayed only if subgroups are available for the Monitor Group. You can view the subgroup status and identify the number of monitors in Up, Down, Critical, and Trouble status.

Events

In the Events tab, you can view the outage details of your monitor, such as the Start Time and End Time of the outage, its Duration, and the Reason for the outage.

Log Report

The Log Report tab shows the monitoring location, time, status, and the monitors in Available, Down, Trouble, or Critical status over a period of time.

RCA

View the downtime summary details along with other data like Status Events, Monitored Resources, and Outage History in the RCA tab.
