Introduction to ControlUp Monitor Clusters in v8
Introduction to ControlUp v8.1
Beginning with ControlUp v8.1, large organizations with many thousands of data sources can be monitored effectively through ControlUp. Support for large organizations is implemented by means of the Monitor Cluster feature, which enables multiple ControlUp Monitors to work together to monitor a single organization. Whereas a single Monitor can typically handle about 2,500 data sources (e.g. 2,500 VMs with 160 processes per machine, or up to 320k processes in total per Monitor node), a cluster of Monitors can handle more than 50,000 data sources.
A data source is any logical resource in your organization that is monitored by ControlUp: physical and virtual machines, hypervisors, XenDesktops, NetScalers, etc.
This article was written with reference to the ControlUp Hybrid Cloud solution. The information about Monitors also applies to the ControlUp On-premises (COP) solution, in which the COP server/application performs the functions that the cloud performs in the ControlUp Hybrid Cloud solution.
The diagram below shows how a large organization having 32,000 data sources in two sites could use multiple ControlUp monitors to monitor the entire organization.
What a ControlUp Monitor Does
A ControlUp Monitor is a Windows service that manages the continuous monitoring of your organization’s data sources. The main tasks performed by a Monitor are:
- Retrieving data from data collectors: The Monitor connects at frequent intervals to each data collector in the organization to gather detailed, up-to-date information about the status of each data source.
A data collector is software that connects to monitored entities and collects data from them. ControlUp Agents are data collectors that run on monitored Windows machines and gather status information from them whenever they are running. For non-Windows data sources, data collectors running on other machines retrieve status information by means of APIs.
- Aggregating collected data: The Monitor organizes collected data from different sources that relate to the same entities (see Associating Related Data Sources below) so that it can be uploaded into Insights. Similar to the Real-Time Console, where the data is displayed in the grid, the Monitor caches the data locally on the Monitor machine.
- Processing aggregated data: The Monitor analyzes the aggregated data in order to identify resources under stress and incidents that should trigger notifications or other automated actions. It then activates the relevant triggers and sends the aggregated data and the information it extracted from that data about stress levels and detected incidents to all open ControlUp Consoles.
- Uploading collected data to Insights: In organizations that use ControlUp Insights to store and analyze historical data, the Monitor uploads the aggregated data and associated information to the Insights database. Before it relays the data to Insights, it reduces it to a manageable size (by decreasing the resolution and calculating average values for each data point).
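The resolution reduction described in the last task can be pictured as bucketed averaging. The following is only a minimal sketch of that general idea, not ControlUp's actual implementation; the function name `downsample` and the bucket size are assumptions made for illustration.

```python
from statistics import mean


def downsample(samples, bucket_size):
    """Average consecutive samples in groups of `bucket_size`.

    `samples` is a list of numeric readings taken at a fixed interval.
    The final bucket may hold fewer readings than the others.
    """
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    return [
        mean(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# Hypothetical CPU readings: seven samples reduced to three data points.
cpu_percent = [10, 20, 30, 40, 50, 60, 70]
print(downsample(cpu_percent, 3))
```

A larger bucket size gives a smaller upload at the cost of hiding short spikes, which is the usual trade-off when lowering resolution.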
Deploying Multiple Monitors in an Organization
In ControlUp implementations where the monitored load is below the maximum supported capacity of a single Monitor node (e.g. fewer than 320k processes organization-wide), a single ControlUp Monitor is usually able to perform all of the tasks listed above. For larger organizations, multiple Monitors are necessary, as described in our Sizing Guidelines for ControlUp v8.x.
The exact number of data sources that can be monitored by a single ControlUp Monitor varies from organization to organization, depending on the specific configuration of hardware and software.
When multiple Monitors are added to an organization, they are automatically deployed as a cluster. Each Monitor in the cluster is assigned particular roles it is responsible for filling. Typically, each Monitor is responsible for collecting data from specific data sources and performing a preliminary aggregation of the data it collects. In addition, it may be tasked with completing the aggregation process for all of the data retrieved by all of the Monitors in the cluster, preparing and sending the data to Insights, and/or other functions.
Only one Monitor cluster can be deployed in a single organization.
How Monitor Clusters Are Managed
In a cluster of Monitors, one of the Monitors acts as the Master Monitor. This Monitor is responsible for dividing up all of the organization’s monitoring tasks among the Monitors in the cluster. All of the other Monitors in the cluster are subordinate to the Master.
The Master Monitor decides on-the-fly which Monitors will perform each monitoring task in the organization. It can change the assignments as necessary based on the load each Monitor is handling at that time.
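The article does not document how the Master balances load, so the following is only an illustrative sketch of one common approach: give each task to whichever Monitor currently carries the least load. The function name, the load units, and the task costs are all hypothetical.

```python
import heapq


def assign_tasks(monitors, tasks):
    """Assign each task to the currently least-loaded Monitor.

    monitors: dict of Monitor name -> current load (hypothetical units)
    tasks: list of (task name, cost) tuples
    Returns a dict of Monitor name -> list of task names.
    """
    if not monitors:
        raise ValueError("at least one Monitor is required")
    heap = [(load, name) for name, load in monitors.items()]
    heapq.heapify(heap)
    assignments = {name: [] for name in monitors}
    # Place the most expensive tasks first so the loads even out.
    for task, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        load, name = heapq.heappop(heap)
        assignments[name].append(task)
        heapq.heappush(heap, (load + cost, name))
    return assignments


plan = assign_tasks({"mon-a": 0, "mon-b": 0},
                    [("t1", 5), ("t2", 3), ("t3", 2)])
print(plan)
```

Re-running the same function with updated load figures models the Master changing assignments as conditions shift.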
The first Monitor you deploy in your organization performs a 'check-in' to our cloud backend and is then automatically chosen as the Master.
In general, the role of the Master Monitor can move between Monitors in any site.
Linking Monitors to Sites
Monitors work best when they are at the same location as the data sources they are monitoring because it minimizes latency in the collection of data from those sources.
To link Monitors to the data sources at their location, ControlUp v8.1 and above supports the creation of Sites. Each distinct physical location in your organization (e.g., your New York data center and your London data center) should have its own site.
The site should be configured to include all the Monitors, and all the data sources they monitor, that are situated in that location. The Master Monitor will only task Monitors in each site with the job of collecting data from the data sources in that site.
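The site restriction can be thought of as a filter applied before any task is handed out. The sketch below is an assumption-laden illustration rather than ControlUp's real data model: the dictionaries mapping names to sites and the function `eligible_monitors` are hypothetical.

```python
def eligible_monitors(source, source_sites, monitor_sites):
    """Return the Monitors allowed to collect from `source`.

    Only Monitors configured in the same site as the data source qualify.
    """
    site = source_sites[source]
    return sorted(m for m, s in monitor_sites.items() if s == site)


# Hypothetical site configuration.
monitor_sites = {"mon-ny-1": "New York", "mon-ny-2": "New York",
                 "mon-ldn-1": "London"}
source_sites = {"vm-042": "New York", "hv-007": "London"}

print(eligible_monitors("vm-042", source_sites, monitor_sites))
```

An empty result for any data source would indicate a site that contains data sources but no Monitor, which is worth checking when planning sites.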
Only one Monitor cluster can be deployed in a single organization, even if the organization has multiple sites. A site can have multiple Monitors.
Planning the Organization’s Monitor Configuration
Ideally, separate Monitors should be set up at each physical site in which a significant number of data sources are located. For example, if your organization has two data centers, in Washington and Paris, and each has about 3,000 data sources, it is best to set up Monitors in an N+1 configuration at each of these locations for High Availability (HA).
Each Monitor can handle 2,500 VDIs with 160 processes per machine or up to 320k processes in total per monitor node.
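As a rough planning aid, the N+1 rule can be turned into simple arithmetic. This is a back-of-the-envelope sketch only: it treats the two per-Monitor figures quoted above as fixed ceilings, whereas real capacity varies by environment, and the function name `monitors_needed` is hypothetical. The Sizing Guidelines remain the authoritative source.

```python
import math

SOURCES_PER_MONITOR = 2_500      # typical per-Monitor figure quoted above
PROCESSES_PER_MONITOR = 320_000  # per-node process ceiling quoted above


def monitors_needed(data_sources, total_processes=0, spare=1):
    """Estimate Monitors for one site: N for the load plus `spare` for HA."""
    if data_sources <= 0:
        return 0
    by_sources = math.ceil(data_sources / SOURCES_PER_MONITOR)
    by_processes = math.ceil(total_processes / PROCESSES_PER_MONITOR)
    return max(by_sources, by_processes) + spare


# The two-site example: about 3,000 data sources at each location.
for site in ("Washington", "Paris"):
    print(site, monitors_needed(3_000))
```

Under these assumptions, each 3,000-source site needs two Monitors to carry the load and a third for High Availability.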
The diagram below shows an example of a configuration of monitors for a large organization.
For information about the system requirements of Monitors, see Sizing Guidelines for ControlUp v8.x.
Allowing for Backup and High Availability
When an organization needs only a single Monitor, High Availability (HA) is achieved by setting up two Monitors to operate as an active/passive HA pair. If the primary Monitor fails, the secondary Monitor automatically takes over its functions, ensuring that the monitoring process is not interrupted.
High Availability for a single ControlUp Monitor was already supported in ControlUp v7.
When a cluster of Monitors is deployed in an organization, HA is implemented by setting up one Monitor more at each site than is required there, given the number of monitored data sources. When all of the Monitors at a site are functioning properly, some of their available resources remain idle. If any of the Monitors at a site fails, the Master Monitor divides up that Monitor’s tasks among the other Monitors running at the site.
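The redistribution step can be sketched as follows. This is a simplified illustration under stated assumptions, not the Master's real logic: load is approximated by the number of data sources each Monitor holds, and the function name `redistribute` is hypothetical.

```python
def redistribute(assignments, failed):
    """Move a failed Monitor's data sources to the surviving Monitors.

    assignments: dict of Monitor name -> list of data source names,
    all belonging to a single site. Returns a new dict without `failed`.
    """
    survivors = {m: list(s) for m, s in assignments.items() if m != failed}
    if not survivors:
        raise RuntimeError("no surviving Monitor at this site")
    for source in assignments.get(failed, []):
        # Give each orphaned source to the survivor holding the fewest sources
        # (ties are broken by Monitor name to keep the result deterministic).
        target = min(survivors, key=lambda m: (len(survivors[m]), m))
        survivors[target].append(source)
    return survivors


site = {"mon-1": ["a", "b"], "mon-2": ["c"], "mon-3": ["d", "e"]}
print(redistribute(site, "mon-3"))
```

The spare capacity of the extra Monitor is what allows the survivors to absorb these additional sources without becoming overloaded.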
In addition, one of the Monitors in each cluster is designated as the Master's backup. This is an internal role that the Master Monitor dynamically assigns to another Monitor.
When the Master is running, the backup keeps an up-to-date replica of the Master’s state. If the Master Monitor fails, the backup automatically takes over for it.
Associating Related Data Sources
Logical entities in an organization are often related to one another. For example, a monitored hypervisor and all of the Guest OS data of the VMs running on it are all separate logical entities but they are also related to one another. The data presented in the ControlUp Console would be incomplete if it ignored the relationships between logical entities.
In order to enable ControlUp to match data from related data sources, the properties of every monitored data source include an association index. Related logical entities, like hypervisors and their VMs, all have the same association index.
Association indexes enable ControlUp to match data from related data sources even if they are tracked by different Monitors. At each site, one of the Monitors is responsible for coordinating the matching of data from different sources based on their association indexes. This Monitor retrieves all the current activity data for each association index from the other Monitors at the site and merges the information to produce a complete picture of each entity’s status.
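Conceptually, this merge is a group-by on the association index. The sketch below illustrates the idea only; the record layout (the keys `association_index`, `entity`, and `metrics`) and the function name are assumptions, not ControlUp's internal format.

```python
from collections import defaultdict


def merge_by_association_index(records):
    """Group records from different Monitors by their association index.

    records: iterable of dicts with the hypothetical keys
    'association_index', 'entity', and 'metrics'.
    Returns a dict of association index -> {entity name: metrics}.
    """
    merged = defaultdict(dict)
    for record in records:
        merged[record["association_index"]][record["entity"]] = record["metrics"]
    return dict(merged)


# Records for one site, gathered by two different Monitors.
from_monitor_1 = [{"association_index": "idx-7", "entity": "hv01",
                   "metrics": {"cpu": 35}}]
from_monitor_2 = [{"association_index": "idx-7", "entity": "vm01",
                   "metrics": {"cpu": 80}}]

print(merge_by_association_index(from_monitor_1 + from_monitor_2))
```

Because the hypervisor and its VM share an index, they end up in the same group even though different Monitors collected them.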
Merging of data by association index is only performed per site, and not for the entire organization. Because of this, it is not recommended to assign related data sources to different sites. If, for example, a hypervisor and its VMs are assigned to different sites, it will not be possible to drill down from the hypervisor to its VMs.