Monitoring NoSQL Performance in Production

NosDB is a schema-less, scalable NoSQL database for .NET that accommodates immense volumes of unstructured data. Because NosDB provides availability and fault tolerance through sharding and distribution strategies, it also creates a need to monitor the impact of operations across your system.

Once NosDB has been deployed and databases are created, IT administrators need to monitor the environment continuously to diagnose problems and keep them from getting worse, such as network choking, peaks in resource utilization and unauthorized access attempts. Beyond these problems, you will want to monitor NosDB to assess the general status of load, node operation and cluster health.

The NoSQL monitoring tools for NosDB meet all these needs. Before we discuss the three monitoring tools, take a look at the categories of server and system counters in NosDB Counters. These counters provide detailed insight into how your cluster is behaving.

3 NoSQL Monitoring Tools for NosDB

Because robust monitoring is crucial and should be a simple experience, NosDB ships three tools that let you monitor the database and all distributed machines of the cluster from one location, in one window:

  • NosDB Management Studio

NosDB Management Studio is an interactive tool to manage clusters, shards and nodes. Beyond its administrative role, NosDB Management Studio provides a built-in monitor to examine operational performance in each shard and across the cluster at the same time. For more detail, check out NosDB Management Studio Statistics.

  • NosDB Monitor

NosDB Monitor is a dedicated tool with a high degree of user control for 24/7, real-time production monitoring. It presents performance counters and cluster health in detail, including shards, nodes, connection status, number of clients and broken links.

Moreover, you can customize dashboards according to your production needs. Event Logs are integrated into NosDB Monitor for deeper analysis, which helps you identify errors directly from the Monitor. Please refer to NosDB Monitor for more detail.

  • Logging

NosDB’s detailed logging mechanism creates dedicated logs for all tools and components, providing a 360-degree view of any problem. Refer to NosDB Logs for detail. NosDB also logs to the Windows Event Viewer, which simplifies event logging for those familiar with this tool.

NosDB also ships NosDB Log Viewer, a tool built entirely to enhance your logging experience. It categorizes the log entries into separate fields and lets you search entries by logging level, node IP, timeframe or any other parameter. Please refer to the documentation on NosDB Log Viewer for more information.

NoSQL Performance Monitoring and Diagnosis – Example

Let us take a scenario where you notice that the NosDB database (two shards, each containing a primary and a replica node) is not responding at the expected rate: most responses lag, and others time out. For any IT administrator, this should ring an alarm, as it could further degrade performance and user experience and eventually result in loss of business.

To keep an unresponsive database from impacting the business, we need to perform an in-depth diagnosis to identify the cause of the problem and its solution. We start by observing the CPU and Network statistics in NosDB Monitor, which give you the big picture of what’s happening within your system. In the red highlight box, the red and blue lines represent the two primary nodes for CPU Usage (on the left) and Network Usage (on the right).

[Figure: NosDB Monitor server dashboard showing CPU Usage and Network Usage graphs]

  • CPU Usage

An unresponsive application naturally calls for a check of CPU performance on the main machine. You can read the counters for NosDB CPU Usage on a shard node from the NosDB Monitor dashboard (see the red highlighted box above). Since the CPU is consistently running above 60%, you can tell that the node is being overburdened with excessive requests.

  • Network Connectivity

Another cause of application downtime in a clustered environment can be a network bottleneck. You can check for irregular peaks in the Network Usage graph in NosDB Monitor (again in the red highlight box above). A constantly high percentage means that network bandwidth is being saturated; go through the CS and DB logs to unearth the main cause, which could be an unusually large number of MapReduce tasks, queries or any other process.
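The distinction between a counter that is consistently high and one that only shows occasional peaks can be made concrete with a small check. The following is a minimal sketch, not part of NosDB: the function name, the sample values and the 90% cut-off are assumptions chosen for illustration, and the readings are presumed to have been noted down or exported from the monitoring tool by hand.

```python
from typing import Sequence


def is_sustained_high(samples: Sequence[float],
                      threshold: float = 60.0,
                      min_fraction: float = 0.9) -> bool:
    """Return True when at least `min_fraction` of the samples exceed `threshold`.

    `threshold` defaults to the 60% CPU figure used in the scenario;
    `min_fraction` is an assumed cut-off for what counts as "consistently".
    """
    if not samples:
        return False
    above = sum(1 for value in samples if value > threshold)
    return above / len(samples) >= min_fraction


# Hypothetical percentage readings, one per sampling interval.
steady_load = [72.0, 68.5, 75.1, 64.0, 70.3, 66.9]
occasional_peaks = [20.0, 95.0, 18.0, 22.0, 91.0, 19.0]

print(is_sustained_high(steady_load))       # True: every sample is above 60
print(is_sustained_high(occasional_peaks))  # False: only 2 of 6 samples are above 60
```

The same check applies to Network Usage readings; only the threshold would need to be chosen to suit your environment.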

You now drill deeper by asking the next question: is only one primary being compromised, or is the behavior prevalent across the cluster? For a NoSQL database, the volume of data can grow very quickly, so the nodes need to be running with enough capacity.

Using NosDB Management Studio Statistics, you can view multiple nodes and their performance. If you notice a similar pattern in the CPU and Network Usage of the other primary node, check the behavior of the replica nodes. This can be done through Tools -> Options -> check Show secondary nodes counters. A considerable difference between primary and replica behavior indicates that all operations are being handled by the primary node, which is overworked and exceeding its capacity.
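One way to put a number on a "considerable difference" between primary and replica is to compare their average usage. This is only an illustrative sketch: the helper name, the sample readings and the `floor` parameter are assumptions, not anything NosDB provides.

```python
from statistics import mean
from typing import Sequence


def primary_replica_ratio(primary_samples: Sequence[float],
                          replica_samples: Sequence[float],
                          floor: float = 1.0) -> float:
    """Ratio of mean primary usage to mean replica usage.

    The replica mean is clamped to `floor` so that an idle replica
    (mean of 0) does not cause a division by zero.
    """
    return mean(primary_samples) / max(mean(replica_samples), floor)


# Hypothetical CPU percentages for one shard.
primary_cpu = [70.0, 72.0, 68.0]
replica_cpu = [5.0, 7.0, 6.0]

ratio = primary_replica_ratio(primary_cpu, replica_cpu)
print(f"primary is {ratio:.1f}x busier than its replica")
```

A ratio far above 1 on every shard points at the same conclusion as the paragraph above: the primaries are doing nearly all of the work.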

So what kind of operations are over-exerting the primary nodes? Let’s start with the most basic counters. The Fetches/sec counter shows unusually high volumes that hold steady, neither increasing nor decreasing, while Inserts, Updates and Deletes are steady and low.

[Figure: NosDB Monitor counters]

This leads to our second discovery: the application is read-intensive (high Fetches/sec) and has very low write volume (e.g. Updates/sec). And since the traffic is only active on the primary nodes, performance will start lagging once the load crosses a certain threshold.
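The read-versus-write judgment can be expressed as the share of fetches among all operations. The sketch below assumes you have copied per-second rates for the four counters from the monitor; the function name and the figures are hypothetical.

```python
def read_share(fetches: float, inserts: float,
               updates: float, deletes: float) -> float:
    """Fraction of all operations per second that are reads (fetches)."""
    total = fetches + inserts + updates + deletes
    return fetches / total if total else 0.0


# Hypothetical per-second rates for one primary node.
share = read_share(fetches=4800, inserts=60, updates=90, deletes=50)
print(f"reads make up {share:.0%} of operations")  # reads make up 96% of operations
```

A share this close to 100% is what makes the replica nodes an attractive target for the read load.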

As an IT administrator, you might find yourself at a fork at this stage. You can either scale your infrastructure to accommodate the influx of users, or divert the read-only load towards the replica nodes to take some strain off the primary nodes.

While scaling is an easy option, a much more practical approach in this scenario is to distribute the read load among the replica nodes in NosDB. Balancing the load between nodes is as simple as changing the Read Preference of your application. Refer to Get Documents to understand the various ways documents can be fetched from replica nodes along with the primary. Discussing this with the developers can lead to the application being optimized for efficient load balancing.

[Figure: diagnosis flow]

This example shows that NosDB provides a complete package to investigate a problem and find a solution. Monitoring essentially means asking a series of questions, which can be answered using the three built-in monitoring tools in parallel: NosDB Management Studio, NosDB Monitor and Logs.
