NosDB is a .NET NoSQL database. This means it offers:
- A distributed cluster architecture
- A flexible data structure
- Scalability to accommodate big data
Underneath this simplicity, it is an amalgamation of different components working together to establish an efficient and reliable storage system. In this blog, I present an overview of how the NosDB Cluster works.
NosDB has three fundamental components: the NosDB Cluster, Distributors and NosDB Storage.
The NosDB Cluster
We know that NosDB is a clustered system, but what exactly is a cluster in NosDB? A cluster in NosDB consists of the following components:
- Configuration Cluster
- Database Cluster
1. The Configuration Cluster
The configuration cluster is the centralized place to store and manage cluster configurations and client information in NosDB. It is a middle layer between the distributor and the database cluster and serves as the clients’ source for the latest cluster state information. The clients for a configuration cluster might be the nodes/shards, the distributor service, or .NET client applications with distributors embedded within. NosDB’s configuration cluster runs as a separate process named “Alachisoft NosDB Configuration Service”.
The NosDB configuration cluster consists of two nodes, the primary and the replica. Client operations are served from the primary of the configuration cluster and are replicated to the replica node, ensuring failover-tolerant behavior. All the data is replicated synchronously, which maintains the reliability of the cluster’s configuration data. The user does not have to create the configuration cluster; it is created automatically in the background when the database cluster is created.
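To make the synchronous primary/replica behavior concrete, here is a minimal sketch in Python. The names (ConfigNode, ConfigCluster, update, failover) are illustrative, not NosDB’s actual API; the point is only that a write is not acknowledged until the replica also has it, so a failover loses nothing:

```python
class ConfigNode:
    """A single configuration-service node holding cluster metadata."""
    def __init__(self, name):
        self.name = name
        self.config = {}

class ConfigCluster:
    """Primary/replica pair; writes are replicated synchronously."""
    def __init__(self):
        self.primary = ConfigNode("primary")
        self.replica = ConfigNode("replica")

    def update(self, key, value):
        # The write is applied to the primary...
        self.primary.config[key] = value
        # ...and synchronously copied to the replica before returning,
        # so a primary failure never loses acknowledged configuration.
        self.replica.config[key] = value

    def failover(self):
        # On primary failure the replica already has identical state,
        # so it can simply take over.
        self.primary, self.replica = self.replica, self.primary

cluster = ConfigCluster()
cluster.update("shards", ["ShardA", "ShardB"])
cluster.failover()
print(cluster.primary.config["shards"])  # configuration survives failover
```

A real configuration service would of course persist this state and handle network failures; the sketch only captures the synchronous-replication guarantee.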
2. The Database Cluster
A database cluster in NosDB is a collection of shards. It runs as a separate process named “Alachisoft NosDB Database Service”. Whereas the configuration cluster stores the cluster’s configurations, the database cluster maintains the user’s data in the form of clustered databases, partitioned across its shards. To fully understand how NosDB’s database cluster works, the following aspects of shards and shard activity are important:
- Data partitioning among shards
- Data balancing among shards
- Data replication within a shard
Data in NosDB is distributed into shards. Each shard contains one or more server machines hosting primary and/or replica nodes. A shard contains exactly one primary node and may contain multiple replica nodes. These nodes, whether primary or replica, can reside on either the same machine or on dedicated machines. NosDB sharding is built on the horizontal partitioning design principle, where the individual rows in a collection are held separately to shape a scalable database system.
Each NosDB database cluster consists of one or more shards, allowing partitioning of data. Within a shard, the replica nodes are collectively referred to as a replica set. NosDB clients are connected to the primary nodes of all the shards via a distributor in the cluster, and can also be directed to read directly from the replica sets to reduce load on the primary nodes.
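The division of labor between a primary node and its replica set can be sketched as follows. This is a toy model, not NosDB’s client API; the Shard class, its write/read methods, and the prefer_replica flag are all illustrative assumptions:

```python
class Shard:
    """Toy shard: one primary node plus a replica set."""
    def __init__(self, name, replicas=1):
        self.name = name
        self.primary = {}                               # primary node's data
        self.replicas = [{} for _ in range(replicas)]   # replica set

    def write(self, key, value):
        # Writes always land on the primary node of the shard.
        self.primary[key] = value
        for r in self.replicas:        # replication (simplified as synchronous)
            r[key] = value

    def read(self, key, prefer_replica=False):
        # Reads may be directed to a replica to offload the primary.
        source = self.replicas[0] if (prefer_replica and self.replicas) else self.primary
        return source.get(key)

shard = Shard("ShardA", replicas=2)
shard.write("doc:1", {"title": "hello"})
print(shard.read("doc:1", prefer_replica=True))  # served from a replica
```

In the real system replication is asynchronous (pull-based, as described later), so a replica read may briefly lag the primary; the sketch ignores that for clarity.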
Data Partitioning Among NosDB Shards
When we talk about dividing data into shards, which may be physically distributed, we need a strategy by which the data is divided amongst the shards. NosDB offers full control of your data distribution via different distribution strategies. NosDB supports the following needs-based distribution strategies, configurable at the collection level:
- Hash Based Distribution: NosDB’s hashing algorithm is used to generate a distribution map that assigns buckets (chunks of data) to shards. Each data item is assigned a partition key, a hash code is generated from that key to determine its bucket, and the data is routed to the owning shard according to the distribution map.
- Range Based Distribution: In this method, data distribution is based on user-defined ranges for each shard. For example, Shard A could be defined to receive keys in the range 1–1000 and Shard B keys in the range 1001–5000. The user must follow some rules while defining these ranges; for instance, ranges should be contiguous and must not overlap.
- Non-Sharded Distribution: This distribution strategy allows the user to identify a single shard where the data is to be sent, for the specified collection, regardless of how many shards are available.
More information about distribution strategies can be found in Shard Distribution Strategies. For optimum performance, it is advised to choose a strategy based on the geo-location of the data and the nature of the queries you want to execute.
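The three strategies above can be sketched as a toy router. The hash function, the bucket count, and the map layouts below are illustrative assumptions; NosDB’s own hashing algorithm and distribution map are internal to the product:

```python
import hashlib

SHARDS = ["ShardA", "ShardB", "ShardC"]
BUCKETS = 16  # data is distributed in fixed chunks ("buckets")

def hash_based(partition_key):
    # Key -> bucket -> shard, via a stable hash and a distribution map.
    digest = hashlib.md5(str(partition_key).encode()).digest()
    bucket = digest[0] % BUCKETS
    return SHARDS[bucket % len(SHARDS)]  # bucket-to-shard map (round-robin here)

def range_based(key, ranges={"ShardA": (1, 1000), "ShardB": (1001, 5000)}):
    # Ranges must be contiguous and must not overlap.
    for shard, (lo, hi) in ranges.items():
        if lo <= key <= hi:
            return shard
    raise KeyError(f"no range covers key {key}")

def non_sharded(collection, pinned={"Orders": "ShardA"}):
    # The whole collection lives on a single user-chosen shard.
    return pinned[collection]

print(hash_based("customer:42"))  # deterministic, but depends on the hash
print(range_based(1500))          # -> ShardB
print(non_sharded("Orders"))      # -> ShardA
```

Notice that the hash route is deterministic for a given key, which is what makes client-side routing possible once the distribution map is known.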
Data Balancing Among NosDB Shards
Shards also play an important role in balancing the data load. NosDB manages this data distribution itself, placing data among shards smoothly without user intervention. When a shard joins or is removed, NosDB handles all the data management complexities in the background, completely abstracting the cluster changes from the client. In this case, a ‘state transfer’ is triggered because the data distribution has changed. State transfer is the mechanism for transferring data from one shard to another in data chunks (buckets); state transfer among shards is called Inter-Shard State Transfer.
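The effect of a shard joining can be sketched by diffing the old and new bucket maps: only buckets whose owner changed need to move. The round-robin assignment below is a stand-in assumption for NosDB’s internal distribution map, and the function names are hypothetical:

```python
def assign_buckets(shards, n_buckets=8):
    # Evenly assign buckets to shards (round-robin), as a stand-in for
    # the cluster's internal distribution map.
    return {b: shards[b % len(shards)] for b in range(n_buckets)}

def state_transfer(old_map, new_map):
    # Each bucket whose owner changed must be moved: (bucket, source, destination).
    return [(b, old_map[b], new_map[b])
            for b in old_map if old_map[b] != new_map[b]]

old = assign_buckets(["ShardA", "ShardB"])
new = assign_buckets(["ShardA", "ShardB", "ShardC"])  # ShardC joins the cluster
moves = state_transfer(old, new)
print(moves)  # only the reassigned buckets are transferred
```

Because data moves in bucket-sized chunks rather than item by item, the cluster can rebalance incrementally while continuing to serve requests.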
Data Replication Within a NosDB Shard
As already mentioned, each shard in a NosDB cluster must have at least one node, which is the primary. More nodes can be added to form a replica set. A replica set ensures recovery and prevents database downtime using a failover mechanism, thus providing high availability.
If a shard contains a single node, it is automatically declared as the primary. New nodes added to the shard first replicate all the data from the primary node. After achieving a stable state, an election mechanism is triggered to decide the status of the newly added node. It all depends on whether:
- The node has up-to-date data
- The new node has adequate connections maintained within the shard and the configuration cluster to ensure ‘no-loss’ connectivity, and
- The priority of the newly added node relative to the existing nodes.
For example, if the newly added node has a priority of 1 (highest) whereas the old node has a priority of 4 (low), the new node is elected as the new primary, provided it has the latest data and has adequate connections within the shard and with the configuration cluster.
In another scenario – the case of a primary node failure – a new primary is chosen among the replica nodes via an election mechanism. Details about the rules and process of electing a primary can be found in Node Election Mechanism.
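The election criteria above can be condensed into a small sketch. This is illustrative only; NosDB’s actual rules are documented in its Node Election Mechanism, and the field names (op_id, connected, priority) are assumptions:

```python
def elect_primary(nodes, latest_op_id):
    # A node is eligible only if its data is up to date and it has stable
    # connectivity within the shard and to the configuration cluster.
    eligible = [n for n in nodes
                if n["op_id"] == latest_op_id and n["connected"]]
    # Among eligible nodes, the lowest priority number (1 = highest) wins.
    return min(eligible, key=lambda n: n["priority"])["name"] if eligible else None

nodes = [
    {"name": "old-node", "priority": 4, "op_id": 100, "connected": True},
    {"name": "new-node", "priority": 1, "op_id": 100, "connected": True},
]
print(elect_primary(nodes, latest_op_id=100))  # -> new-node
```

This mirrors the example in the text: the higher-priority newcomer wins only because it also satisfies the up-to-date-data and connectivity conditions; a stale node never wins on priority alone.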
When a new node(s) joins a shard, or a node recovers from previous failure, a state transfer takes place known as intra-shard state transfer. Intra-Shard State Transfer is the process of copying data from the primary to the replica nodes within the shard. This is distinct from the inter-shard state transfer where data is moved between shards.
Replication from primary nodes to replica nodes happens via the Pull-Model mechanism. Each operation is first performed and logged on the primary node of the shard. Replicas sync to the primary node by pulling all of the new changes.
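A minimal sketch of this pull model, with hypothetical class names: the primary logs every operation, and each replica remembers its position in the log and pulls only what it has not yet applied:

```python
class Primary:
    def __init__(self):
        self.data, self.oplog = {}, []

    def apply(self, key, value):
        # Every operation is performed and logged on the primary first.
        self.data[key] = value
        self.oplog.append((key, value))

class Replica:
    def __init__(self):
        self.data, self.applied = {}, 0   # 'applied' = position in the oplog

    def pull(self, primary):
        # The replica syncs by pulling only the new entries since its
        # last sync point, then advancing its position.
        for key, value in primary.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.oplog)

primary, replica = Primary(), Replica()
primary.apply("doc:1", "v1")
primary.apply("doc:1", "v2")
replica.pull(primary)
print(replica.data["doc:1"])  # -> v2
```

One consequence of the pull model, visible even in this sketch, is that the primary never blocks on slow replicas; each replica catches up at its own pace.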
The NosDB Distributor
NosDB’s Distributor is an intermediate layer between the client and the cluster, responsible for managing client requests on the cluster. When the distributor receives a client request, it determines which shard should receive it, ensuring a minimum number of calls to and from the server. The distributor must be running at all times for the client to communicate with the cluster.
The distributor for .NET clients is embedded within the client process for maximum performance. For other NosDB clients, such as Java, the distributor is not built in and is provided as a separate “Distributor” service. This makes it easier to access NosDB from other development languages.
Because of the distributed architecture of NosDB, data may reside in multiple locations, so the question arises:
“How does the distributor know where to send a specific client operation?”
The distributor keeps a distribution map of all collections residing in the cluster, which it obtains from the configuration server on its first interaction. Once the distribution map is acquired, client operations become shard-specific. When the configuration changes, the distributor fetches the new configuration from the configuration server.
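That fetch-once, refresh-on-change behavior can be sketched as a small cache keyed on a configuration version. The ConfigServer/Distributor classes and the version counter are illustrative assumptions, not NosDB’s actual protocol:

```python
class ConfigServer:
    """Stand-in for the configuration cluster: holds the distribution map."""
    def __init__(self, dist_map):
        self.dist_map, self.version = dist_map, 1

class Distributor:
    def __init__(self, config_server):
        self.config = config_server
        self.cached_map, self.cached_version = None, 0

    def route(self, collection):
        # Fetch the map on first use, or when the configuration has changed.
        if self.cached_version != self.config.version:
            self.cached_map = dict(self.config.dist_map)
            self.cached_version = self.config.version
        # Otherwise operations are shard-specific with no extra server hops.
        return self.cached_map[collection]

server = ConfigServer({"Orders": "ShardA", "Customers": "ShardB"})
distributor = Distributor(server)
print(distributor.route("Orders"))    # -> ShardA
server.dist_map["Orders"] = "ShardC"  # rebalancing changes the map...
server.version += 1                   # ...and bumps the configuration version
print(distributor.route("Orders"))    # -> ShardC after refresh
```

The key point is that in the steady state the distributor routes entirely from its cached map, touching the configuration server only when the cluster state actually changes.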
That’s an overview of NosDB cluster architecture and its functions. For more details, please visit the NosDB Conceptual Guide on the Alachisoft website. In part 2 of this topic, I will cover NosDB storage and how it persists data.