How To: Decommission a node from the Elasticsearch cluster

Applies To:

GroupID 10 and above – Elasticsearch

Business Scenario:

We have GroupID in a master-slave configuration where multiple nodes are available in the Elasticsearch cluster. We want to remove/decommission one of the GroupID servers from the cluster.

Since all servers are interconnected, is there a way to safely remove server(s) without breaking the Elasticsearch cluster or losing any essential data available in Elasticsearch?

Methodology:

By default, Elasticsearch stores data in shards. These shards are created once, when the cluster is established and connected to the GroupID servers.

Normally, the shards and their replicas are evenly distributed among the nodes in a cluster. When a node is removed or decommissioned (permanently or temporarily), the shards on that node become unassigned and the cluster health changes from green to yellow. If the node holds primary shards, the data in those shards may be lost. Therefore, before the node is removed, it must be excluded from shard allocation so that the cluster can readjust and move its shards to the remaining nodes.
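To make the risk concrete, here is a minimal Python sketch that lists the primary shards on a node that have no started replica on any other node. It assumes the rows come from the _cat/shards API requested with format=json (columns index, shard, prirep, state, ip). The function name and the sample IP addresses are hypothetical and not part of GroupID.

```python
def primaries_at_risk(shards: list[dict], node_ip: str) -> list[tuple[str, str]]:
    """Return (index, shard) pairs whose primary is on node_ip
    and that have no STARTED replica on another node."""
    # Shards that already have a healthy replica somewhere else.
    safe = {
        (row["index"], row["shard"])
        for row in shards
        if row.get("prirep") == "r"
        and row.get("state") == "STARTED"
        and row.get("ip") != node_ip
    }
    # Primaries on the node to be removed that lack such a replica.
    return sorted(
        (row["index"], row["shard"])
        for row in shards
        if row.get("prirep") == "p"
        and row.get("ip") == node_ip
        and (row["index"], row["shard"]) not in safe
    )


# Illustrative rows only; real rows come from _cat/shards?format=json.
sample_rows = [
    {"index": "a", "shard": "0", "prirep": "p", "state": "STARTED", "ip": "10.0.0.5"},
    {"index": "a", "shard": "0", "prirep": "r", "state": "STARTED", "ip": "10.0.0.6"},
    {"index": "a", "shard": "1", "prirep": "p", "state": "STARTED", "ip": "10.0.0.5"},
    {"index": "a", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "ip": None},
]
at_risk = primaries_at_risk(sample_rows, "10.0.0.5")
```

An empty result suggests that every primary on the node has a copy elsewhere. A non-empty result lists the shards that would be lost if the node disappeared before its shards were moved.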

Note:

Elasticsearch health must be green before performing the procedure given below. The health is green when the correct number of replicas is set within the cluster. The number of replicas is determined by the number of nodes. The common rule of thumb is: total replicas = N-1, where “N” is the number of data/master-eligible nodes. If the replicas are not yet evenly distributed, wait for the cluster to stabilize before restarting or removing the node that is being decommissioned.
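The pre-check in this note could be scripted roughly as follows. This is a sketch under assumptions: it reads the status and number_of_data_nodes fields returned by the _cluster/health API, and fetch_health omits authentication for brevity, so a secured cluster would need credentials added. The function names and the base URL argument are hypothetical.

```python
import json
import urllib.request


def recommended_replicas(node_count: int) -> int:
    """Rule of thumb from the note: replicas = N - 1."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_count - 1


def is_green(health: dict) -> bool:
    """True when a parsed _cluster/health response reports green status."""
    return health.get("status") == "green"


def fetch_health(base_url: str) -> dict:
    """GET <base_url>/_cluster/health (no authentication in this sketch)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


# Illustrative response fragment; a real one comes from fetch_health(...).
sample_health = {"status": "green", "number_of_data_nodes": 3}
ready = is_green(sample_health)
replicas = recommended_replicas(sample_health["number_of_data_nodes"])
```

If ready is False, the procedure below should wait until the cluster returns to green.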

Steps:

In Google Chrome or Microsoft Edge (Chromium), either of the following two browser extensions can be used to exclude a node from shard allocation, which allows the node to be removed/decommissioned from the cluster.

ElasticVUE extension
Elasticsearch Head extension

Using the ElasticVUE extension:

Download and install the ElasticVUE extension in the internet browser.
Open the extension and provide the credentials to connect to the cluster. By default, both the username and password are set to “admin”.
Once connected, navigate to the REST tab from the top.
Set HTTP Method to PUT and type _cluster/settings in Path.
Type the following JSON request body in the textbox below HTTP Method and Path:

{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "IP address of the node to be removed"
  }
}

To exclude more than one node, list the IP addresses of all nodes to be removed, separated by commas.
Click Send Request. The IP addresses will be added to the exclude list and no further allocation of the shards will occur for those nodes.
Next, restart the Elasticsearch service on the node that is being decommissioned. All the primary shards assigned to this node will be reallocated to the remaining nodes in the cluster.
To make sure that the decommissioned node does not contain any documents, open the following URL (an HTTP GET request):

http://DecommissionedServerName:9200/_nodes/node-10-GROUPID10-ES-3/stats/indices?pretty

The document count should be zero.
Finally, edit the elasticsearch.yml file on all remaining nodes and remove the HOSTNAME of the decommissioned server from discovery.zen.ping.unicast.hosts.
Save the YML file and restart Elasticsearch on all nodes, starting from the master node.
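For readers who prefer a script to a browser extension, the exclusion request and the document-count check in the steps above can be sketched in Python. This is an illustration under assumptions, not a GroupID tool: the function names, the base URL, and the sample IP addresses are hypothetical, and the document count is read from the nodes.<id>.indices.docs.count field of the node stats response.

```python
import base64
import json
import urllib.request


def build_exclusion_body(ips: list[str]) -> dict:
    """Build the transient setting that excludes the given IPs from allocation."""
    cleaned = [ip.strip() for ip in ips if ip.strip()]
    if not cleaned:
        raise ValueError("at least one IP address is required")
    return {
        "transient": {
            "cluster.routing.allocation.exclude._ip": ",".join(cleaned)
        }
    }


def put_cluster_settings(base_url: str, body: dict, user: str, password: str) -> dict:
    """PUT the body to <base_url>/_cluster/settings using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def doc_count(node_stats: dict) -> int:
    """Sum indices.docs.count over every node in a parsed node stats response."""
    return sum(
        node.get("indices", {}).get("docs", {}).get("count", 0)
        for node in node_stats.get("nodes", {}).values()
    )


# Hypothetical IP addresses, used only to show the shape of the body.
body = build_exclusion_body(["10.0.0.5", " 10.0.0.6 "])
```

Calling put_cluster_settings with this body sends the same request as the REST tab. A doc_count result of zero for the decommissioned node corresponds to the check described in the steps.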

Using the Elasticsearch Head extension:

Download and install the Elasticsearch Head extension in the internet browser.
Open the extension and provide the credentials to connect to the cluster. By default, both username and password are set to “admin”.
Once connected, navigate to the Any Request tab.
Type _cluster/settings in Path and set PUT in HTTP Method, as shown below.
Type the following JSON request body in the textbox below HTTP Method and Path:

{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "IP address of the node to be removed"
  }
}

As with the ElasticVUE extension, the IP addresses of several nodes can be listed here, separated by commas.
Click Request.

Once "acknowledged": true is shown in the response, the IP address(es) have been successfully added and the node(s) will not be assigned any new data.

At this point, all primary shards and any required replica shards should be redistributed to the rest of the cluster. If the cluster has evenly distributed (N-1) replicas, no noticeable change will be observed. If the distribution is uneven, all primary shards will be assigned to the master and data nodes that are not on the exclusion list.

In the figure below (taken before the execution of the command), notice that the shards are distributed in the cluster. Green squares represent shards while a green square with a bold black border represents a primary shard.

After executing the command, the primary shards are allocated only to node-10-GroupID10, which is the primary server in this example.
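To complete the process, follow step 7 onwards in Using the ElasticVUE extension, that is, continue from the step that restarts the Elasticsearch service on the node being decommissioned.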
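Before the node is restarted or removed, it can help to confirm that no shards remain on the excluded IP addresses. The sketch below assumes the _cat/shards API is requested with format=json and that each row carries index, shard, prirep, and ip fields. The function names and sample addresses are hypothetical, and fetch_shards omits authentication for brevity.

```python
import json
import urllib.request


def fetch_shards(base_url: str) -> list[dict]:
    """GET <base_url>/_cat/shards?format=json (no authentication in this sketch)."""
    with urllib.request.urlopen(f"{base_url}/_cat/shards?format=json") as resp:
        return json.load(resp)


def shards_remaining(shards: list[dict], excluded_ips: set[str]) -> list[tuple[str, str, str]]:
    """Return (index, shard, prirep) for every shard still on an excluded IP."""
    return sorted(
        (row["index"], row["shard"], row["prirep"])
        for row in shards
        if row.get("ip") in excluded_ips
    )


# Illustrative rows only; real rows come from fetch_shards(...).
before = [
    {"index": "a", "shard": "0", "prirep": "p", "ip": "10.0.0.5"},
    {"index": "a", "shard": "0", "prirep": "r", "ip": "10.0.0.6"},
]
after = [
    {"index": "a", "shard": "0", "prirep": "p", "ip": "10.0.0.6"},
    {"index": "a", "shard": "0", "prirep": "r", "ip": "10.0.0.7"},
]
still_there = shards_remaining(before, {"10.0.0.5"})
drained = shards_remaining(after, {"10.0.0.5"})
```

An empty list means the excluded node no longer holds any shards, which matches what the extension's shard overview shows after the command has taken effect.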

Keywords: Elasticsearch, GroupID, node, decommission, cluster, master, slave