ElasticSearch supports clustering; that is, you can have a series of distinct ElasticSearch instances work in a coordinated manner without much administrative intervention at all. Clustering ElasticSearch instances (or nodes) provides data redundancy as well as data availability.
Best of all, clustering in ElasticSearch, by default, doesn’t require any configuration – nodes discover each other. You can set up a cluster in about 60 seconds. Let me show you how!
First, download and unzip or untar the latest version of ElasticSearch. Next, copy the resultant ElasticSearch install directory (for example, mine is dubbed elasticsearch-0.90.3) into 3 different directories; for example, I’ve called my directories node-1, node-2, and node-3.
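If you'd rather script the copy step, here is a minimal sketch. It assumes the unpacked directory sits in your current directory and that each node directory is a complete copy of the install; copy_nodes is a hypothetical helper name of my own, not something ElasticSearch ships.

```shell
# Hypothetical helper: copy one unpacked install into node-1 .. node-N.
# Assumption: each node-N directory is a full copy of the install directory
# and does not exist yet (cp -R would otherwise copy *into* it).
copy_nodes() {
  src="$1"
  count="${2:-3}"
  i=1
  while [ "$i" -le "$count" ]; do
    cp -R "$src" "node-$i"
    i=$((i + 1))
  done
}

# Example: copy_nodes elasticsearch-0.90.3 3
```

Keeping separate copies means each node gets its own config, data, and logs directories, so the three instances don't step on each other.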
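Next, open up three terminal windows and, in each, change into one of the node directories in sequence. Start the instance in the node-1 directory by running the ElasticSearch startup script with the -f flag (with the 0.90.x distribution, that is bin/elasticsearch -f from the install directory). The -f flag forces the process to run in the foreground.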
By default, ElasticSearch nodes will name themselves if you don’t provide a name (via the elasticsearch.yml configuration file). Thus, after you start the first instance, you should see startup log lines that announce the node's generated name and its election as master. In my log output, “Dionysus” is the automatic name chosen by ElasticSearch. Note the part about “new_master” logged by cluster.service.
Next, go into the next terminal window, say node-2, and start that instance the same way (via the -f flag). You should see the 2nd instance (in my case, named Caiera) discover the master; look for the detected_master statement in its log output. You should also see the master, “Dionysus”, add the 2nd instance, “Caiera”, in the first terminal window.
Lastly, start the 3rd instance in the 3rd window. In this case, my 3rd node is dubbed “Phantom Rider”, and it’ll discover the master just as the 2nd node did. The master will, in turn, add “Phantom Rider” to the cluster, which you can see in the first terminal window.
Now that you have 3 ElasticSearch instances running, you can run a few RESTful commands to verify your cluster is operational.
First, in a new terminal window, use a tool such as curl to request http://localhost:9200/_cluster/nodes?pretty=true.
You should see a nicely formatted JSON response that includes the cluster name and an entry for each of the three nodes.
Note that the cluster_name is elasticsearch by default; you can change this name as well. Also, by default, the first node you start (here, the master) will claim port 9200, and later nodes on the same machine take the next free ports. You can run that same request against any node (for example, http://localhost:9201/_cluster/nodes?pretty=true will respond with the same exact response).
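To see for yourself that every node gives the same answer, you can loop over the ports. This is only a sketch: it assumes all three nodes run on one machine and took ports 9200, 9201, and 9202 (the last one is my assumption), and nodes_url and ask_all_nodes are hypothetical helper names.

```shell
# Build the nodes-info URL for a given HTTP port (9200 by default).
nodes_url() {
  printf 'http://localhost:%s/_cluster/nodes?pretty=true\n' "${1:-9200}"
}

# Ask every node for its view of the cluster.
# Assumption: three local nodes listening on ports 9200, 9201 and 9202.
ask_all_nodes() {
  for port in 9200 9201 9202; do
    curl -s "$(nodes_url "$port")"
  done
}

# Example: ask_all_nodes
```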
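You can check the health of a cluster, as well, by requesting the cluster health endpoint the same way: http://localhost:9200/_cluster/health?pretty=true. You should see another, shorter JSON response that includes the cluster's status and its number of nodes.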
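If you want to use the health check in a script, a rough sketch like the following pulls just the status value out of the response. It assumes curl and sed are installed and that the node answers on localhost; cluster_status is a hypothetical helper name, and the sed pattern is a quick hack rather than a real JSON parser.

```shell
# Hypothetical helper: print the cluster status (green, yellow or red).
# Assumption: a node is listening on the given HTTP port (9200 by default).
cluster_status() {
  curl -s "http://localhost:${1:-9200}/_cluster/health" |
    sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Example: [ "$(cluster_status)" = "green" ] && echo "cluster is healthy"
```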
Nodes in a cluster will elect a new master if your master node goes down. For example, if you go into the master node terminal window and control-c the process, you should see in the log output of the two other nodes that they quickly recognize the master’s failure and consequently elect one of themselves as the new master.
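You can also confirm the election from the command line. As far as I know, the cluster state endpoint reports the current master's node ID in a master_node field, so a sketch along these lines prints it. The default port of 9201 is an assumption (the node on 9200 is the one you just stopped), and master_id is a hypothetical helper name.

```shell
# Hypothetical helper: print the node ID of the current master.
# Assumption: a surviving node is listening on the given port (9201 by default).
master_id() {
  curl -s "http://localhost:${1:-9201}/_cluster/state" |
    sed -n 's/.*"master_node" *: *"\([^"]*\)".*/\1/p'
}

# Example: run master_id before and after stopping the master and compare.
```

The value is an internal node ID rather than the friendly node name, so match it against the IDs listed in the nodes response.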
Remember, ElasticSearch will pick intelligent defaults for you; however, you should most likely look closely at the various aspects you can configure when it comes to clustering. For example, the cluster name and node names are something you should consider setting explicitly.
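As a sketch of what setting names explicitly might look like, the helper below appends cluster.name and node.name to a node's elasticsearch.yml. The file path, the demo-cluster name, and the set_names helper are all hypothetical; it also assumes those two keys aren't already set in the file.

```shell
# Hypothetical helper: append a cluster name and a node name to a config file.
# Assumption: neither key is already set (uncommented) in that file.
set_names() {
  config="$1"   # e.g. node-1/config/elasticsearch.yml
  cat >> "$config" <<EOF
cluster.name: $2
node.name: "$3"
EOF
}

# Example: set_names node-1/config/elasticsearch.yml demo-cluster node-1
```

Giving every node in the cluster the same cluster.name matters: nodes only join a cluster whose name matches their own.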
What’s more, the process of discovery can be configured. For instance, multicast is used for auto-discovery by default; however, there are other options available, such as unicast, which lets you list the specific hosts a node should contact to find its cluster (so nodes that simply show up on the network aren't discovered automatically).
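Switching to unicast could look roughly like this. The sketch turns multicast off and lists the hosts to contact using the zen discovery settings; the host list in the example assumes three local nodes on the default transport ports 9300 to 9302, and use_unicast is a hypothetical helper name.

```shell
# Hypothetical helper: disable multicast and list unicast hosts in a config file.
# Usage: use_unicast <config-file> <host:port> [<host:port> ...]
use_unicast() {
  config="$1"
  shift
  hosts=$(printf '"%s", ' "$@")   # "a", "b", (with a trailing ", ")
  cat >> "$config" <<EOF
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [${hosts%, }]
EOF
}

# Example (assumed local transport ports):
# use_unicast node-1/config/elasticsearch.yml localhost:9300 localhost:9301 localhost:9302
```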
Finally, you can control how nodes operate in a cluster. You can explicitly forbid a node from being a master, for example, and you can also configure nodes to not hold data (thus, those nodes become veritable routers for searches). These options allow you to create an interesting topology that has some intelligent routing built in.
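Node roles come down to two boolean settings, node.master and node.data. Here is a minimal sketch that writes them for a few common combinations; the role labels (master-only, data-only, router) and the set_role helper are my own hypothetical names, not ElasticSearch terms.

```shell
# Hypothetical helper: append node.master / node.data for a named role.
set_role() {
  config="$1"
  case "$2" in
    master-only) master=true;  data=false ;;  # coordinates, holds no data
    data-only)   master=false; data=true  ;;  # holds data, never master
    router)      master=false; data=false ;;  # only routes requests
    *) echo "unknown role: $2" >&2; return 1 ;;
  esac
  cat >> "$config" <<EOF
node.master: $master
node.data: $data
EOF
}

# Example: set_role node-3/config/elasticsearch.yml router
```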
Regardless of how you configure an ElasticSearch cluster, I hope you’ve seen that it couldn’t be any easier. Can you dig it, man?