Apache Whirr: ElasticSearch Cluster Setup on Amazon EC2

The last time I tried to create a cluster, I had several problems. I was creating a Hadoop cluster on Amazon EC2. Taking the learnings forward, I decided to create an Elasticsearch cluster on EC2 with Apache Whirr. The idea of launching several machines with one command is really enticing, and it's wonderful if it works. With Whirr we start with a configuration file and say launch-cluster. Here is the initial configuration file I started with:

#Change the cluster name here 
#Change the name of cluster admin user 
#Setup your cloud credentials by copying conf/credentials.sample 
#to ~/.whirr/credentials and editing as needed 
#Change the number of machines in the cluster here 
whirr.instance-templates=2 elasticsearch  
#By default use the user system SSH keys. Override them here.
whirr.private-key-file=${sys:user.home}/.ssh/whirr_id_rsa

And that's it. It does the magic: the command below launches two Elasticsearch machines.

bin/whirr launch-cluster --config elasticsearch.properties
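The comments in the listing mention a cluster name and an admin user, but the matching property lines do not appear in it. As a rough sketch, the complete file might look like the one written by the function below. It assumes the standard Whirr property names whirr.cluster-name and whirr.cluster-user; the helper name write_whirr_config and the values given to those two properties are placeholders, not taken from the original file.

```shell
#!/usr/bin/env bash
# Sketch only: writes a Whirr properties file for a 2-node Elasticsearch cluster.
# write_whirr_config is a hypothetical helper; adjust names and values as needed.
write_whirr_config() {
  local target="${1:-elasticsearch.properties}"
  # The quoted 'EOF' keeps ${sys:...} literal, so Whirr (not the shell) expands it.
  cat > "$target" <<'EOF'
# Cluster name (assumed value)
whirr.cluster-name=elasticsearch
# Cluster admin user (assumed: the local user name)
whirr.cluster-user=${sys:user.name}
# Cloud credentials are read from ~/.whirr/credentials
# Number of machines in the cluster
whirr.instance-templates=2 elasticsearch
# SSH key used to log in to the instances
whirr.private-key-file=${sys:user.home}/.ssh/whirr_id_rsa
EOF
}

# Usage, from the Whirr installation directory:
#   write_whirr_config elasticsearch.properties
#   bin/whirr launch-cluster --config elasticsearch.properties
```

Quoting the heredoc delimiter matters here: without it, bash would try to expand ${sys:user.home} itself and fail.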

It can't be that easy, right? No, it's not. As always, and as expected, some things are not right. What would life be if everything worked perfectly with one command? The problem was that two machines were launched in EC2, but they did not form a cluster. Both were Elasticsearch master nodes! Surely we were missing some configuration. But wouldn't one expect the Whirr scripts to do the right configuration? Yes, one would, but they don't configure the cluster properly. Before doubting the configuration, I checked that the required ports, 9200 and 9300, were open for communication. They were.
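One way to run that port check from a shell is sketched below. It relies on bash's /dev/tcp redirection and the coreutils timeout command; check_port is a hypothetical helper name and the host in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: report whether a TCP port accepts connections.
# check_port is a hypothetical helper, not part of Whirr or Elasticsearch.
check_port() {
  local host="$1" port="$2"
  # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>;
  # timeout gives up after 5 seconds if the port is silently filtered.
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} closed or unreachable"
    return 1
  fi
}

# Usage: run on one node against the other node's address (placeholder name).
#   for port in 9200 9300; do check_port other-node.example.com "$port"; done
```

Running it from one node against the other is the more telling test, since node-to-node traffic on 9300 is what cluster formation depends on.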

The missing configuration is Elasticsearch's discovery module. For a cluster to work, each node needs to find the other machines in the cluster, and on EC2 that means EC2 discovery. My obvious first choice was to increase ping_timeout, so I changed it to 30 seconds in Elasticsearch's config/elasticsearch.yml file:

discovery.type: ec2  
discovery.ec2.ping_timeout: 30s

Stop and start the Elasticsearch services on both instances to see if it works. It does not. The nodes never find each other, and each declares itself a master node. After poking around in the logs and on the internet, I found the settings below that make it work:

cloud.aws.region: eu-west-1  
discovery.ec2.host_type: public_dns  

In theory, we should not need the above: we had already declared the discovery type to be EC2, which means the AWS API is used behind the scenes to find the other machines. But as it turns out, as of this writing, we need the above settings to get it going. Another restart of the services on both instances, and it worked. How do we verify that? With a plugin named elasticsearch-head. Install it with:

sudo elasticsearch/bin/plugin -install mobz/elasticsearch-head  

Now create an SSH tunnel on port 9200 and open the elasticsearch-head page in a browser.
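A tunnel of this kind can be opened with ssh's -L option. The sketch below wraps it in a hypothetical helper, open_es_tunnel; the key file, login user and host name are placeholders for your own values. Assuming the plugin is served at the usual site-plugin path, the page should then be reachable at http://localhost:9200/_plugin/head/.

```shell
#!/usr/bin/env bash
# Sketch: forward local port 9200 to Elasticsearch on one cluster node.
# open_es_tunnel is a hypothetical helper; all three arguments are placeholders.
open_es_tunnel() {
  local key_file="$1" remote_user="$2" remote_host="$3"
  # -N: run no remote command, just forward ports
  # -L: local port 9200 -> port 9200 on the remote machine itself
  ssh -i "$key_file" -N -L 9200:localhost:9200 "${remote_user}@${remote_host}"
}

# Usage (hypothetical host name); the command blocks until interrupted:
#   open_es_tunnel ~/.ssh/whirr_id_rsa myuser ec2-host.eu-west-1.compute.amazonaws.com
```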


On the web page that comes up, one would see two machines listed, and at the top:

cluster health: green (2, 0)  

The first integer is the number of machines in the cluster and the second is the number of shards, which is 0 because we do not have any indexes yet.
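The same information can be cross-checked without the plugin by asking Elasticsearch's cluster health API directly. This is a minimal sketch that assumes the SSH tunnel on port 9200 is still open; es_cluster_health is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: fetch cluster health as JSON from an Elasticsearch node.
# es_cluster_health is a hypothetical helper; the default URL assumes
# a local tunnel to port 9200.
es_cluster_health() {
  local base_url="${1:-http://localhost:9200}"
  # -s: silent mode, print only the response body
  curl -s "${base_url}/_cluster/health?pretty=true"
}

# Usage:
#   es_cluster_health
```

In the JSON that comes back, the status and number_of_nodes fields correspond to the colour and the first integer shown by elasticsearch-head.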

Why use Whirr when we need these manual changes? Indeed, let's change some Whirr scripts to make these changes automatic. Whirr provides a way to override its scripts by creating a functions folder in the Whirr installation directory.

mkdir whirr-0.8.1/functions  
cp whirr-0.8.1/services/elasticsearch/src/main/java/resources/functions/* whirr-0.8.1/functions/

Now modify the configure_elasticsearch.sh file in the functions directory. Add this line after the line where the script copies the elasticsearch.yml file:

sed -i '$ a cloud.aws.region: eu-west-1\ndiscovery.ec2.host_type: public_dns' config/elasticsearch.yml

Running the launch-cluster command now puts this extra configuration into the yml file on all machines.
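The append step can also be written without sed, which avoids depending on GNU sed's handling of \n and makes it safe to run more than once. The function below is only a sketch of that idea: append_ec2_discovery is a hypothetical helper, not something Whirr ships, and the default path is assumed to match the one used in the script.

```shell
#!/usr/bin/env bash
# Sketch: append the EC2 discovery settings to elasticsearch.yml exactly once.
# append_ec2_discovery is a hypothetical helper, not part of Whirr.
append_ec2_discovery() {
  local yml="${1:-config/elasticsearch.yml}"
  # Skip the append if an earlier run already added the setting.
  if grep -q '^discovery\.ec2\.host_type:' "$yml" 2>/dev/null; then
    return 0
  fi
  cat >> "$yml" <<'EOF'
cloud.aws.region: eu-west-1
discovery.ec2.host_type: public_dns
EOF
}

# Usage inside the configure script, after elasticsearch.yml has been copied:
#   append_ec2_discovery config/elasticsearch.yml
```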