There’s an instructive post on the HipChat blog regarding how their product’s message count has risen from 110 million to over a billion since they were acquired by Atlassian. Central to their scaling strategy to meet data growth was their migration from Lucene to Elasticsearch.
In their words they:
kicked Lucene to the curb in favor of Elasticsearch.
Their reasoning? Elasticsearch’s clustering allows them to:
add more nodes to our cluster when we need more capacity, so we can handle extra load while concurrently serving requests. Moreover, the ability to have our shards replicated across the cluster means if we ever lose an instance, we can still continue serving requests, reducing the amount of time HipChat Search is offline.
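To make that concrete, here’s a rough sketch (mine, not from the HipChat post) of how shard and replica counts are declared when creating an Elasticsearch index; the index name, counts, and host are made-up placeholders:

```python
import json
import requests

# Hypothetical settings -- the index name and counts are illustrative,
# not HipChat's actual configuration.
index_settings = {
    "settings": {
        "number_of_shards": 5,    # data is split across these shards
        "number_of_replicas": 1,  # each shard gets a copy on another node
    }
}

# Create the index; Elasticsearch distributes primaries and replicas
# across the nodes in the cluster and rebalances as nodes join or leave.
resp = requests.put(
    "http://localhost:9200/messages",
    headers={"Content-Type": "application/json"},
    data=json.dumps(index_settings),
)
resp.raise_for_status()
```

Because a replica lives on a different node than its primary, losing one instance still leaves a copy of every shard available, which is the failover behavior the HipChat team describes.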
The other two primary pieces of their architecture haven’t changed: CouchDB and Redis. Interestingly, HipChat shards Redis data:
we shard our data over 3 Redis servers, with each server having its own slave.
Sharding in Redis isn’t supported out of the box; the application has to decide how to partition data across shards. I wonder how HipChat partitions its data and whether their strategy spreads it evenly across the shards. As with any sharding strategy, once you’ve sharded on a particular key, it’s extremely difficult to re-shard.
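For illustration only, a minimal client-side sharding scheme might look something like this; the hostnames and the hash function are my assumptions, not anything from the HipChat post:

```python
import zlib
import redis

# Hypothetical shard layout: three masters, as the post mentions,
# but these hosts and this hashing scheme are assumptions.
SHARDS = [
    redis.Redis(host="redis-0.internal", port=6379),
    redis.Redis(host="redis-1.internal", port=6379),
    redis.Redis(host="redis-2.internal", port=6379),
]

def shard_for(key: str) -> redis.Redis:
    """Pick a shard by hashing the key. A stable hash (crc32) keeps the
    mapping consistent across processes and restarts."""
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]

# Usage: every read and write for a key must go through the same mapping.
shard_for("room:1234:last_message").set("room:1234:last_message", "hello")
value = shard_for("room:1234:last_message").get("room:1234:last_message")
```

This also shows why re-sharding hurts: once keys are placed by `hash(key) % 3`, bumping the shard count to 4 remaps most keys, so growing the cluster means migrating data rather than simply adding a server.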
“How HipChat scales to 1 Billion Messages” is a good read – so what are you waiting for? Check it out!