Comparison Matrix: Real-Time Data Processing Systems
There are several tools and frameworks that help process data as it arrives. Some time ago I did a comparative study of the following four systems:
- Apache Kafka
- Facebook Scribe
- Cloudera Flume
- Apache Chukwa
Feature | Kafka | Scribe | Flume | Chukwa |
--- | --- | --- | --- | --- |
Current Version | 0.6 [1] | 2.2? | 0.9.4 [1, 2] | 0.4 [1] |
Site & Docs | Average | Very Poor | Good | Poor |
Topology | P2P | Master/Slave [3] | Master/Slave [3, 4] | P2P |
Central Node Management | No | No | Yes | No |
Configurable Level of Reliability | No | No | Yes | No |
Installation | Easy | Many Dependencies | Fairly Easy | Fairly Easy |
ZooKeeper Integration | Yes | No | Yes | No |
Configuration | Manual | Manual | Centralised, dynamic configuration | Manual; needs agent, collector and HICC configurations, plus Tomcat and a MySQL database for the web UI |
Hadoop Integration | Possible to store data in HDFS | Possible to store data in HDFS | Possible to store data in HDFS | High; needs a Hadoop cluster to operate |
Centralised Liveness Monitoring | No | No | Yes | No |
Language Support | Java | Many | Java, Shell Scripts? | Java, Shell Scripts |
Output Bucketing | Yes, custom bucketing | Yes, custom bucketing | Yes, custom bucketing, with default time- and IP-based bucketing | Yes, but apparently manual; nothing inherent in the framework |
In-Flight Transformations | Yes | Yes | Yes | Yes (by writing MapReduce jobs over the collected sink files; even the documentation is not very optimistic about this feature) |
Transactional Guarantees | High | Adjustable [$] | | |
Data Storage [^] | Disk | Disk | | |
Data Flow | Pull (consumers pull from brokers) | Push (producers push to consumers) | Push (producers push to consumers) | Push (producers push to consumers) |
P2P = No single point of failure
Master/Slave = Single point of failure
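The Data Flow row marks the main architectural split in the table: in a pull system the consumer asks for data and keeps track of how far it has read, while in a push system the producer forwards each event as it arrives. A minimal Python sketch of the two models follows. `Log`, `PullConsumer` and `PushProducer` are hypothetical names chosen for illustration; they do not mirror the API of Kafka, Scribe, Flume or Chukwa.

```python
from typing import Callable, List


class Log:
    """Hypothetical append-only log, standing in for the store a pull consumer reads."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def append(self, msg: str) -> None:
        self.messages.append(msg)

    def read(self, offset: int, max_count: int) -> List[str]:
        # Return up to max_count messages starting at the consumer's offset.
        return self.messages[offset:offset + max_count]


class PullConsumer:
    """Pull model: the consumer decides when to fetch and tracks its own offset."""

    def __init__(self, log: Log) -> None:
        self.log = log
        self.offset = 0
        self.received: List[str] = []

    def poll(self, max_count: int = 2) -> int:
        batch = self.log.read(self.offset, max_count)
        self.received.extend(batch)
        self.offset += len(batch)
        return len(batch)  # 0 means the consumer has caught up


class PushProducer:
    """Push model: the producer forwards every event to its subscribers immediately."""

    def __init__(self) -> None:
        self.sinks: List[Callable[[str], None]] = []

    def subscribe(self, sink: Callable[[str], None]) -> None:
        self.sinks.append(sink)

    def send(self, msg: str) -> None:
        for sink in self.sinks:
            sink(msg)  # the receiver gets data at the producer's pace


log = Log()
for i in range(5):
    log.append(f"event-{i}")
consumer = PullConsumer(log)
while consumer.poll(max_count=2):  # fetches batches of 2, 2, 1, then 0
    pass
print(consumer.offset)  # 5

collected: List[str] = []
producer = PushProducer()
producer.subscribe(collected.append)
producer.send("event-0")
print(collected)  # ['event-0']
```

Because the consumer in this sketch controls its own read rate, a slow consumer simply falls behind in the log instead of being flooded, whereas a push receiver has to keep up with whatever the producer sends.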
Each of the systems above has at least two components. At a high level, each one has message producers and consumers, and each of these is a cluster of machines. Master/slave setups add a further layer of machines: the controllers.
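The producer/consumer split, and the extra controller layer in master/slave setups, can be modelled in a few lines. This is a simplified sketch, assuming a producer only needs a consumer name to send to. `Controller`, `pick_consumer`, `route_p2p` and `route_master_slave` are hypothetical, and the model only illustrates the single-point-of-failure distinction from the legend above; it is not how any of the four systems actually behaves when a master fails.

```python
from typing import List, Optional


def pick_consumer(producer: str, consumers: List[str]) -> Optional[str]:
    """Deterministically map a producer name onto one of the given consumers."""
    if not consumers:
        return None
    return consumers[sum(map(ord, producer)) % len(consumers)]


class Controller:
    """Hypothetical master node that assigns producers to consumers."""

    def __init__(self, consumers: List[str]) -> None:
        self.consumers = consumers
        self.alive = True

    def assign(self, producer: str) -> Optional[str]:
        if not self.alive:
            raise RuntimeError("controller unavailable")
        return pick_consumer(producer, self.consumers)


def route_p2p(producer: str, live_consumers: List[str]) -> Optional[str]:
    # P2P: the producer chooses a live consumer itself; no central node is consulted.
    return pick_consumer(producer, live_consumers)


def route_master_slave(producer: str, controller: Controller) -> Optional[str]:
    # Master/slave: every assignment goes through the controller,
    # so losing the controller blocks routing in this model.
    try:
        return controller.assign(producer)
    except RuntimeError:
        return None


consumers = ["consumer-a", "consumer-b"]
controller = Controller(consumers)
print(route_master_slave("producer-1", controller))  # one of the two consumers
controller.alive = False
print(route_master_slave("producer-1", controller))  # None: controller is down
print(route_p2p("producer-1", consumers))            # still routed without a controller
```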