Best Alternative to Hadoop

Hadoop has many alternatives, but most of them are far behind. Hadoop remains the leader in Big Data.

The most promising alternative to Hadoop is Spark

Spark (http://www.spark-project.org/) is an open source cluster computing system developed at the UC Berkeley AMP Lab. Users include UCB, Conviva, Klout, and Quantifind, among others.

The claim is that it runs up to 100x faster than Hadoop MapReduce in scenarios such as iterative algorithms and interactive data mining. Spark is also used for general data processing, and it is the engine behind Shark, a fully Apache Hive-compatible data warehousing system that is claimed to run up to 100x faster than Hive. The figure below shows the project's advertised comparison of logistic regression performance on Hadoop MapReduce and Spark.

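Spark benefits from in-memory computation, whereas Hadoop MapReduce is disk-based. Spark can cache datasets in memory, which speeds up any job that reuses the same data, such as an iterative algorithm that scans its input on every pass.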
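To see why caching matters for iterative jobs, here is a rough sketch in plain Python. It does not use the Spark API. The Dataset class, the load_points function, and the sample data are all hypothetical stand-ins, and the "expensive load" simply counts how often the data is re-read. The sketch runs a toy one-dimensional logistic regression twice, once re-reading the data on every iteration and once with the data cached after the first read.

```python
import math


class Dataset:
    """Hypothetical stand-in for a dataset that is expensive to load."""

    def __init__(self, loader):
        self._loader = loader
        self._cached = None
        self._use_cache = False
        self.loads = 0  # how many times the loader actually ran

    def cache(self):
        # Ask for the records to be kept in memory after the first load.
        self._use_cache = True
        return self

    def records(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.loads += 1
        data = self._loader()
        if self._use_cache:
            self._cached = data
        return data


def load_points():
    # Made-up sample data: (feature, label) pairs.
    return [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]


def train(dataset, iterations=10, learning_rate=0.5):
    # Gradient descent for logistic regression with a single weight.
    # Every iteration scans the full dataset again.
    w = 0.0
    for _ in range(iterations):
        gradient = 0.0
        for x, y in dataset.records():
            prediction = 1.0 / (1.0 + math.exp(-w * x))
            gradient += (prediction - y) * x
        w -= learning_rate * gradient
    return w


uncached = Dataset(load_points)
cached = Dataset(load_points).cache()

w_uncached = train(uncached)
w_cached = train(cached)

print("loads without cache:", uncached.loads)
print("loads with cache:", cached.loads)
```

Both runs learn the same weight, but the uncached run reloads the data once per iteration while the cached run loads it only once. In a real cluster the reload would mean reading from disk, which is the cost that in-memory caching is meant to avoid.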

No single framework fits every workload, so it is wise to evaluate other distributed frameworks such as Spark, which may be a better fit for specialized scenarios. Compatibility with Hadoop is a plus.

[1] http://spark-project.org/
