Big Data Analysis with Hadoop


Big Data refers to technology capable of handling massive, complex data and continuous streams of data in (near) real time, running on very large infrastructure. Hadoop, in turn, is a scalable framework for storing big data and running the applications that process it.

Hadoop was created by Doug Cutting and Mike Cafarella. Yahoo was the first company to run Hadoop at very large scale, operating the biggest Hadoop environment by number of nodes.

Differences between Hadoop 1.0 and Hadoop 2.0

In Hadoop 1.0, MapReduce handles both cluster resource management and data processing. This means big data can be processed only by MapReduce, and its output is written to HDFS for storage. HDFS provides redundant, reliable storage for the data MapReduce reads and the output it produces.
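To make the MapReduce processing model concrete, here is a minimal single-process sketch of the classic word-count job. It is an illustration of the map, shuffle and reduce phases only, not Hadoop's Java API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical. In a real cluster the input would be read from HDFS blocks, the mappers and reducers would run in parallel on many nodes, and the result would be written back to HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    # Assumption: everything runs in one process; Hadoop would distribute
    # the map and reduce tasks across the cluster.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # In Hadoop 1.0 this reducer output would be stored in HDFS.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data with Hadoop", "Hadoop stores big data in HDFS"]
    print(run_job(sample))
```

Keeping the mapper and reducer as pure functions of their inputs mirrors why MapReduce scales: each map call and each reduce call can run independently on a different node.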

Hadoop 1.0 vs Hadoop 2.0

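In Hadoop 2.0, YARN (Yet Another Resource Negotiator) takes over cluster resource management, so MapReduce no longer has to manage cluster resources itself and can focus on processing data. This version of Hadoop concentrates on improving performance when processing big data.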
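The split of responsibilities in Hadoop 2.0 can be sketched with a toy model: a resource manager that only tracks free memory and CPU cores per node and hands out "containers", while the applications (a MapReduce job or any other program) simply ask for them. This is a heavily simplified, hypothetical illustration: `ToyResourceManager`, `Node`, `Container` and the first-fit policy are assumptions made for the example and are not YARN's real API or scheduler.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A worker machine with a fixed amount of memory (MB) and CPU cores."""
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class Container:
    """A slice of one node's resources granted to an application."""
    app_id: str
    node: str
    mb: int
    vcores: int


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a resource manager: it tracks resources only
    and knows nothing about MapReduce or any other processing logic."""
    nodes: List[Node]
    granted: List[Container] = field(default_factory=list)

    def allocate(self, app_id: str, mb: int, vcores: int) -> Optional[Container]:
        # Assumed first-fit policy: use the first node with enough free resources.
        for node in self.nodes:
            if node.free_mb >= mb and node.free_vcores >= vcores:
                node.free_mb -= mb
                node.free_vcores -= vcores
                container = Container(app_id, node.name, mb, vcores)
                self.granted.append(container)
                return container
        return None  # no room; a real scheduler would queue the request

    def release(self, container: Container) -> None:
        # Give the container's resources back to the node it ran on.
        for node in self.nodes:
            if node.name == container.node:
                node.free_mb += container.mb
                node.free_vcores += container.vcores
        self.granted.remove(container)


def containers_per_app(rm: ToyResourceManager) -> Dict[str, int]:
    """Count how many containers each application currently holds."""
    counts: Dict[str, int] = {}
    for c in rm.granted:
        counts[c.app_id] = counts.get(c.app_id, 0) + 1
    return counts


if __name__ == "__main__":
    rm = ToyResourceManager([Node("node1", 4096, 4), Node("node2", 2048, 2)])
    for _ in range(3):  # a MapReduce job requests containers for its map tasks
        rm.allocate("mapreduce-job", mb=1024, vcores=1)
    rm.allocate("other-app", mb=2048, vcores=2)  # another app shares the cluster
    print(containers_per_app(rm))
```

Because the resource manager never looks inside the applications, the same cluster can serve a MapReduce job and other kinds of workloads side by side, which is the main practical gain of moving resource management out of MapReduce.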

A real-life implementation of Hadoop

The keynote speaker shared an experience from Singapore, where he completed his Hadoop training. There he met other Hadoop users, who told him a story about the Singapore government. The government uses Hadoop to look after the plants and lamps in public parks across Singapore. The lamps signal visitors to keep their distance from the plants while the plants are being watered. Hadoop manages this activity by analyzing the data and producing a timetable that tells each lamp when to start signaling, so the watering runs smoothly. This interesting story was shared by Firman Gautama, the first speaker last night.