HADOOP & CLOUD COMPUTING PRACTICAL QA

 



1. What is Hadoop?

Ans: Hadoop is an open-source framework for processing and analyzing Big Data, and its popularity speaks for itself. A huge volume of data is considered Big Data.

  • Apache Hadoop was born as a solution to the Big Data problem.

  • Hadoop was created by Doug Cutting and Mike Cafarella.

  • The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.
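The "simple programming model" referred to above is MapReduce. As a rough illustration (a plain-Python sketch of the idea, not actual Hadoop API code), a word count can be expressed as a map step that emits (word, 1) pairs, a shuffle step that groups pairs by key, and a reduce step that sums the counts per word:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs hadoop", "hadoop handles big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'handles': 1}
```

In real Hadoop, the map and reduce functions run in parallel on many nodes, and the framework handles the shuffle, distribution, and failure recovery for you.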

2. What is Big Data?

Ans: Big Data is a collection of data sets so large and complex that legacy IT systems cannot handle them.

Now, the question arises: what is considered huge?

Many terabytes, petabytes, or even exabytes of data.

The next question is: how do we decide whether data is Big Data, and whether a problem needs a Big Data solution?

A problem needs a Big Data solution if it exhibits the following four factors (the four Vs):

  • Volume

  • Velocity

  • Veracity

  • Variety

3. What are the four parts of the Big Data challenge?

Ans: We can describe the Big Data challenge in four parts.

  • Storage - We need to store the huge volume of data efficiently.

  • Computational Efficiency - The data must be stored in a form that is suitable for efficient computation.

  • Data Loss - Data can be lost due to disruption or hardware failure, so we need a proper recovery strategy in place.

  • Cost - The solution that we propose should be cost-effective.
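To make the storage part concrete: HDFS stores a large file by splitting it into fixed-size blocks (128 MB by default in recent Hadoop versions) and spreading those blocks across the cluster. The splitting arithmetic can be sketched as follows (illustrative Python, not Hadoop code):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the number of blocks and the size of the last (possibly partial) block."""
    full_blocks, remainder = divmod(file_size, block_size)
    num_blocks = full_blocks + (1 if remainder else 0)
    last_block = remainder if remainder else block_size
    return num_blocks, last_block

# A 1 GB file fits in exactly 8 full 128 MB blocks.
print(split_into_blocks(1024 * 1024 * 1024))  # (8, 134217728)

# A 300 MB file needs 2 full blocks plus a 44 MB partial block.
print(split_into_blocks(300 * 1024 * 1024))  # (3, 46137344)
```

Splitting files into uniform blocks is what lets Hadoop distribute both storage and computation evenly across the cluster.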

4. Why is a traditional database not a solution?

Ans:

  • It is not horizontally scalable: we cannot add resources or more computational nodes as the data grows.

  • A traditional database is designed for structured data, so it is not a good choice when we have a variety of data.





Hadoop: A Good Solution
  • It supports a huge volume of data.

  • It stores the data efficiently.

  • Data loss is unavoidable, so it provides good recovery strategies.

  • It is horizontally scalable as the data grows.

  • It is cost-effective.

  • It minimizes the learning curve: it should be easy for programmers and non-programmers alike.

5. What are the characteristics of Hadoop?

Ans: The characteristics of Hadoop are:

  • High Availability - The data is highly available/accessible despite hardware failure, because Hadoop keeps multiple copies of the data.

  • Scalability - It provides horizontal scalability.

  • Fault Tolerance - By default, three replicas of each block are stored across the cluster in Hadoop.

  • Economic - It is not very expensive, as it runs on a cluster of commodity hardware.
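The fault-tolerance point above can be illustrated with a toy replica-placement scheme (an illustrative Python sketch, not HDFS's actual rack-aware placement policy): each block is copied to three distinct nodes, so the failure of any single node never loses a block.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin style."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = ["blk_1", "blk_2", "blk_3", "blk_4"]
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(blocks, nodes)

# Simulate losing one node: every block still has at least two surviving replicas.
failed = "node-b"
survivors = {b: [n for n in locs if n != failed] for b, locs in placement.items()}
assert all(len(locs) >= 2 for locs in survivors.values())
print("all blocks survive the failure of", failed)
```

This is why Hadoop achieves fault tolerance on cheap commodity hardware: it assumes nodes will fail and compensates with replication instead of expensive specialized machines.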

By using Hadoop, even large enterprises can efficiently manage their data.

Let's discuss some organizations that use Hadoop.

  • Amazon - It provides an easy-to-use analytics platform built around the powerful framework.

  • IBM - It enhances this open-source technology to withstand the demands of the enterprise, adding administrative, discovery, development, provisioning, and security features.

Many organizations, such as Cloudera, MapR Technologies, and DataStax, build on Hadoop to solve Big Data problems.


Note: For more QA, please click on the link below:

Url: https://payhip.com/linuxkuriosity
