Limitations of Hadoop - How to Overcome Hadoop Drawbacks

In this article on the limitations of Hadoop, we will learn what Hadoop is and what its pros and cons are. We will look at the features of Hadoop that make it so popular, at the drawbacks of Hadoop that led to the emergence of Apache Spark and Apache Flink, and at the various ways to overcome these limitations.

Limitations of Hadoop

Various limitations of Hadoop are discussed below in this section, along with their solutions:

a. Issue with Small Files

Hadoop is not suited to small data. The Hadoop Distributed File System (HDFS) cannot efficiently support random reads of small files because of its high-capacity design.

Small files are a major problem in HDFS. A small file is one that is significantly smaller than the HDFS block size (default 128 MB). HDFS cannot handle huge numbers of such files, because it was designed for a small number of large files storing large data sets, not for a large number of small files. If there are too many small files, the NameNode, which holds the HDFS namespace in memory, becomes overloaded.
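As a rough, back-of-the-envelope illustration (using the commonly quoted rule of thumb that each file, directory, and block consumes on the order of 150 bytes of NameNode heap):

  10,000,000 small files, each occupying its own block
  = 10,000,000 file objects + 10,000,000 block objects
  = 20,000,000 namespace objects x ~150 bytes
  ≈ 3 GB of NameNode memory

The same data packed into a small number of large files would need only a tiny fraction of these namespace objects, which is why merging and archiving help.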

Solution

  • The solution to the small file issue is simple: merge the small files to create bigger files and then copy the bigger files to HDFS (see the command sketch after this list).
  • HAR files (Hadoop Archives) were introduced to reduce the pressure that lots of files put on the NameNode's memory. HAR files work by building a layered filesystem on top of HDFS. HAR files are created with the Hadoop archive command, which runs a MapReduce job to pack the files being archived into a small number of HDFS files. Reading files in a HAR is no more efficient than reading files in HDFS; in fact it is slower, since each access to a file in a HAR requires reading two index files as well as the data file.
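As a rough sketch of both approaches (the directory and file names below are only placeholders), the merge and archive steps can look like this on the command line:

  # Merge small files already in HDFS into one local file, then load it back as a single large file
  hadoop fs -getmerge /data/small-files merged-input.txt
  hadoop fs -put merged-input.txt /data/merged/

  # Create a Hadoop Archive of the small-files directory (this launches a MapReduce job)
  hadoop archive -archiveName files.har -p /data/small-files /data/archives

  # Files inside the archive are then read through the har:// scheme
  hadoop fs -ls har:///data/archives/files.har

The HAR approach reduces NameNode pressure, but as noted above it does not speed up reads, since each access still goes through the archive's index files.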
