In this article on the limitations of Hadoop, we will learn what Hadoop is and what its pros and cons are. We will see the features that make Hadoop so popular, as well as the drawbacks that led to the emergence of Apache Spark and Apache Flink. We will also learn about various ways to overcome these limitations.
Limitations of Hadoop
The various limitations of Hadoop are discussed below, along with their solutions.
a. Issue with Small Files
Hadoop is not suited to small data. The Hadoop Distributed File System (HDFS) cannot efficiently support random reads of small files because of its high-capacity design.
Small files are a major problem in HDFS. A small file is one significantly smaller than the HDFS block size (128 MB by default). HDFS was designed to work with a small number of large files for storing large data sets, not with a large number of small files, so it cannot handle huge numbers of them well. Too many small files overload the NameNode, which keeps the entire HDFS namespace in memory.
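To see why many small files strain the NameNode, consider that it keeps a metadata record in heap memory for every file and every block. The sketch below estimates this pressure, assuming roughly 150 bytes of heap per metadata object (a commonly cited approximation for HDFS, not an exact figure) and the default 128 MB block size:

```python
# Rough sketch of NameNode memory pressure from small files.
# Assumption: each file entry and each block entry costs roughly
# 150 bytes of NameNode heap (an approximation, not an exact figure).

BYTES_PER_OBJECT = 150          # approximate metadata cost per file/block entry
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (128 MB)

def namenode_metadata_bytes(num_files: int, file_size: int) -> int:
    """Estimate NameNode heap used by num_files files of file_size bytes each."""
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # ceiling division
    objects = num_files * (1 + blocks_per_file)            # file entry + block entries
    return objects * BYTES_PER_OBJECT

one_gib = 1024 ** 3
# Store ~1 TiB of data two ways: as 1 GiB files vs. as 1 MiB files.
large = namenode_metadata_bytes(1024, one_gib)           # 1024 files of 1 GiB
small = namenode_metadata_bytes(1024 * 1024, 1024 ** 2)  # ~1M files of 1 MiB
print(f"large files: ~{large / 1024:.0f} KiB of NameNode heap")
print(f"small files: ~{small / 1024 / 1024:.0f} MiB of NameNode heap")
```

The same total volume of data costs hundreds of times more NameNode memory when split into megabyte-sized files, which is why consolidating small files (for example into Hadoop archives or SequenceFiles) is the usual remedy.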