Apache Spark Cluster Managers



Before learning about Apache Spark cluster managers, let us review the concepts of Apache Spark for beginners.

Now, let's understand what Apache Spark cluster managers are.


In this article, we are going to learn what a cluster manager in Spark is, and look at the various types of cluster managers: the Spark Standalone cluster manager, YARN mode, and Spark on Mesos.


         1- Introduction to Apache Spark Cluster Managers


Apache Spark is an engine for Big Data processing. Spark can run in distributed mode on a cluster. In the cluster, there is a master and any number of workers. The cluster manager schedules and divides the resources of the host machines that form the cluster. Its prime job is to divide resources across applications. It works as an external service for acquiring resources on the cluster.

The cluster manager dispatches work for the cluster. Spark supports pluggable cluster management. The cluster manager in Spark handles starting executor processes.

     Apache Spark supports three types of cluster managers, namely:

     a) Standalone Cluster Manager

     b) Hadoop YARN

     c) Apache Mesos
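
As a rough illustration of how the choice of cluster manager shows up in practice, the sketch below builds the value passed to the `--master` option of `spark-submit` for each of the three managers. The helper name `master_url` and the host names are hypothetical. The URL schemes (`spark://`, `yarn`, `mesos://`) and the default ports 7077 and 5050 are the ones the Spark documentation describes.

```python
from typing import Optional

# Default master ports per the Spark docs: 7077 (standalone), 5050 (Mesos).
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}


def master_url(manager: str, host: Optional[str] = None,
               port: Optional[int] = None) -> str:
    """Return a --master value for the given cluster manager (hypothetical helper)."""
    manager = manager.lower()
    if manager == "yarn":
        # With YARN, the cluster location comes from the Hadoop configuration,
        # so the master value is just the literal string "yarn".
        return "yarn"
    if manager not in DEFAULT_PORTS:
        raise ValueError(f"unknown cluster manager: {manager}")
    if host is None:
        raise ValueError(f"{manager} requires a master host")
    scheme = "spark" if manager == "standalone" else "mesos"
    return f"{scheme}://{host}:{port or DEFAULT_PORTS[manager]}"


# Placeholder host names, for illustration only.
print(master_url("standalone", "master-host"))
print(master_url("yarn"))
print(master_url("mesos", "mesos-host"))
```

The same application code can usually be submitted to any of the three managers by changing only this value.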

          1.1- Apache Spark Standalone Cluster Manager

   Standalone mode is a simple cluster manager bundled with Spark. It makes it easy to set up a cluster that Spark itself manages, and it can run on Linux, Windows, or macOS. It is often the simplest way to run a Spark application in a clustered environment. See also: how to install Apache Spark in standalone mode.


  • How Does the Spark Standalone Cluster Work?

 It has masters and a number of workers, each with a configured amount of memory and CPU cores. In Spark standalone cluster mode, Spark allocates resources based on cores. By default, an application will grab all the cores in the cluster.
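
Because an application takes every available core by default, it is common to cap it explicitly. The sketch below assembles a `spark-submit` command line that does so with the `--total-executor-cores` and `--executor-memory` options. The helper `build_submit_command`, the host name, and the application file `my_app.py` are assumptions made for illustration.

```python
import shlex
from typing import List


def build_submit_command(master: str, app: str,
                         total_cores: int, executor_memory: str) -> List[str]:
    """Build a spark-submit argument list with a core cap (hypothetical helper)."""
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    return [
        "spark-submit",
        "--master", master,
        # Caps the cores used across the whole cluster by this application.
        "--total-executor-cores", str(total_cores),
        # Memory per executor, e.g. "2g".
        "--executor-memory", executor_memory,
        app,
    ]


# Placeholder master URL and application file.
cmd = build_submit_command("spark://master-host:7077", "my_app.py", 4, "2g")
print(shlex.join(cmd))
```

The same cap can also be set through the `spark.cores.max` property instead of the command-line option.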

In the standalone cluster manager, a ZooKeeper quorum enables automatic recovery of the master by means of standby masters. Manual recovery of the master is also possible using the file system. Spark supports authentication via a shared secret across the entire cluster: the user configures each node with the shared secret. Spark encrypts data for its communication protocols using SSL, while block transfer uses SASL encryption.
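
The recovery and security options described above map onto Spark configuration properties. The sketch below collects them into dictionaries and renders the recovery settings as `-D` options of the kind placed in `SPARK_DAEMON_JAVA_OPTS`. The helper names, the ZooKeeper address, and the secret value are placeholders. The property names are taken from the Spark documentation, but treat the set as a starting point and not a complete configuration (SSL, for instance, also needs keystore settings that are not shown here).

```python
from typing import Dict, Optional


def recovery_properties(mode: str, zk_url: Optional[str] = None,
                        recovery_dir: Optional[str] = None) -> Dict[str, str]:
    """Master recovery settings for standalone mode (hypothetical helper)."""
    if mode == "ZOOKEEPER":
        if not zk_url:
            raise ValueError("ZOOKEEPER mode needs a ZooKeeper URL")
        # Standby masters coordinate through the ZooKeeper quorum.
        return {"spark.deploy.recoveryMode": "ZOOKEEPER",
                "spark.deploy.zookeeper.url": zk_url}
    if mode == "FILESYSTEM":
        if not recovery_dir:
            raise ValueError("FILESYSTEM mode needs a recovery directory")
        # State is written to a directory; the master is restarted manually.
        return {"spark.deploy.recoveryMode": "FILESYSTEM",
                "spark.deploy.recoveryDirectory": recovery_dir}
    raise ValueError(f"unsupported recovery mode: {mode}")


def security_properties(secret: str) -> Dict[str, str]:
    """Shared-secret authentication plus encryption flags (hypothetical helper)."""
    return {
        "spark.authenticate": "true",
        "spark.authenticate.secret": secret,
        "spark.ssl.enabled": "true",  # keystore settings not shown
        "spark.authenticate.enableSaslEncryption": "true",
    }


def as_java_opts(props: Dict[str, str]) -> str:
    """Render properties as -Dkey=value options."""
    return " ".join(f"-D{key}={value}" for key, value in props.items())


# Placeholder ZooKeeper hosts, for illustration only.
zk = recovery_properties("ZOOKEEPER", zk_url="zk1:2181,zk2:2181,zk3:2181")
print(as_java_opts(zk))
```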

To monitor an application, each Apache Spark application has a web user interface (Web UI). The Web UI provides information about executors, storage usage, and running tasks in the application. The standalone cluster manager also has a Web UI for viewing cluster and job statistics, along with detailed log output for each job. If an application has logged events over its lifetime, the Spark Web UI will reconstruct the application's UI after the application exits.
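
Reconstructing the UI after an application exits depends on event logging being switched on. As a minimal sketch, the helper below produces the `--conf` arguments that enable it through the `spark.eventLog.enabled` and `spark.eventLog.dir` properties. The function name `event_log_conf` is hypothetical, and the directory used in the example is only a placeholder.

```python
from typing import List


def event_log_conf(log_dir: str) -> List[str]:
    """Return --conf arguments that enable event logging (hypothetical helper)."""
    props = {
        "spark.eventLog.enabled": "true",
        # Directory where Spark writes the application's event log.
        "spark.eventLog.dir": log_dir,
    }
    args: List[str] = []
    for key, value in props.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Placeholder log directory, for illustration only.
conf_args = event_log_conf("file:///tmp/spark-events")
print(" ".join(conf_args))
```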
