Apache Spark is an open source cluster-computing framework and a general-purpose engine for processing big data. It has become one of the key distributed processing frameworks for big data and can be deployed in a variety of ways, including on Hadoop. It provides native bindings for the Java, Scala, Python, and R programming languages, and it supports SQL, streaming data, machine learning, and graph processing.
Apache Spark natively supports Java, Scala, R, and Python, giving you a variety of languages for building your applications. Spark can run multiple kinds of workloads, including interactive queries, real-time analytics, machine learning, and graph processing, and a single application can combine several of these workloads seamlessly.
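To make "combining workloads" concrete, here is a minimal sketch in plain Python (deliberately not the Spark API, so it runs anywhere) of two workloads sharing one dataset: a batch aggregation and an interactive-style filter query. The sample records and field names are hypothetical; in Spark, both steps would be expressed with DataFrame operators such as `groupBy` and `filter` against the same SparkSession.

```python
# Hypothetical event records: (user, event_type, value)
records = [
    ("alice", "click", 3),
    ("bob", "purchase", 20),
    ("alice", "purchase", 15),
    ("carol", "click", 1),
]

# Workload 1: batch aggregation (total value per user),
# analogous to df.groupBy("user").sum("value") in Spark SQL.
totals = {}
for user, _event, value in records:
    totals[user] = totals.get(user, 0) + value

# Workload 2: interactive-style query (purchase events only),
# analogous to df.filter(df.event == "purchase") in Spark SQL.
purchases = [r for r in records if r[1] == "purchase"]

print(totals)     # totals per user
print(purchases)  # the two purchase events
```

In Spark, the payoff is that both workloads run on the same cluster, over the same cached data, inside one application.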
Fast - Spark delivers high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
Runs Everywhere - Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud.
Easy to Use - Spark lets you write applications in Java, Scala, Python, R, and SQL, and it provides more than 80 high-level operators.
Unified - Spark is a unified analytics engine for large-scale data processing, covering SQL, streaming, machine learning, and graph workloads in one framework.
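The "Fast" point above rests on lazy evaluation: transformations only record a plan (the DAG), and nothing executes until an action is called, which gives the scheduler a chance to optimize the whole pipeline. Here is a minimal sketch of that idea in plain Python; the `LazyDataset` class is a made-up stand-in, not Spark internals, and only mimics the shape of RDD-style `map`/`filter`/`collect`.

```python
class LazyDataset:
    """Toy illustration of lazy, plan-based evaluation (not Spark code)."""

    def __init__(self, data, plan=None):
        self.data = data
        self.plan = plan or []  # recorded transformations: the "DAG"

    def map(self, fn):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self.data, self.plan + [("map", fn)])

    def filter(self, pred):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self.data, self.plan + [("filter", pred)])

    def collect(self):
        # Action: replay the recorded plan in a single pass per element,
        # instead of materializing an intermediate list after each step.
        out = []
        for item in self.data:
            keep = True
            for kind, fn in self.plan:
                if kind == "map":
                    item = fn(item)
                elif kind == "filter" and not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


ds = LazyDataset(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
# No computation has happened yet; ds.plan just holds the two steps.
print(ds.collect())  # [0, 4, 16, 36, 64]
```

Real Spark goes much further (stage fusion, shuffle planning, the Catalyst query optimizer), but the principle is the same: by seeing the whole plan before running it, the engine can execute it efficiently.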