About the Course
This Spark training helps learners understand how Spark performs in-memory data processing, which lets it run many workloads much faster than Hadoop MapReduce. Learners will master Scala programming and train on the APIs Spark offers, such as Spark Streaming, Spark SQL, Spark RDD, Spark MLlib, and Spark GraphX. This Edureka course is an integral part of a Big Data developer's learning path.
After completing the Apache Spark training, you will be able to:
- Understand Scala and its implementation
- Master the concepts of traits and OOP in Scala programming
- Install Spark and implement Spark operations on Spark Shell
- Understand the role of Spark RDD
- Implement Spark applications on YARN (Hadoop)
- Learn Spark Streaming API
- Implement machine learning algorithms in Spark MLlib API
- Analyze Hive and Spark SQL architecture
- Understand Spark GraphX API and implement graph algorithms
- Implement broadcast variables and accumulators for performance tuning
Who should go for this Course?
This course is a must for anyone who aspires to enter the field of big data and keep abreast of the latest developments in fast, efficient processing of ever-growing data using Spark and related projects. The course is ideal for:
- Big Data enthusiasts
- Software Architects, Engineers, and Developers
- Data Scientists and Analytics professionals
What are the prerequisites for this Course?
A basic understanding of functional programming and object-oriented programming will help. Knowledge of Scala will definitely be a plus but is not mandatory.
Project #1: Design a system for real-time replay of transactions in HDFS using Spark.
- Spark Streaming
- Kafka (for messaging)
- HDFS (for storage)
- Core Spark API (for aggregation)
Project #2: Signal drops during roaming
Industry: Telecom
Problem Statement: You will be given a CDR (Call Detail Record) file and need to find the top 10 customers facing frequent call drops while roaming. This is an important report that telecom companies use to prevent customer churn, both by calling affected customers back and by contacting roaming partners to fix connectivity issues in specific areas.
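The aggregation behind this report can be sketched in a few lines. The following is a minimal plain-Python illustration, not the course's project solution: the CDR column names (customer_id, is_roaming, call_status), the sample rows, and the function name top_roaming_call_drops are all assumptions made for the example, since the actual file layout is not given here.

```python
import csv
from collections import Counter
from io import StringIO

# Hypothetical CDR layout (an assumption, not the course's actual file format):
# customer_id,is_roaming,call_status
SAMPLE_CDR = """customer_id,is_roaming,call_status
C001,Y,DROPPED
C002,Y,COMPLETED
C001,Y,DROPPED
C003,N,DROPPED
C002,Y,DROPPED
C001,N,DROPPED
"""


def top_roaming_call_drops(cdr_text, n=10):
    """Return the n customers with the most dropped calls while roaming."""
    drops = Counter()
    for row in csv.DictReader(StringIO(cdr_text)):
        # Filter step: keep only calls that were both roaming and dropped.
        if row["is_roaming"] == "Y" and row["call_status"] == "DROPPED":
            # Count-per-key step: one more dropped roaming call for this customer.
            drops[row["customer_id"]] += 1
    # Top-N step: most_common returns (customer, count) pairs, highest count first.
    return drops.most_common(n)


if __name__ == "__main__":
    for customer, count in top_roaming_call_drops(SAMPLE_CDR):
        print(customer, count)
```

The same filter, count-per-key, and top-N shape would carry over to Spark, where operations such as filter and reduceByKey distribute the work across a cluster instead of looping over rows in one process.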
Why learn Apache Spark?
In this era of ever-growing data, the need to analyze it for meaningful business insights is paramount. There are several big data processing alternatives, such as Hadoop, Spark, and Storm. Spark stands out by providing both batch and streaming capabilities, which makes it a preferred choice for lightning-fast big data analysis platforms.
Online Classes: 24 Hrs
8 live classes of 3 hrs each, led by industry practitioners.
Assignments: 32 Hrs
Personal assistance/installation guides for setting up the required environment for Assignments / Projects.
Project: 20 Hrs
Towards the end of the course, you will be working on a project where you are expected to implement the techniques learned during the course to analyse data.
Lifetime access to the learning management system including Class recordings, presentations, sample code, and projects.
24 x 7 Support
Lifetime access to the support team (available 24/7) for resolving queries during and after the course.
Once you successfully complete the project (reviewed by an Edureka expert), you will be awarded Edureka's Apache Spark certificate.
Last updated February 13, 2018