There is a shortage of qualified Big Data Engineers in the workforce, and individuals with these skills are in high demand. Build the skills that will take you into the next era of data engineering: build real-time applications that process big data at scale, and launch a career as a Big Data Engineer.


Prerequisites:

  • Basic skills in at least one programming language – optional
  • Familiarity with basic math and statistics concepts – optional
  • Certified Data Scientist Professional (CDSP) – optional


Training Program Description:

  • The capability to collect and store huge amounts of versatile data necessitates the development and use of new techniques and methodologies for processing and analyzing big data. This program provides comprehensive coverage of a number of technologies at the foundation of the big data movement, in particular the Hadoop architecture and its ecosystem of tools.


  • Big data is a big buzzword, and everyone seems to be talking about it, but what exactly is big data? Where does this data come from, how is it processed, and how are the results used? The program introduces one of the most common frameworks that has made big data analysis easier and more accessible, increasing the potential for data to transform our world!
  • Trainees build on their combined knowledge of Big Data technologies (e.g., Hadoop, Spark) and Data Science (e.g., statistics, machine learning) and learn how this combination is used to solve real-world problems. Beyond this main goal, the program also aims to familiarize trainees with the latest technological and scientific trends in the field and with how Big Data and data science are used in modern business enterprises. The program addresses use cases drawn from real problems, such as Telecommunication Analytics, Working with Social Graphs, and designing a system to report web sessions per day.
  • Throughout this program, you will practice your Big Data skills through a series of hands-on labs, assignments, and projects inspired by real-world problems and data sets from the industry. You will complete the program with a Big Data capstone project that showcases your applied skills to prospective employers.



  • This program comprises many career-oriented projects. Each project you build is an opportunity to demonstrate what you have learned in the lessons. Your completed projects will become part of a career portfolio that shows potential employers your Big Data skills.
  • One of our main goals at EAII is to help you create a job-ready portfolio of completed projects. Building a project is one of the best ways to test the skills you have acquired and to demonstrate your newfound abilities to future employers or colleagues. Throughout this program, you will have the opportunity to prove your skills by building the following projects:


  • Project 1: Working with Samples
  • Project 2: Telecommunication Analytics
  • Project 3: Working with Social Graphs
  • Project 4: Consume web server logs and keep track of top sellers
  • Project 5: Serve movie recommendations to a website
  • Project 6: Design a system to report web sessions per day
  • Capstone Project


Program Outcomes:

  • Python and Linux Fundamentals
  • SQL and Database fundamentals
  • Big Data fundamentals
  • Big Data technologies
  • Big Data governance
  • Available sources of Big Data
  • Data Mining, its concepts, and some of the tools used for Data Mining
  • Hadoop, including its concepts, how to install and configure it, the concepts behind MapReduce, and how Hadoop can be used in real-life scenarios
  • MongoDB, including its concepts, how to install and configure it, the concepts behind document databases, and how MongoDB can be used in real-life scenarios
  • Detailed understanding of Big Data and Data Mining concepts.
  • Ability to identify and obtain relevant datasets when looking at a business problem.
  • Ability to install and manage Big Data processing environments based on Hadoop or MongoDB at a departmental level.


Program Duration: 13 Weeks

Program Language: English / Arabic

Location: EPSILON AI INSTITUTE | Head Office

Participants will be granted a completion certificate from Epsilon AI Institute, USA, if they attend at least 80 percent of the program's direct contact hours and fulfill the program requirements (passing both the final exam and the project).



1. Networking Fundamentals

  • Build simple Local Area Networks.
  • Perform basic configurations.
  • Implement IPv4 and IPv6 addressing schemes.
  • Configure routers, switches, and end devices.
  • Configure and troubleshoot connectivity in a small network.
  • Configure and troubleshoot VLANs, Wireless LANs, and Inter-VLAN routing
  • Cloud and Virtualization Concepts


  • Introduction to Linux
  • A general overview of the Linux environment
  • Overview of the command-line interface
  • Navigating Linux directory structure
  • Manipulating files and directories
  • Basic Linux commands
  • Permissions

3. Python for Big Data Analysis

  • Setting up Python & Anaconda
  • Computing
  • Programming
  • Debugging
  • Procedural Programming
  • Variables
  • Logical Operators
  • Mathematical Operators
  • Control Structures
  • Conditionals
  • Loops
  • Functions
  • Error Handling
  • Data Structures
  • Strings
  • Lists
  • File Input and Output
  • Dictionaries
  • Objects
  • Algorithms


  • What Is Data?
  • Why Organize Data?
  • What Does a DBMS Do?
  • Relational Databases and SQL
  • The Success of RDBMSs and SQL
  • Operational and Analytic Databases
  • SELECT Statements
  • DML Activity
  • Further Comparisons
  • Table and Column Design
  • Relational Database Design
  • Database Transactions
  • Big Data Volume
  • Big Data Variety and Velocity
  • Big Data Systems vs. Traditional Systems
  • Effects of Data Structures
  • Big Data Analytics Databases
  • NoSQL: Operational, Structured, and Semi-structured
  • Non-transactional, Structured Systems
  • Big Data ACID-Compliant RDBMSs
  • Search Engines
  • Features of SQL for Big Data Analysis
  • Different Big Data Stores
  • Downloading and Installing the Exercise Environment


  • Big Data: Why and Where?
  • Characteristics of Big Data
  • Dimensions of scalability
  • Data Science: Getting Value out of Big Data
  • What Is a Distributed File System?
  • Scalable Computing Over the Internet
  • Programming Models for Big Data
  • Hadoop Overview and History
  • Overview of the Hadoop Ecosystem
  • Data Storage: HDFS
  • Distributed Data Processing: YARN, MapReduce, and Spark
  • Data Processing and Analysis: Hive and Impala
  • Operation of Apache Hive and Apache Impala
  • Exploring Databases and Tables with Hue
  • Database Integration: Sqoop
  • Other Hadoop Data Tools



  • Why Spark?
  • Databases and Tables
  • Basic Hive and Impala Query Language Syntax
  • Data Types
  • Using Hue to Execute Queries
  • Using Beeline (Hive’s Shell)
  • Using the Impala Shell
  • SQL SELECT Building Blocks
  • Expressions and Operators
  • Built-In Functions
  • Data Type Conversion
  • The DISTINCT Keyword
  • Introduction to the FROM Clause
  • Identifiers
  • Formatting SELECT Statements
  • Using Beeline in Non-Interactive Mode
  • Using Impala Shell in Non-Interactive Mode
  • Formatting and Saving the Output of Beeline and Impala Shell
  • Using Expressions in the WHERE Clause
  • Comparison Operators
  • Data Types and Precision
  • Logical Operators
  • Other Relational Operators
  • Handling Missing Values
  • Conditional Functions
  • Using Variables with Beeline and Impala Shell
  • Calling Beeline and Impala Shell from Scripts
  • Querying Hive and Impala in Scripts and Applications
  • Aggregate Functions & Use in the SELECT Statement
  • The GROUP BY Clause
  • Choosing an Aggregate Function and Grouping Column
  • Grouping Expressions
  • Grouping and Aggregation, Together and Separately
  • NULL Values in Grouping and Aggregation
  • The COUNT Function
  • Filtering on Aggregates
  • The HAVING Clause
  • The ORDER BY Clause & Controlling Sort Order
  • Ordering Expressions
  • Missing Values in Ordered Results
  • Using ORDER BY with Hive and Impala
  • Introduction to the LIMIT Clause
  • When to Use the LIMIT Clause
  • Using LIMIT with ORDER BY
  • Using LIMIT for Pagination
  • Combining Query Results with the UNION Operator
  • Using ORDER BY and LIMIT with UNION
  • Join Syntax & Types of Join
  • Handling NULL Values in Join Key Columns
  • Understanding Hive and Impala Version Differences
  • Understanding Hue Version Differences
  • How to Effectively Use the Hive and Impala Documentation


  • Working with Samples
  • Telecommunication Analytics
  • Working with Social Graphs
  • Sample application: consume web server logs and keep track of top sellers
  • Sample application: serving movie recommendations to a website
  • Design a system to report web sessions per day

8. Git & GitHub

  • Why use version control?
  • Install Git
  • Create repo
  • Check Status
  • Add changes to the staging area
  • Commit changes
  • Show commits log
  • .gitignore
  • What is GitHub?
  • Clone repo
  • Push & Pull
  • Use Git Kraken GUI

9. Advance Your Career

  • Boost your Profile on Kaggle
  • Build up your online presence
    • Medium Blog
    • YouTube Channel
    • Contribute to Open-Source Community on GitHub
  • Build your Resume
  • LinkedIn and Networking
  • Learn how to seek a job









Download Certified Big Data Associate – CBDA Brochure PDF




    Copyright © 2023 Epsilon AI Registered in Egypt with company no. 118268