

The Data Science and Big Data Analytics course provides practical, foundation-level training that enables immediate and effective participation in Big Data and other analytics projects. It includes an introduction to Big Data and to the Data Analytics lifecycle for addressing business challenges that leverage Big Data. The course provides grounding in basic and advanced analytic methods and an introduction to Big Data analytics technology and tools. Lab sessions offer opportunities to see how a practicing Data Scientist might apply these methods and tools to real-world business challenges. The course provides an industry credential for business analysts, data warehouse professionals, and others with similar backgrounds, helping them transition into the world of Data Science and Big Data Analytics, a field with exceptional challenges and opportunities.


    Course Content

    Big Data

    • The problem space and example applications
    • Why don’t traditional approaches scale?
    • Requirements

    Hadoop Background

    • Hadoop history
    • The ecosystem and stack: HDFS, MapReduce, Hive, Pig…
    • Cluster architecture overview

    Development Environment

    • Hadoop distribution and basic commands
    • Eclipse development

    HDFS Introduction

    • The HDFS command line and web interfaces
    • The HDFS Java API (lab)
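
HDFS stores each file as a sequence of fixed-size blocks, and each block is replicated across DataNodes. The sketch below is a minimal illustration of that arithmetic, not the HDFS Java API. It assumes the Hadoop 2.x and later defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the helper name `plan_blocks` is hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed default dfs.blocksize (128 MB)
REPLICATION = 3                 # assumed default dfs.replication


def plan_blocks(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (block_sizes, raw_bytes_stored) for a file of file_size bytes.

    Hypothetical helper: every block is full except possibly the last,
    which holds only the remaining bytes (the final block is not padded).
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    n_blocks = (file_size + block_size - 1) // block_size  # integer ceiling
    sizes = [block_size] * n_blocks
    if n_blocks and file_size % block_size:
        sizes[-1] = file_size % block_size
    # Each replica stores the full file, so raw usage scales with replication.
    return sizes, file_size * replication


if __name__ == "__main__":
    mb = 1024 * 1024
    sizes, raw = plan_blocks(1024 * mb + 10 * mb)  # a 1 GB + 10 MB file
    print(len(sizes), sizes[-1] // mb, raw // mb)  # prints: 9 10 3102
```

Under these assumptions, a 1034 MB file occupies eight full blocks plus one 10 MB block, and about 3 GB of raw cluster storage once replicated.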

    MapReduce Introduction

    • Key philosophy: move computation, not data
    • Core concepts: Mappers, reducers, drivers
    • The MapReduce Java API (lab)
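
To make the mapper/reducer/driver split concrete, here is a single-process Python simulation of the classic word-count job. It is a teaching sketch, not the Hadoop MapReduce Java API: `mapper`, `reducer`, `shuffle` and `run_job` are hypothetical names, and `shuffle` stands in for the sort-and-group work the framework performs between the map and reduce phases.

```python
from collections import defaultdict


def mapper(_offset, line):
    """Map phase: emit (word, 1) for every word in one input line.
    The line index stands in for the record key a real job would receive."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts emitted for one word."""
    yield word, sum(counts)


def shuffle(pairs):
    """Group intermediate values by key and sort by key, as the
    framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def run_job(lines, map_fn=mapper, reduce_fn=reducer):
    """Driver: wire the phases together over an in-memory list of lines."""
    intermediate = []
    for offset, line in enumerate(lines):
        intermediate.extend(map_fn(offset, line))
    output = []
    for key, values in shuffle(intermediate):
        output.extend(reduce_fn(key, values))
    return output


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_job(docs))
    # prints: [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The mapper and reducer each see only one record or one key group at a time, which is what allows a framework to run many copies in parallel on the nodes that hold the data.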

    Real-World MapReduce

    • Optimizing with Combiners and Partitioners (lab)
    • More common algorithms: sorting, indexing and searching (lab)
    • Testing with MRUnit
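
A combiner pre-aggregates map output locally so that fewer pairs cross the network, and a partitioner decides which reducer receives each key. The sketch below simulates one map task in plain Python under stated assumptions: the partition formula follows the shape of Hadoop's default `HashPartitioner`, `(hash & Integer.MAX_VALUE) % numReduceTasks`, but uses `zlib.crc32` as a deterministic stand-in for Java's `hashCode()`, so partition numbers will not match a real cluster. The names `combine`, `partition` and `map_side` are hypothetical.

```python
import zlib
from collections import Counter, defaultdict


def mapper(line):
    """Emit (word, 1) for each word, as in word count."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Combiner: pre-sum counts on the map side. This is only valid
    because addition is associative and commutative."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partition(key, num_reducers):
    """Hash-partition a key. CRC32 is a deterministic stand-in hash;
    masking with 0x7FFFFFFF keeps the value non-negative."""
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def map_side(split, num_reducers, use_combiner=True):
    """Run one simulated map task over an input split and bucket its
    output by reducer. Returns {partition: [(key, value), ...]}."""
    pairs = [kv for line in split for kv in mapper(line)]
    if use_combiner:
        pairs = combine(pairs)
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return dict(buckets)


if __name__ == "__main__":
    split = ["to be or not to be", "to be is to do"]
    for flag in (False, True):
        buckets = map_side(split, num_reducers=2, use_combiner=flag)
        sent = sum(len(v) for v in buckets.values())
        print(f"combiner={flag}: {sent} pairs sent to reducers")
    # prints 11 pairs without the combiner and 6 with it
```

Because the partition depends only on the key, every occurrence of a word reaches the same reducer whether or not the combiner ran, so final counts are unchanged while shuffle traffic drops.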

    Higher-level Tools

    • Patterns to abstract “thinking in MapReduce”
    • The Cascading library (lab)
    • The Hive database (lab)
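
Higher-level tools such as Hive let you state a computation declaratively and translate it into MapReduce jobs. As a rough, hypothetical illustration of that pattern, the sketch below hand-translates an aggregation such as `SELECT dept, COUNT(*), SUM(salary) FROM employees GROUP BY dept` into a map step that emits the grouping column as the key and a reduce step that aggregates each group. The table, columns, and function names are invented for the example, and Hive's actual query plans are more elaborate than this.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept, salary).
EMPLOYEES = [
    ("ann", "eng", 120),
    ("bob", "eng", 100),
    ("cat", "ops", 80),
    ("dan", "sales", 90),
    ("eve", "ops", 70),
]


def map_row(row):
    """Map: emit the GROUP BY column as the key and keep only the
    value the aggregates need."""
    _name, dept, salary = row
    yield dept, salary


def reduce_group(dept, salaries):
    """Reduce: compute COUNT(*) and SUM(salary) for one group."""
    yield dept, len(salaries), sum(salaries)


def group_by_query(rows):
    """Map -> shuffle (group by key) -> reduce, with groups emitted
    in sorted key order."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            groups[key].append(value)
    result = []
    for dept in sorted(groups):
        result.extend(reduce_group(dept, groups[dept]))
    return result


if __name__ == "__main__":
    for dept, n, total in group_by_query(EMPLOYEES):
        print(dept, n, total)
    # prints: eng 2 220 / ops 2 150 / sales 1 90 (one per line)
```

The GROUP BY column becomes the shuffle key, which is why a query-level abstraction can hide the mapper and reducer while still running as the same two-phase job.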