Course Outline
Section 1: Introduction to Hadoop
- History and core concepts of Hadoop
- The Hadoop ecosystem
- Various distributions
- High-level architecture
- Common Hadoop myths
- Challenges associated with Hadoop
- Hardware and software requirements
- Lab: First look at Hadoop
Section 2: HDFS
- Design and architecture
- Core concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons: NameNode, Secondary NameNode, DataNode
- Communication mechanisms and heartbeats
- Data integrity
- Read and write paths
- NameNode High Availability (HA) and Federation
- Labs: Interacting with HDFS
Section 3: MapReduce
- Core concepts and architecture
- Daemons (MRv1): JobTracker / TaskTracker
- Execution phases: Driver, Mapper, Shuffle/Sort, Reducer
- MapReduce Version 1 and Version 2 (YARN)
- Internal workings of MapReduce
- Introduction to Java MapReduce programs
- Labs: Running a sample MapReduce program
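As a rough illustration of the execution phases listed above (Driver, Mapper, Shuffle/Sort, Reducer), the sketch below simulates a word count in plain Java on a single machine. It deliberately does not use the Hadoop API: the class name WordCountPhases and the methods map, shuffle, reduce and run are hypothetical names chosen for this example, and a real job would extend Hadoop's Mapper and Reducer classes and run distributed across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the MapReduce phases (illustrative only;
// not the Hadoop API, and all names here are assumptions for this sketch).
public class WordCountPhases {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort phase: group emitted values by key.
    // A TreeMap keeps the keys in sorted order, mimicking the sort step.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: wires the phases together and returns word -> count.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(run(List.of("hadoop stores data", "hadoop processes data")));
    }
}
```

In an actual cluster the map and reduce calls run in parallel on different nodes and the framework performs the shuffle over the network; the sketch only shows how data flows from one phase to the next.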
Section 4: Pig
- Pig compared to Java MapReduce
- Pig job flow
- The Pig Latin language
- ETL processes with Pig
- Transformations and Joins
- User-defined functions (UDF)
- Labs: Writing Pig scripts to analyze data
Section 5: Hive
- Architecture and design
- Data types
- SQL support within Hive
- Creating Hive tables and querying data
- Partitions
- Joins
- Text processing capabilities
- Labs: Processing data with Hive
Section 6: HBase
- Core concepts and architecture
- HBase vs RDBMS vs Cassandra
- HBase Java API
- Time series data handling in HBase
- Schema design
- Labs: Interacting with HBase via the shell; programming with the HBase Java API; schema design exercise
Requirements
- Proficiency in the Java programming language (the majority of programming exercises will be conducted in Java)
- Comfort with the Linux environment (ability to navigate the Linux command line and edit files using vi or nano)
Lab environment
Zero install: students do not need to install Hadoop software on their own machines. A fully functional Hadoop cluster will be provided.
Students will need the following:
- An SSH client (Linux and Mac come with built-in SSH clients; for Windows, PuTTY is recommended)
- A web browser to access the cluster (Firefox is recommended)
Duration: 28 hours
Testimonials
Hands-on exercises. The class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already.