Hadoop is a general-purpose Big Data platform that offers scalable, distributed data storage and flexible options for working with data. In this session, we’ll learn how various Hadoop tools and techniques address particular data scenarios, including traditional reporting, NoSQL data management, and Machine Learning. In the lab, we’ll use Hive, the SQL engine for Hadoop, to query data at scale, as if Hadoop were a traditional data warehouse.
The 45-minute talk lays out the Hadoop landscape and how you would approach different kinds of problems with different tools. The two-hour lab focuses on one tool, Hive, the SQL tool for using Hadoop as a data warehouse. The goal is to show that Hadoop is a flexible environment for all kinds of data needs, some well outside the traditional realm of data “solutions”, such as Machine Learning, and that it also supports NoSQL options. The lab demonstrates that the traditional approach of querying data with SQL still works on Hadoop, when that’s the most appropriate tool for particular needs. In fact, you get most of what SQL traditionally provides; we’ll highlight what’s different.
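As a taste of what the lab covers, a familiar SQL query runs essentially unchanged in Hive; the table and column names below are hypothetical, chosen only to illustrate the point:

```sql
-- Hypothetical example: a standard aggregation query works as-is in Hive.
-- Hive compiles it into distributed jobs that run across the cluster.
SELECT region,
       COUNT(*)   AS order_count,
       SUM(total) AS revenue
FROM   orders
WHERE  order_date >= '2013-01-01'
GROUP  BY region
ORDER  BY revenue DESC;
```

The difference is under the hood: instead of executing against a single database server, Hive translates the query into parallel work over files stored in the Hadoop cluster.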