An Introduction to HDInsight, June 27th, 2013


1 An Introduction to HDInsight
June 27th, 2013
cschmidt@pragmaticworks.com | @sqlbischmidt | http://intelligentsql.wordpress.com

2 Big Data

3 Structured or Unstructured?
Structured data has an identifiable structure: it is organized into columns and rows, as in a database. Unstructured data has no such identifiable structure.
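To make the distinction concrete, here is a small Python sketch. The sample weather data and the extraction rule are made up for illustration. In structured data, a value can be addressed by column name. In unstructured text, some structure has to be imposed first, here with a hypothetical regular-expression rule.

```python
import csv
import io
import re

# Structured: named columns and rows, like a database table (made-up sample data).
structured = io.StringIO("city,temp_f\nTampa,91\nSeattle,64\n")
rows = list(csv.DictReader(structured))
# Each value is addressable directly by its column name.
tampa_temp = int(rows[0]["temp_f"])

# Unstructured: free text with no columns, so structure must be imposed by parsing.
unstructured = "Tampa hit 91 degrees today, while Seattle stayed at 64."
# Hypothetical extraction rule: a capitalized word followed by the next number in the text.
pairs = re.findall(r"([A-Z][a-z]+)\D*?(\d+)", unstructured)
```

The regular expression is brittle by design. Every new sentence shape would need a new rule, and that is the practical cost of working with unstructured data.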

4 HDInsight: Getting Started
HDInsight is an Apache Hadoop-based service: a modern, cloud-based solution platform that manages data of any type and size. Big data does not provide value on its own; it must first be extracted, transformed, and loaded (ETL’d).

5 HDInsight (continued)
An HDInsight instance on Azure consists of a head node (which runs the Hadoop NameNode) and one or more data nodes.
Benefits: integration with social media, advanced analytics, and “live” changes (for example, “What’s the weather like right now?”).
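The head node's job can be illustrated with a simplified Python sketch of its bookkeeping. It splits a file into fixed-size blocks and records which data nodes hold each block's replicas. The function names, the 128 MB block size in the usage example, and the round-robin placement are all assumptions for illustration. Real HDFS uses a rack-aware placement policy, not this scheme.

```python
from collections import defaultdict

def place_blocks(file_size_mb, block_size_mb, data_nodes, replication):
    """Hypothetical head-node bookkeeping: split a file into fixed-size blocks
    and assign each block's replicas to data nodes round-robin."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Take consecutive nodes starting at an offset, so replicas land on distinct nodes.
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)] for r in range(replication)
        ]
    return placement

def blocks_per_node(placement):
    """Count how many block replicas each data node stores."""
    counts = defaultdict(int)
    for nodes in placement.values():
        for node in nodes:
            counts[node] += 1
    return dict(counts)

# Usage: a 300 MB file, 128 MB blocks, three data nodes, two copies of each block.
placement = place_blocks(300, 128, ["dn1", "dn2", "dn3"], 2)
```

Spreading replicas across nodes means losing a single data node does not lose any block. It also lets work on different blocks run on different machines at the same time.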

6 MapReduce
MapReduce takes a large, unstructured data set and breaks it down by mapping, shuffling, and sorting the data. It then reduces the results to generate an output file that contains each level along with its aggregated result.
HDFS: the Hadoop Distributed File System. Data is distributed across multiple drives on multiple servers.
JAR files: bundled, compiled MapReduce code that can be submitted to the cluster and executed.
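The map, shuffle/sort, and reduce phases can be simulated in memory with plain Python. This sketch counts how often each level appears in some hypothetical log lines, and it assumes a "<timestamp> <LEVEL> <message>" layout. It is not Hadoop's Java API. On a real cluster, each phase runs across many nodes, and the framework performs the shuffle.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (key, 1) pair for the level token in each log line."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            # Assumed layout: "<timestamp> <LEVEL> <message...>"
            yield fields[1], 1

def shuffle_and_sort(pairs):
    """Shuffle/sort: order pairs by key so equal keys are adjacent, then group them."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    """Reduce: combine all values for one key into a single result."""
    for key, values in grouped:
        yield key, sum(values)

def run_job(lines):
    return dict(reduce_phase(shuffle_and_sort(map_phase(lines))))

# Hypothetical log lines used only for this illustration.
sample_log = [
    "2013-06-27T10:00:01 INFO service started",
    "2013-06-27T10:00:02 DEBUG cache warmed",
    "2013-06-27T10:00:03 INFO request handled",
    "2013-06-27T10:00:04 ERROR disk full",
]
```

The mapper and reducer each see only one record or one key at a time. That independence is what lets Hadoop run many copies of them in parallel on different data nodes.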

7 MapReduce Data Flow

8 Pig
Pig is an alternative to writing Java code for creating and running MapReduce jobs; its language is called Pig Latin. Using Pig is a good way to reduce the time needed to create MapReduce programs: many algorithms can be written in fewer than five lines of Pig Latin code!

9 Pig
Pig Latin statements follow a general flow: LOAD, then TRANSFORM (for example FILTER, GROUP, or FOREACH), then DUMP or STORE. Pig Latin can be run either in Grunt mode (the interactive shell) or in script mode (batch).
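As a rough analogue of that LOAD, TRANSFORM, STORE flow, here is the same pipeline shape in Python. It is not Pig Latin. The tab-separated (user, action) record layout and the "count clicks per user" task are hypothetical. Each function is labeled with the Pig Latin stage it stands in for.

```python
import csv
import io
from collections import Counter

def load(text):
    """LOAD: read tab-separated records into (user, action) tuples."""
    return [tuple(row) for row in csv.reader(io.StringIO(text), delimiter="\t") if row]

def transform(records):
    """TRANSFORM: keep only 'click' actions (like FILTER), then count per user
    (like GROUP followed by FOREACH ... GENERATE COUNT)."""
    clicks = [user for user, action in records if action == "click"]
    return sorted(Counter(clicks).items())

def store(results):
    """STORE: serialize results as tab-separated lines (DUMP would just print them)."""
    return "".join(f"{user}\t{count}\n" for user, count in results)

# Usage with made-up input data.
raw = "alice\tclick\nbob\tview\nalice\tclick\nbob\tclick\n"
output = store(transform(load(raw)))
```

In Pig, each of these three steps is typically a single statement. Pig then compiles them into MapReduce jobs, which is where the time savings over hand-written Java come from.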

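10 Hive
Hive provides a SQL-like language that sits on top of Hadoop, commonly referred to as Hive Query Language (HiveQL, or HQL). It offers structure without up-front modeling: a table schema is applied to the data when it is queried. Hive can handle larger data sets than a traditional single-server SQL database because it queries data in parallel across multiple nodes using MapReduce.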
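The "in parallel across multiple nodes" part can be sketched in Python. The sketch partitions the rows, computes a partial GROUP BY count on each partition concurrently, and then merges the partial results. The `visits` table, its columns, and the thread pool standing in for cluster nodes are assumptions for illustration. Hive compiles the query into MapReduce jobs rather than running anything like this code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partial_count(partition):
    """Runs on one 'node': count rows per country within this partition only."""
    return Counter(country for country, _ in partition)

def group_by_count(rows, num_nodes):
    """Rough analogue of: SELECT country, COUNT(*) FROM visits GROUP BY country
    (the visits table and its columns are hypothetical)."""
    # Split rows round-robin into num_nodes partitions, standing in for data spread across nodes.
    partitions = [rows[i::num_nodes] for i in range(num_nodes)]
    # Threads only simulate parallel nodes here; they give no real speedup for this workload.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(partial_count, partitions))
    # Merge the partial counts, as the reduce step would.
    total = Counter()
    for part in partials:
        total.update(part)
    return dict(sorted(total.items()))

# Usage with made-up (country, visit_id) rows.
rows = [("US", 1), ("UK", 2), ("US", 3), ("DE", 4), ("US", 5)]
result = group_by_count(rows, 2)
```

Because each partition is counted independently, adding nodes lets the same query scale to more data. The only step that needs to see all partitions is the final merge.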

11 Data Explorer
Data Explorer is currently in preview from Microsoft. With it, Excel can connect directly to our HDInsight cluster to bring data in for analysis. We can then join this data with other relational sources to “mash” the data together.
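The "mash" step is essentially a join on a shared key. Below is a minimal Python sketch of an inner join between results pulled from the cluster and a lookup table from a relational source. The `store_id` key, field names, and data are hypothetical. Data Explorer performs this join through its own interface in Excel, not through code like this.

```python
def mash(cluster_rows, reference_rows, key):
    """Inner-join two lists of dicts on `key`, merging matching records.
    Assumes `key` is unique in reference_rows (later duplicates would overwrite earlier ones)."""
    # Index the relational source by key for constant-time lookups.
    index = {row[key]: row for row in reference_rows}
    joined = []
    for row in cluster_rows:
        match = index.get(row[key])
        if match is not None:
            # Combine fields from both sources into one record.
            joined.append({**match, **row})
    return joined

# Hypothetical data: aggregates from the cluster and a store table from a relational database.
cluster_counts = [
    {"store_id": 1, "mentions": 120},
    {"store_id": 2, "mentions": 45},
    {"store_id": 9, "mentions": 3},
]
stores = [{"store_id": 1, "city": "Tampa"}, {"store_id": 2, "city": "Orlando"}]
combined = mash(cluster_counts, stores, "store_id")
```

Rows without a match (store 9 here) are dropped, as in a SQL inner join. A left join would keep them with the missing fields left empty.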

12 Additional Resources
Apache Hive Getting Started: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
HDInsight: http://www.windowsazure.com/en-us/manage/services/hdinsight/
Hortonworks: http://hortonworks.com/
