

1 Before the Session
- Verify HDInsight Emulator properly installed
- Verify Visual Studio and NuGet installed on emulator system
- Verify emulator system has web access
- Reboot your emulator system to ensure clean startup
- Download sample files to c:\temp on your emulator system: http://bit.ly/1ffaIHW

2 MSBIC Hadoop Series: Understanding the File System
Bryan Smith
email: bryan.smith@microsoft.com
twitter: @smithbryanc

3 MSBIC Hadoop Series
http://msbic.sqlpass.org/
Learn the basics of Hadoop through a combination of demonstration and lecture. Session participants are invited to follow along leveraging emulation environments and Azure-based clusters, the setting up of which we will address in our first session.
- March – Getting Started
- April – Understanding the File System
- May – Implementing MapReduce Jobs
- June – Querying the Data with Hive
- July – Processing the Data with Pig
- August – On Vacation
- September – Hadoop & MS BI
- October – To Be Announced
- November – Loading Social Media Data
- December – DW Integration

4 Today’s Session
Objectives:
1. Understand the basics of the Hadoop file system
2. Load data into our Hadoop clusters

5 HDFS
[Diagram: a name node and data nodes holding copies of blocks X, Y, and Z]

6 HDFS Fundamentals
- Default block size is 64 MB
  - Incomplete blocks are less than 64 MB
  - Block size configurable in hdfs-site.xml configuration file
- Blocks protected via replication
  - 3x is the norm for distributed clusters
  - Replication configurable in hdfs-site.xml configuration file
- File-to-block mappings maintained by name node
  - Protection through replication or failover to secondary name nodes
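To make the block and replication figures concrete, here is a small arithmetic sketch. It assumes the slide's defaults (64 MB blocks, 3x replication) and that an incomplete final block occupies only the bytes it actually holds. The function name `block_layout` is illustrative and not part of Hadoop.

```python
MB = 1024 * 1024

def block_layout(file_size_bytes, block_size_bytes=64 * MB, replication=3):
    """Return (block_count, last_block_bytes, raw_bytes_stored) for one file.

    Assumes the defaults quoted on the slide; real clusters may override
    both values in hdfs-site.xml.
    """
    if file_size_bytes == 0:
        return 0, 0, 0
    # Ceiling division: a partial trailing block still counts as a block.
    block_count = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # The final block holds whatever is left after the full blocks.
    last_block = file_size_bytes - (block_count - 1) * block_size_bytes
    # Every byte is stored once per replica.
    raw_bytes = file_size_bytes * replication
    return block_count, last_block, raw_bytes

blocks, last, raw = block_layout(200 * MB)
print(blocks, last // MB, raw // MB)  # 4 8 600
```

Under these assumptions a 200 MB file is split into three full 64 MB blocks plus one 8 MB block, and consumes 600 MB of raw cluster storage.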

7 Interacting with the File System
- Command Line
  - Local from the name node
  - Remote via PowerShell
- WebHDFS (REST) API
  - Direct interaction
  - WebHDFS client

8 Exercise: Command Line Data Loading
- Open Hadoop command line
- Create directory for sample data
- Load data to directory
- Verify data in directory

9 Appendix: Demo Script
hadoop fs -ls /
hadoop fs -mkdir /demo/simple/in
hadoop fs -put c:\temp\integers.txt /demo/simple/in/integers.txt
hadoop fs -ls /demo/simple/in
hadoop fs -cat /demo/simple/in/integers.txt
hadoop fsck /demo/simple/in/ -files -blocks

10 Appendix: File System User Commands
- hadoop fs: provides access to file system shell
- hadoop fsck: runs the HDFS filesystem checking utility
- hadoop distcp: copy files recursively within or between clusters

11 Appendix: The hadoop fs Commands
- Get Help: help
- Explore File Contents: cat, tail, text
- Explore FS Contents: ls, lsr, du, dus, stat, test, count
- Modify FS Contents: mkdir, touchz, rm, rmr, expunge, mv, cp, chmod, chown, chgrp
- Transfer From Local FS: put, copyFromLocal, moveFromLocal
- Transfer To Local FS: get, copyToLocal, moveToLocal, getmerge

12 Interacting with the File System
- Command Line
  - Local from the name node
  - Remote via PowerShell
- WebHDFS (REST) API
  - Direct interaction
  - WebHDFS client

13 WebHDFS: The REST Interface
[Diagram: name node and data nodes, with port 50070 labeled]

14 WebHDFS Methods
WebHDFS | Native Interface | Description
OPEN (HTTP GET) | DistributedFileSystem.open | Opens an FSDataInputStream at the indicated Path
GETFILESTATUS (HTTP GET) | DistributedFileSystem.getFileStatus | Return a file status object that represents the path
LISTSTATUS (HTTP GET) | DistributedFileSystem.listStatus | List the statuses of the files/directories in the given path
GETCONTENTSUMMARY (HTTP GET) | DistributedFileSystem.getContentSummary | Return the ContentSummary of a given Path
GETFILECHECKSUM (HTTP GET) | DistributedFileSystem.getFileChecksum | Get the checksum of a file
GETHOMEDIRECTORY (HTTP GET) | DistributedFileSystem.getHomeDirectory | Return the current user's home directory in this filesystem
GETDELEGATIONTOKEN (HTTP GET) | DistributedFileSystem.getDelegationToken | Get a new delegation token for this file system
CREATE (HTTP PUT) | DistributedFileSystem.create | Opens an FSDataOutputStream at the indicated Path
MKDIRS (HTTP PUT) | DistributedFileSystem.mkdirs | Create a directory with the provided permission
RENAME (HTTP PUT) | DistributedFileSystem.rename | Renames Path src to Path dst
SETREPLICATION (HTTP PUT) | DistributedFileSystem.setReplication | Set replication for an existing file
SETOWNER (HTTP PUT) | DistributedFileSystem.setOwner | Set owner of a path
SETPERMISSION (HTTP PUT) | DistributedFileSystem.setPermission | Set permission of a path
SETTIMES (HTTP PUT) | DistributedFileSystem.setTimes | Set access time of a file
RENEWDELEGATIONTOKEN (HTTP PUT) | DistributedFileSystem.renewDelegationToken | Renew a delegation token for this file system
CANCELDELEGATIONTOKEN (HTTP PUT) | DistributedFileSystem.cancelDelegationToken | Cancel a delegation token for this file system
APPEND (HTTP POST) | DistributedFileSystem.append | Append to an existing file
DELETE (HTTP DELETE) | DistributedFileSystem.delete | Delete a file
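Each WebHDFS operation is addressed as http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>, sent with the HTTP verb listed for it. The sketch below only builds the verb and URL for a request and sends nothing. The helper name `webhdfs_request` is hypothetical, and the defaults (localhost, port 50070, user "hadoop") are assumptions taken from the emulator setup used in this session.

```python
from urllib.parse import quote, urlencode

# HTTP verb for each WebHDFS operation in the slide's method list.
HTTP_METHOD = {
    "OPEN": "GET", "GETFILESTATUS": "GET", "LISTSTATUS": "GET",
    "GETCONTENTSUMMARY": "GET", "GETFILECHECKSUM": "GET",
    "GETHOMEDIRECTORY": "GET", "GETDELEGATIONTOKEN": "GET",
    "CREATE": "PUT", "MKDIRS": "PUT", "RENAME": "PUT",
    "SETREPLICATION": "PUT", "SETOWNER": "PUT", "SETPERMISSION": "PUT",
    "SETTIMES": "PUT", "RENEWDELEGATIONTOKEN": "PUT",
    "CANCELDELEGATIONTOKEN": "PUT",
    "APPEND": "POST", "DELETE": "DELETE",
}

def webhdfs_request(op, hdfs_path, host="localhost", port=50070,
                    user="hadoop", **params):
    """Return (http_method, url) for a WebHDFS call without sending it.

    host, port, and user are assumed emulator defaults; extra keyword
    arguments become additional query parameters.
    """
    op = op.upper()
    method = HTTP_METHOD[op]  # KeyError for an operation not in the table
    query = urlencode({"op": op, "user.name": user, **params})
    url = f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"
    return method, url

print(webhdfs_request("LISTSTATUS", "/demo/simple/in"))
print(webhdfs_request("MKDIRS", "/demo/ufo/in", permission="755"))
```

The returned pair can be passed to any HTTP client. Keeping URL construction separate from the network call makes it easy to inspect a request before it touches the cluster.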

15 Exercise: WebHDFS Client Data Loading
- Create new .NET console application
- Install .NET WebHDFS client
- Connect to HDInsight Emulator
- Create a new directory
- Load data to directory
- Verify data in directory

16 Appendix: Abbreviated Steps for App
1. Launch Visual Studio
2. Create new console application
3. Use NuGet to install Microsoft .NET API for Hadoop WebClient
4. Add the using Microsoft.Hadoop.WebHDFS directive
5. Add the using System.Threading.Tasks directive
6. Instantiate WebHDFS client with: WebHDFSClient myClient = new WebHDFSClient(new Uri("http://localhost:50070"), "hadoop");
7. Upload file with: myClient.CreateFile("c:\\temp\\ufo_awesome.tsv", "/demo/ufo/in/ufo_awesome.tsv").Wait();
More complete console application documented here
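For comparison with the .NET client call, the same upload can be sketched directly against the REST interface. A WebHDFS CREATE is a two-step exchange: the name node answers the first PUT with a 307 redirect to a data node, and the file bytes go in a second PUT to that location. This is an illustration of the protocol and not a description of how the .NET client is implemented. The function names are hypothetical, and the host, port, and user defaults are assumptions based on the emulator settings above.

```python
import http.client
from urllib.parse import quote, urlsplit

def split_location(location):
    """Split a redirect URL into (host, port, path-with-query)."""
    parts = urlsplit(location)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return parts.hostname, parts.port, target

def upload(local_path, hdfs_path, host="localhost", port=50070, user="hadoop"):
    """Upload one local file through WebHDFS and return the final HTTP status."""
    # Step 1: ask the name node where to write; no file data is sent yet.
    conn = http.client.HTTPConnection(host, port)
    conn.request("PUT", f"/webhdfs/v1{quote(hdfs_path)}?op=CREATE&user.name={user}")
    resp = conn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    conn.close()
    if resp.status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got {resp.status}")

    # Step 2: send the file bytes to the data node named in the redirect.
    dn_host, dn_port, target = split_location(location)
    with open(local_path, "rb") as source:
        data = source.read()
    conn = http.client.HTTPConnection(dn_host, dn_port)
    conn.request("PUT", target, body=data)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status  # WebHDFS answers 201 Created on success

# Example call (requires a running emulator, so it is left commented out):
# upload(r"c:\temp\ufo_awesome.tsv", "/demo/ufo/in/ufo_awesome.tsv")
```

http.client is used here because it does not follow redirects automatically, which keeps the two steps visible.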

17 Azure HDInsight
[Diagram: name node and data nodes backed by Azure Blob Storage, with port 50070 labeled]

18 Working with the Azure HDInsight
- Command-Line Options
  - Local from the name node (once enabled)
  - Remote via PowerShell
- WebHDFS (REST) API
- Azure Blob Storage API
- Various third-party tools
- WebHDFS client (sample) or Azure Blob Storage client (sample)

19 Azure Blob Storage
[Diagram: name node and data nodes]

20 Today’s Session
Objectives:
1. Understand the basics of the Hadoop file system
2. Load data into our Hadoop clusters

21 For Next Session
Topic:
- Implementing MapReduce
- Develop a Simple MapReduce Job using C#
Requested Action(s):
- Come with working HDInsight Emulator
- Load sample data sets into HDFS on Emulator

