Your Name. • Recap • Advance • Built-In Function • UDF • Conclusion.


1 Your Name

2 • Recap • Advance • Built-In Function • UDF • Conclusion

3 Pig Advance

4 • A platform for analyzing large data sets
  • Local mode
  • Distributed mode
  • Scripting language (Pig Latin), which is not equivalent to SQL

5 • Key types: field, tuple, and bag
  • Schema: a way to assign a name and a type to a value
  • Operators: useful built-in operators
      LOAD/STORE
      GROUP/COGROUP
      JOIN
      FILTER
      FOREACH
      (…)
  • Tools: DUMP and DESCRIBE

6 Loading Data • Working with Data • Storing Intermediate Results • Storing Final Results • Debugging Pig Latin
    a = LOAD 'data' AS (age:int, name:chararray);
    b = FILTER a BY (age > 75);
    c = FOREACH b GENERATE *;
    STORE c INTO 'population';

7 Pig Advance

8 • Built-in functions don't need to be registered
  • They don't need to be qualified when they are used
  • Just use them as you need!

9 Eval | Math | String
  AVG | ABS | INDEXOF
  CONCAT | ACOS | LAST_INDEX_OF
  COUNT | ASIN | LCFIRST
  COUNT_STAR | CBRT | UCFIRST
  DIFF | CEIL | LOWER
  ISEMPTY | COS | UPPER
  MAX | COSH | REPLACE
  MIN | EXP | SUBSTRING
  SIZE | FLOOR | TRIM
  SUM | LOG | REGEX_EXTRACT
  TOKENIZE | LOG10 | REGEX_EXTRACT_ALL
  For complete reference, please visit here

10 Name | Syntax | Description
  TOTUPLE | TOTUPLE(expression [, expression...]) | Converts one or more expressions to type tuple.
  TOMAP | TOMAP(key-expression, value-expression [, key-expression, value-expression...]) | Converts key/value expression pairs into a map.
  TOBAG | TOBAG(expression [, expression...]) | Converts one or more expressions to type bag.
  TOP | TOP(topN,column,relation) | Returns the top-n tuples from a bag of tuples.
  For complete reference, please visit here

11 COUNT
  • Computes the number of elements in a bag.
  • Requires a preceding GROUP ALL statement for global counts or a GROUP BY statement for group counts.
  • Ignores nulls. To include NULL values in the count, use COUNT_STAR.

12
    a = LOAD 'data' AS (f1:int, f2:int, f3:int);
    b = GROUP a BY f1;
    x1 = FOREACH b GENERATE COUNT(a);
    x2 = FOREACH b GENERATE COUNT_STAR(a);

    1 2 3 8 3 4 7 2 5 8 4 3 1 1

    DUMP x1;
    DUMP x2;
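The difference between COUNT and COUNT_STAR can be modeled in a few lines of plain Python. This is only a sketch of the semantics, not Pig code. It assumes a bag is a list of tuples, that a Pig null is represented by None, and that COUNT skips a tuple whose first field is null. The rows are hypothetical and are not the slide's data, whose row layout is not recoverable from the transcript.

```python
def count(bag):
    """Model of COUNT: skip tuples whose first field is null (None)."""
    return sum(1 for t in bag if t and t[0] is not None)


def count_star(bag):
    """Model of COUNT_STAR: count every tuple, nulls included."""
    return len(bag)


# Hypothetical bag of (f1, f2, f3) tuples; None stands in for a Pig null.
bag = [(1, 2, 3), (None, 4, 5), (8, None, 3)]

print(count(bag))       # 2
print(count_star(bag))  # 3
```

The two functions differ only for the tuple whose first field is None, which is the case the slide's advice about COUNT_STAR addresses.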

13 SUM
  • Computes the sum of the numeric values in a single-column bag.
  • Requires a preceding GROUP ALL statement for global sums and a GROUP BY statement for group sums.

    a = LOAD 'data' USING PigStorage(',') AS (owner:chararray, pet_type:chararray, pet_count:int);
    b = GROUP a BY owner;
    x = FOREACH b GENERATE group, SUM(a.pet_count);

    Alice,turtle,1
    Alice,goldfish,5
    Alice,cat,2
    Bob,dog,2
    Bob,cat,2

    DUMP x;
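To make the expected result of the GROUP BY plus SUM script concrete, here is a plain-Python sketch of the same computation over the slide's five rows. It is a model of the semantics, not Pig code, and the function name `group_sum` is made up for illustration.

```python
# The five comma-separated rows from the slide, already parsed into tuples.
rows = [
    ("Alice", "turtle", 1),
    ("Alice", "goldfish", 5),
    ("Alice", "cat", 2),
    ("Bob", "dog", 2),
    ("Bob", "cat", 2),
]


def group_sum(rows):
    """Group rows by owner and sum the third column (the pet count)."""
    totals = {}
    for owner, _pet_type, pet_count in rows:
        totals[owner] = totals.get(owner, 0) + pet_count
    return sorted(totals.items())


print(group_sum(rows))  # [('Alice', 8), ('Bob', 4)]
```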

14 • PigStorage
  • TextLoader
  • JsonLoader/JsonStorage
  • (Others)

15 Pig Advance

16 • UDF stands for "User Defined Function"
  • Currently, UDFs can be implemented in Java, Python, JavaScript, or Ruby (the most extensive support is provided for Java)
  • Types
      Eval Function
      Load/Store Function
  • Piggy Bank: check it before you write your own
      https://cwiki.apache.org/confluence/display/PIG/PiggyBank

17 • Pig Types and Native Java Types
  Pig Type | Java Class
  bytearray | DataByteArray
  chararray | String
  int | Integer
  long | Long
  float | Float
  double | Double
  tuple | Tuple
  bag | DataBag
  map | Map

18 • Compile pig.jar first
  • Register the UDF jar in your Pig script
  • Use the UDF by its full name (package + class name)
  • Example

19 • EvalFunc<T>
    public abstract T exec(Tuple input) throws IOException
    public Schema outputSchema(Schema input)
    public List<FuncSpec> getArgToFuncMapping() throws FrontendException

20 Extends EvalFunc
  Example:
    Input:
      Chair belongs to Phoenix
      Pencial belongs to Vincent
    Output:
      chair, tcloud_Phoenix
      pencial, tcloud_Vincent
  UDF • Pig script
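The UDF source for this slide is not preserved in the transcript, so the following is only a plain-Python sketch of the transformation the example implies. It assumes the input words are separated by single spaces (the transcript has lost the separators) and that the rule is "lower-case the item, prefix the owner with tcloud_". The function name `tag_owner` is hypothetical. In the deck, the equivalent logic would live in the `exec` method of a Java class extending EvalFunc.

```python
def tag_owner(line, prefix="tcloud_"):
    """Turn '<Item> belongs to <Owner>' into '<item>, <prefix><Owner>'."""
    item, sep, owner = line.partition(" belongs to ")
    if not sep:
        # The line does not match the assumed pattern.
        return None
    return "{}, {}{}".format(item.strip().lower(), prefix, owner.strip())


print(tag_owner("Chair belongs to Phoenix"))    # chair, tcloud_Phoenix
print(tag_owner("Pencial belongs to Vincent"))  # pencial, tcloud_Vincent
```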

21 Extends EvalFunc
  Example:
    Input:
      lamp#yellow
      desk#brown
      chair#green
      water#transparent
    Output:
      (lamp,yellow)
      (desk,brown)
      (chair,green)
      (water,transparent)
  UDF • Pig script
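As with the previous slide, the UDF itself is not in the transcript. A minimal plain-Python sketch of what the example appears to do, splitting each line on "#" and returning the two parts as a tuple, might look like this. The name `split_pair` is hypothetical, and a Java version would return a Pig Tuple from `exec` instead of a Python tuple.

```python
def split_pair(line, delimiter="#"):
    """Split 'key#value' into a (key, value) tuple; None if no delimiter."""
    key, sep, value = line.partition(delimiter)
    if not sep:
        return None
    return (key, value)


for line in ["lamp#yellow", "desk#brown", "chair#green", "water#transparent"]:
    print(split_pair(line))
```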

22 Extends FilterFunc
  Example:
    Input:
      Mary,John,Steve#Steve
      Tom#Stevet
    Output:
      Mary,John,Steve#Steve
  UDF • Pig script
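The filter rule is not stated on the slide. Judging from the example, a line is kept when the name after "#" appears in the comma-separated list before it. The plain-Python sketch below models that inferred rule. The predicate name `name_in_list` is hypothetical, and a Java FilterFunc would return a Boolean from `exec` in the same way.

```python
def name_in_list(line):
    """Return True if the name after '#' is one of the names before it."""
    names, sep, target = line.partition("#")
    if not sep:
        return False
    return target in [n.strip() for n in names.split(",")]


lines = ["Mary,John,Steve#Steve", "Tom#Stevet"]
kept = [line for line in lines if name_in_list(line)]
print(kept)  # ['Mary,John,Steve#Steve']
```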

23 • The base classes are LoadFunc and StoreFunc
  • Aligned with Hadoop's InputFormat and OutputFormat

24 • Extends LoadFunc
      getInputFormat
      prepareToRead
      setLocation
      getNext
  • Example

25 • Schema
  • Error handling
      WrappedIOException (deprecated)
  • Function overloading
  • Reporting progress
      Protected data variable in class EvalFunc: reporter.progress();

26 Pig Latin + UDF = Easy Analysis of (Big) Data!


