
1 Jaql → pipes: Unix pipes for the JSON data model
Kevin Beyer, Vuk Ercegovac, Eugene Shekita, Jun Rao, Ning Li, Sandeep Tata
IBM Almaden Research Center
Open Source!

2 Goals for Jaql
Provide a simple yet powerful language to manipulate semi-structured data.
- Use JSON as the data model
  - Data is usually converted to/from a JSON view
  - Most data has a natural JSON representation
- Easily extended using Java, Python, JavaScript, …
- Exploit massive parallelism using Hadoop

3 What is in the upcoming release?
User feedback on the previous release:
- Too XQuery-like (yuck factor)
- Too complex: too composable, too nested, too verbose
- Unclear what is parallelized
Next release (planned 10/30/2008):
- Vastly simplified syntax, inspired by Unix pipes

4 A query is a pipeline: source → operators → sink
$people = file …;               // declare files
$greetings = file …;
$people                         // read input (json array)
-> filter $.type == 'friendly'  // find friendly people
-> map { hello: $.name }        // keep just name
-> write $greetings;            // write output
Operations are listed in natural order, vs. the last operation first.
Runs as one map job.
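The filter/map pipeline above can be mimicked in plain Python over a list of dicts (the sample records here are invented for illustration; in Jaql the same pipeline compiles to a single map job):

```python
# Hypothetical input: a JSON-like array of people records.
people = [
    {"type": "friendly", "name": "Jack"},
    {"type": "grumpy", "name": "Jill"},
]

# filter $.type == 'friendly'  -- keep only friendly people
friendly = [p for p in people if p["type"] == "friendly"]

# map { hello: $.name }        -- keep just the name, rename the field
greetings = [{"hello": p["name"]} for p in friendly]

print(greetings)  # [{'hello': 'Jack'}]
```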

5 Aggregate
$people
-> filter by $.birthdate < date(' ')
-> aggregate count($);   // count the older people
Aggregate the input into a single value
- Uses a push-based, streaming, combining API for aggregate functions
Runs as one map / combine / reduce job.
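A minimal Python sketch of the filter-then-count aggregate, with an invented cutoff date (the slide's date literal is elided); ISO-8601 date strings compare correctly as plain strings here:

```python
# Hypothetical input records with ISO-8601 birthdates.
people = [
    {"name": "Jack", "birthdate": "1950-06-01"},
    {"name": "Jill", "birthdate": "1990-06-01"},
]
cutoff = "1970-01-01"  # assumed cutoff; the slide elides the actual date

# filter by $.birthdate < date(...)
older = [p for p in people if p["birthdate"] < cutoff]

# aggregate count($)
count = len(older)
print(count)  # 1
```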

6 Partition
$people
-> filter by $.birthdate < date(' ')
-> partition by $t = $.type                    // partition the older people by type
   |- aggregate { type: $t, n: count($) } -|;  // aggregate per partition
Partition one or more inputs; send each individual partition through a sub-pipe; merge the results.
Runs as one map / combine / reduce job.
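The partition-then-aggregate-per-partition step is essentially a group-by count. A sketch over invented records (Jaql would distribute this across map/combine/reduce tasks; the serial version only shows the semantics):

```python
from collections import Counter

# Hypothetical input after the age filter.
people = [{"type": "friendly"}, {"type": "friendly"}, {"type": "grumpy"}]

# partition by $t = $.type, then aggregate { type: $t, n: count($) } per partition
counts = Counter(p["type"] for p in people)
result = [{"type": t, "n": n} for t, n in sorted(counts.items())]

print(result)  # [{'type': 'friendly', 'n': 2}, {'type': 'grumpy', 'n': 1}]
```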

7 User-defined operators
$people -> myBestMatches($, 3);   // pass "standard input" to external code
Call user code
- Similar to calling a user program / script in Unix
- Input and output are pipelined, like "Hadoop streaming"
Not parallel!

8 Per-partition sub-pipe
Partition one or more inputs on a key ("split" → partition → merge); send each partition through a (duplicate) sub-pipe; merge the results.
$people
-> partition by $.type        // partition people by type
   |- sort by $.rating        // sort partition by rating
   -> top 100                 // keep just the first 100 in the partition
   -> myBestMatches($, 3)     // find best matches per partition
   -|;
Runs as one map / reduce job.
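The sort-and-top sub-pipe per partition can be sketched serially in Python (invented records; a smaller top-2 cutoff stands in for the slide's top 100):

```python
from collections import defaultdict

# Hypothetical input records with a partition key and a rating.
people = [
    {"type": "a", "rating": 5},
    {"type": "a", "rating": 9},
    {"type": "b", "rating": 7},
]

# partition by $.type
groups = defaultdict(list)
for p in people:
    groups[p["type"]].append(p)

# per partition: sort by $.rating (descending), then top 2
best = {}
for t, members in groups.items():
    ranked = sorted(members, key=lambda p: p["rating"], reverse=True)
    best[t] = ranked[:2]

print(best["a"][0]["rating"])  # 9
```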

9 Partition by default
Run the sub-pipe on each partition of the input
- If the input is a file, use its partitions; otherwise partition arbitrarily
- Expresses the parallelism of a user-defined operator
$file
-> partition by default         // run per file partition
   |- buildPartialModel($) -|   // partial model built per partition
-> unifyModels($);              // unify all the partial models into one
Runs as one map job + serial unify.
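The partial-model / unify pattern is the classic "build per-chunk summaries in parallel, combine serially" shape. A toy sketch where the "model" is just a sum (the chunking and both function names stand in for the slide's unspecified buildPartialModel / unifyModels):

```python
# Hypothetical file partitions (chunks of input data).
chunks = [[1, 2], [3, 4, 5]]

# buildPartialModel per partition -- Jaql runs these in parallel map tasks
partials = [sum(chunk) for chunk in chunks]

# unifyModels -- runs serially over the partial results
model = sum(partials)

print(model)  # 15
```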

10 Join
$people = file …;
$children = file …;
join $people on $people.id, $children on $children.mother;
People: [ { id: 1, name: 'Jack' }, { id: 2, name: 'Jill' }, … ]
Children: [ { id: 3, name: 'Becky', father: 1, mother: 2 }, … ]
Result: [ { people: { id: 2, name: 'Jill' },
            children: { id: 3, name: 'Becky', father: 1, mother: 2 } }, … ]
The result is a record with the inputs as values.
Joins on multiple inputs with multiple conditions; inner, left-, right-, and full-outer joins.
Runs as one map / reduce job.
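The join's output shape (one record per match, with each input nested under its own name) can be sketched as a simple hash join in Python; the hash-join strategy is my illustration choice, not how Jaql executes it (the slide says it runs as a map/reduce job):

```python
# The slide's sample inputs.
people = [{"id": 1, "name": "Jack"}, {"id": 2, "name": "Jill"}]
children = [{"id": 3, "name": "Becky", "father": 1, "mother": 2}]

# join $people on $people.id, $children on $children.mother (inner join)
people_by_id = {p["id"]: p for p in people}
joined = [
    {"people": people_by_id[c["mother"]], "children": c}
    for c in children
    if c["mother"] in people_by_id
]

print(joined[0]["people"]["name"])  # Jill
```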

11 Composite operators
Join
- Join two or more inputs on a key
- Inner / outer / full
- Multi-predicate, multi-way
Merge
- Concatenate all inputs in any order
User-defined operator (function)
Union, Intersect, Difference, …
One input can come from the current pipe; remaining inputs are pipe variables or nested pipes.

12 Composite sinks
Tee: send each input item to all output pipes
$people -> tee
|- filter $.gender == 'F' -> write $women
|- map { $.name } -> write $names
-|;
Split: send each input item to one pipe.
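Tee's semantics (every item flows into every branch) can be sketched serially over invented records; a real tee streams items rather than materializing lists:

```python
# Hypothetical input records.
people = [
    {"gender": "F", "name": "Jill"},
    {"gender": "M", "name": "Jack"},
]

# Branch 1: filter $.gender == 'F' -> write $women
women = [p for p in people if p["gender"] == "F"]

# Branch 2: map { $.name } -> write $names
# (every input item reaches this branch too, not just the leftovers)
names = [{"name": p["name"]} for p in people]

print(len(names))  # 2
```

With split, by contrast, each item would be routed to exactly one branch.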

13 Rough Unix analogs of Jaql
Unix                  Jaql
cat                   var -> merge
join                  join
grep                  filter
cut, paste, sed, tr   map
sort                  sort
head                  top
uniq                  distinct
sort > filename       write
tee                   tee
Unix pipes carry a stream of bytes / lines; Jaql pipes carry a stream of JSON items, with more structure / types.

14 Summary
Unix pipes revolutionized scripting.
If you know Unix pipes, you understand Jaql.

15 Questions? Comments?

