PLINQ: A Query Language for Data Parallel Programming
Joe Duffy, Microsoft
Declarative Aspects of Multicore Programming (DAMP) Workshop – POPL’07


1 PLINQ: A Query Language for Data Parallel Programming Joe Duffy, Microsoft Declarative Aspects of Multicore Programming (DAMP) Workshop – POPL’07 © 2007, Microsoft, Corp. All rights reserved.

2 Research Context (1)
- New Microsoft technology, Language Integrated Query (LINQ)
- Primary goals:
  - Data-source agnostic, type-safe query language
  - Simplify expression of complex, multi-step operations over sets of data
- Why?
  - Many programs contain textual (untyped) SQL, XPath, XQuery, …
  - Programs must deal with increasingly larger quantities of data
    - Hardware: memory and hard disk capacities continue to grow: GB → TB → PB → …
    - Industry software: rich media, interactive visualizations, AI, NLP
  - Databases are ubiquitous, but aren’t always the solution
- Features
  - SQL-like relational algebra language syntax and libraries
  - Supports queries over in-memory collections, XML, and RDBMSs
- In Microsoft’s “developer division”
  - Entity responsible for Visual Studio, Visual C#, Basic, and C++
  - Releasing as part of Visual Studio 2007

3 Research Context (2)
- This talk describes extensions to LINQ to accomplish parallel query execution, i.e. Parallel LINQ (PLINQ)
- Goals:
  - Apply data parallelism to LINQ query execution
  - Preserve the LINQ programming model, with little to no required interface changes
  - Deal efficiently with composition and nesting of query operators
- Audience: developers on Microsoft’s .NET platform, C#, VB, and VC++
- Architectures: those running Windows – mostly MIMD (w/ SIMD/vector extensions), multi-core machines in the range of 2…64 processors, typically AMD or Intel
- Mostly an application of other techniques: RDBMS parallel query execution (Volcano, SQL Server, Oracle), NESL, GpH, and others

4 Syntax: A Query Language
    var q = from x1 in y
            join x2 in z on x2.fA equals x1.fA
            where p(x2.fB)
            orderby x1.fC
            select new { x1.fA, x2.fB, x1.fC };
    int r = q.Sum(a => a.fB*a.fC);

5 Queries == Trees of Operators
- A query is composed of a tree of operators
- Most operators operate on a stream t of type T* and produce a (lazily, on-demand generated) stream u of type U*, i.e. T* → U*
    var q = from x in A where (x % seed) == 0 select x/0.33f;
- Many operators are unary, forming a stream, but others are binary, forming a tree
- Some operators “terminate” the stream by reducing it to a non-stream, i.e. T* → U
    float s = q.Sum();
- As with a program AST, these trees can be analyzed and rewritten
- The declarative, data-intensive, bulk-transformation nature of queries means execution technique == implementation detail
  - This is why we can safely introduce parallelism
[Figure: operator tree with nodes Where, Select, Where, Join, …]
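The lazy stream behaviour described on this slide can be observed directly. The following is a minimal sketch, assuming the LINQ-to-Objects API as it later shipped in System.Linq; the input values and the `evaluated` counter are hypothetical and exist only to make the deferred execution visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] A = { 3, 6, 7, 9, 12 };   // hypothetical input data
int seed = 3;
int evaluated = 0;              // counts predicate calls (illustration only)

// Wrap the slide's predicate so that each evaluation is counted.
Func<int, bool> keep = x => { evaluated++; return (x % seed) == 0; };

// Declaring the query only builds the operator chain (Where -> Select).
IEnumerable<float> q = from x in A where keep(x) select x / 0.33f;
int evaluatedBeforeSum = evaluated;   // still 0: no element has been pulled yet

// Sum is a terminating operator (T* -> U); it pulls every element through the chain.
float s = q.Sum();
int evaluatedAfterSum = evaluated;    // 5: one predicate call per input element

Console.WriteLine($"before={evaluatedBeforeSum}, after={evaluatedAfterSum}, sum={s}");
```

Because nothing runs until the terminating operator pulls elements, an implementation is free to decide how the chain is executed at that point, which is the property the slide relies on.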

6 Declaring Queries

7 Query Inputs and Outputs

8 C# Query Comprehension Syntax
    expr ::= … | query-expr
    query-expr ::= from-clause query-body
    from-clause ::= ‘from’ itemNameExpr ‘in’ srcExpr
    query-body ::= join-clause* (from-clause join-clause* | let-clause | where-clause)* orderby-clause? (select-clause | groupby-clause) query-continuation?
    join-clause ::= ‘join’ itemNameExpr ‘in’ srcExpr ‘on’ keyExpr1 ‘equals’ keyExpr2 (‘into’ itemNameExpr)?
    let-clause ::= ‘let’ itemNameExpr ‘=’ selExpr
    where-clause ::= ‘where’ predExpr
    orderby-clause ::= ‘orderby’ (keyExpr (‘ascending’ | ‘descending’)?)*
    select-clause ::= ‘select’ selExpr
    groupby-clause ::= ‘group’ selExpr ‘by’ keyExpr
    query-continuation ::= ‘into’ itemNameExpr query-body

9 Common Query Operators
- Binding operators, used to express operations on abstract elements
  - Bind: from x in A – bind variable x to a single element e in the data source A, one at a time, so that x may be referenced in the query text
  - Cross product bind: from x in A from y in B – create the relational cross-product, A × B, binding x and y to members of the resulting pairs (x, y)
  - Let bind: let x = e – bind variable x to the result of evaluating expression e
- General operators, to perform relational operations
  - Selection: where p – for each element e of type T, yield only those for which the selection predicate, p(e), of form T → bool, evaluates to true
  - Sort: orderby k (ascending | descending)? – order the elements of type T ascending or descending based on keys generated with the key-selection function k, of form T → K
  - Map: select p – transform each element e from type T to U via the projection function, p(e), of form T → U
  - Equi-join: join y in B on k1 equals k2 (into z)? – for each pair of elements (x, y) in the cross-product of the “left” input A and the “right” input B, for which k1(x) == k2(y), bind the result to y (or to z if specified, i.e. a “group join”)
  - Grouping: group p by k – yield groupings of data, of type (K, T*), for which k(e), of the form T → K, is equal for all e in the group

10 Some Example Queries
- Word counts:
    string doc = …;
    var counts = from w in doc.Split(' ') group w by w;
- Weighted average:
    float[] D = …, W = …;
    float avg = D.ZipWith(W, (x,y) => x*y).Sum() / W.Sum();
- “Select customers whose billing address is in Washington in the United States, or whose cumulative order total is >= $25 USD; order them by total $ descending, group them by state, and project just their name and total”:
    Set custs = …; Set ords = …; Set addrs = …;
    var q = from c in custs
            join o in ords on o.CustomerID equals c.ID into co
            join a in addrs on a.AddressID equals o.BillingID
            let ordTotal = co.Sum(o => o.TotalCost)
            where (a.State == "WA" && a.Country == "United States") || ordTotal >= $25.00
            orderby ordTotal descending
            group new {c.LastName,c.FirstName,ordTotal} by a.State;
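The first two example queries can be made runnable with small changes. This is a sketch under assumptions: it uses `Enumerable.Zip` from the shipped System.Linq library in place of the slide's `ZipWith`, it adds a `Count()` per group so that the word-count query actually yields counts, and the sample document and arrays are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample text; the slide elides the real input.
string doc = "the quick fox jumps over the lazy dog the end";

// "group w by w" yields (key, elements) groupings; Count() reduces each group.
Dictionary<string, int> counts =
    (from w in doc.Split(' ')
     group w by w into g
     select new { Word = g.Key, Count = g.Count() })
    .ToDictionary(p => p.Word, p => p.Count);
int theCount = counts["the"];

// Hypothetical data and weights for the weighted average.
float[] D = { 1f, 2f, 3f };
float[] W = { 1f, 1f, 2f };

// Zip pairs elements by position, playing the role of ZipWith on the slide.
float avg = D.Zip(W, (x, y) => x * y).Sum() / W.Sum();

Console.WriteLine($"the={theCount}, avg={avg}");
```

The third query on the slide is left as written, since its entity types and data sources are not given.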

11 Additional Query Operators
- Some have no syntactic representation and must be accessed w/ library calls:
  - ForAll(A, a) : invoke side-effecting operation a(x) for each element x in A
  - Concat(A, B) : linearly concatenate the data inputs A and B
  - Zip(A, B) : combine two inputs A and B into pairs by overlaying data
  - Reverse(A) : reverse the ordering of elements in vector A
  - Range(x, y) : generate a stream representing the range [x, y)
  - Set operators: Distinct(A), Union(A, B), Intersect(A, B)
  - Reductions (a.k.a. aggregations, folds): Aggregate(A, binOp), Count(A), Sum(A), Min(A), Max(A), Average(A), EqualAll(A, B), Any(A, p), All(A, p), Contains(A, e)
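Several of these library-only operators have close counterparts in the System.Linq API that eventually shipped. The sketch below uses those shipped names, which is an assumption relative to the slide: in particular, `Enumerable.Range` takes (start, count) rather than the half-open bounds [x, y) shown above, and the slide's EqualAll is not demonstrated. The arrays are hypothetical.

```csharp
using System;
using System.Linq;

int[] a = { 1, 2 };   // hypothetical inputs
int[] b = { 3, 4 };

// Range(start, count): the integers 0..99, i.e. the half-open range [0, 100).
int sum = Enumerable.Range(0, 100).Sum();

// Aggregate folds a binary operator over the stream: 1*2*3*4*5.
int factorial = Enumerable.Range(1, 5).Aggregate((acc, x) => acc * x);

// Concat appends b to a; Reverse then flips the combined ordering.
int[] reversed = a.Concat(b).Reverse().ToArray();

// Zip overlays the two inputs position by position.
int[] zipped = a.Zip(b, (x, y) => x + y).ToArray();

// Set operators and boolean reductions.
int[] union = a.Union(new[] { 2, 3 }).ToArray();
bool anyEven = a.Any(x => x % 2 == 0);
bool allPositive = b.All(x => x > 0);

Console.WriteLine($"{sum} {factorial} [{string.Join(",", reversed)}] " +
                  $"[{string.Join(",", zipped)}] [{string.Join(",", union)}] " +
                  $"{anyEven} {allPositive}");
```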

12 Runtime: Parallel Execution

13 Operator Parallelism
- Intra-operator, i.e. partitioning:
  - Input to a single operator is “split” into p pieces and run in parallel
  - Adjacent and nested operators can enjoy fusion
  - Good temporal locality of data – each datum “belongs” to a partition
- Inter-operator, i.e. pipelining:
  - Operators run concurrently with respect to one another
  - Can avoid “data skew”, i.e. imbalanced partitions, as can occur w/ partitioning
  - Typically incurs more synchronization overhead and yields considerably worse locality than intra-operator parallelism, so is less attractive
- Partitioning is preferred unless there is no other choice
  - For example, sometimes the programmer wants a single-CPU view, e.g.: foreach (x in q) a(x)
    - Consumption action a might be written to assume no parallelism
    - Bad if a(x) costs more than the element production latency
    - Otherwise, parallel tasks just eat up memory, eventually stopping when the bounded buffer fills
    - But a(x) can be parallel too
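The contrast between a single-CPU view and a parallel consumption action can be sketched with the PLINQ API as it later shipped (`AsParallel`, `ForAll` in System.Linq). The slides do not show that API, so its use here is an assumption, and the data and the thread-safe `ConcurrentBag` sink are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

int[] A = Enumerable.Range(0, 1000).ToArray();   // hypothetical input

// AsParallel opts the query in to partitioned (intra-operator) execution.
ParallelQuery<int> q = from x in A.AsParallel() where x % 2 == 0 select x * x;

// Single-CPU view: foreach merges the partitions' output back onto the
// consuming thread, so the loop body may assume no parallelism.
long serialSum = 0;
foreach (int item in q) serialSum += item;

// Parallel consumption: ForAll invokes the action from the worker threads,
// so the action must itself be thread-safe (here, a concurrent collection).
var bag = new ConcurrentBag<int>();
q.ForAll(item => bag.Add(item));
long parallelSum = bag.Sum(v => (long)v);

Console.WriteLine($"serial={serialSum}, parallel={parallelSum}");
```

The `foreach` form pays for a merge step between producers and the single consumer, which is where the bounded-buffer concern on the slide arises; `ForAll` avoids the merge at the cost of requiring a thread-safe action.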

14 Parallelism Illustrations
    q = from x in A where p(x) select x³;
[Figure: thread diagrams for the query above under three strategies: intra-operator (each thread runs where p(x) and select x³ over its own partition of A), inter-operator (where p(x) and select x³ run on separate threads as a pipeline), and both composed (four threads)]

15 Deciding Parallel Execution Strategy
- Tree analysis informs decision making:
  - Where to introduce parallelism?
  - And what kind? (partition vs. pipeline)
  - Based on intrinsic query properties and operator costs
    - Data sizes, selectivity (for filter f, what % satisfies the predicate?)
    - Intelligent “guesses”, code analysis, adaptive feedback over time
- But not just parallelism, higher level optimizations too, e.g.
  - Common sub-expression elimination, e.g. from x in X where p(f(x)) select f(x);
  - Reordering operations to:
    - Decrease cost of query execution, e.g. put a filter before the sort, even if the user wrote it the other way around
    - Achieve better operator fusion, reducing synchronization cost
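Both higher-level optimizations named on this slide can be illustrated by writing the rewritten query by hand. This is a sketch of what such a rewrite would produce, not a description of how PLINQ's analysis works; the functions `f`, `p` and `key`, the call counters, and the data are all hypothetical, and the rewrites are only valid when those functions are free of side effects other than the counting done here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] X = { 9, 4, 7, 1, 8, 2 };   // hypothetical input

// Common sub-expression elimination: evaluate f(x) once per element.
int fCalls = 0;
Func<int, int> f = x => { fCalls++; return x * 10; };
Func<int, bool> p = v => v > 30;

List<int> naive = (from x in X where p(f(x)) select f(x)).ToList();
int naiveCalls = fCalls;          // 10: six for the filter, four for the projection

fCalls = 0;
List<int> cse = (from x in X let fx = f(x) where p(fx) select fx).ToList();
int cseCalls = fCalls;            // 6: one per input element

// Reordering: filtering before sorting means fewer elements reach the sort.
int keyCalls = 0;
Func<int, int> key = x => { keyCalls++; return x; };

List<int> asWritten = X.OrderBy(key).Where(x => x % 2 == 0).ToList();
int asWrittenKeyCalls = keyCalls;

keyCalls = 0;
List<int> reordered = X.Where(x => x % 2 == 0).OrderBy(key).ToList();
int reorderedKeyCalls = keyCalls;

Console.WriteLine($"f calls: {naiveCalls} -> {cseCalls}; " +
                  $"sort key calls: {asWrittenKeyCalls} -> {reorderedKeyCalls}");
```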

16 Partitioning Techniques
- Partitioning can be data-source sensitive
  - If a nested query, can fuse existing partitions
  - If an array, calculate strides and contiguous ranges (+spatial locality)
  - If a (possibly infinite) stream, lazily hand out chunks
- Partitioning can be operator sensitive
  - E.g. equi-joins employ a hashtable to turn an O(nm) “nested join” into O(n+m)
    - Build hash table out of one data source; then probe it for matches
    - Only works if all data elements in data source A with key k are in the same partition as those elements in data source B also with key k
    - We can use “hash partitioning” to accomplish this: for p partitions, calculate k for each element e in A and in B, and then assign to partition based on key, e.g. k.GetHashCode() % p
  - Output of sort: we can fuse, but restrict ordering, ordinal and key based
- Existing partitions might be repartitioned
  - Can’t “push down” key partitioning information to leaves: types changed during stream data flow, e.g. select operator
  - Nesting: join processing output of another join operator
  - Or just to combat partition skew
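The hash-partitioned equi-join described above can be sketched as follows. This is an illustrative sketch, not PLINQ's implementation: the tuple data is invented, the partitions are processed in a plain loop (each iteration is independent and could be handed to its own thread), and the hash is masked with `int.MaxValue` because `GetHashCode()` may be negative, which the slide's `k.GetHashCode() % p` does not account for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int p = 4;   // number of partitions (hypothetical)

// Hypothetical inputs: (key, payload) pairs.
var A = new[] { (Key: 1, Val: "a1"), (Key: 2, Val: "a2"), (Key: 5, Val: "a5"), (Key: 6, Val: "a6") };
var B = new[] { (Key: 1, Val: "b1"), (Key: 5, Val: "b5"), (Key: 5, Val: "b5'"), (Key: 7, Val: "b7") };

// Equal keys always map to the same partition index, in both inputs.
Func<int, int> partitionOf = k => (k.GetHashCode() & int.MaxValue) % p;

var results = new List<(string Left, string Right)>();
for (int i = 0; i < p; i++)
{
    int part = i;
    // Build: hash table over this partition's share of A.
    var table = A.Where(e => partitionOf(e.Key) == part).ToLookup(e => e.Key);
    // Probe: only this partition's share of B can match entries in the table.
    foreach (var right in B.Where(e => partitionOf(e.Key) == part))
        foreach (var left in table[right.Key])
            results.Add((left.Val, right.Val));
}

// Reference result from the ordinary LINQ join, for comparison.
var reference = (from a in A
                 join b in B on a.Key equals b.Key
                 select (Left: a.Val, Right: b.Val)).ToList();

bool same = results.OrderBy(t => t).SequenceEqual(reference.OrderBy(t => t));
Console.WriteLine($"matches={results.Count}, agrees with join: {same}");
```

Because matching keys are guaranteed to share a partition, no partition ever needs to look at another partition's hash table, which is what makes the per-partition build and probe safe to run concurrently.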

17 Example: Query Nesting and Fusion
- Nesting queries inside of others is common
- We can fuse partitions
    var q1 = from x in A select x*2;
    var q2 = q1.Sum();
[Figure: partition diagrams in three panels: 1. Select (alone), 2. Sum (alone), 3. Select + Sum]
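Fusing the Select and Sum operators means each partition applies the projection and accumulates its partial sum in a single pass, with no intermediate stream between the two operators. The following hand-written sketch shows the idea using `Parallel.For` from System.Threading.Tasks; the partition count, the contiguous-range scheme, and the data are assumptions, and this is not how PLINQ itself is implemented.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int[] A = Enumerable.Range(1, 1000).ToArray();   // hypothetical input
const int p = 4;                                  // number of partitions
long[] partial = new long[p];                     // one slot per partition, no sharing

Parallel.For(0, p, i =>
{
    // Contiguous range for partition i (good spatial locality on arrays).
    int chunk = (A.Length + p - 1) / p;
    int start = i * chunk;
    int end = Math.Min(start + chunk, A.Length);

    // Fused operators: "select x*2" and "Sum" in the same loop.
    long local = 0;
    for (int j = start; j < end; j++) local += A[j] * 2;
    partial[i] = local;
});

// Final reduction over the p partial sums.
long fused = partial.Sum();

// Unfused reference: the two queries from the slide, run sequentially.
var q1 = from x in A select x * 2;
long q2 = q1.Sum();

Console.WriteLine($"fused={fused}, reference={q2}");
```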

18 Execution of Work
- Windows’ finest granularity of work is a thread
  - Each partition has at most one thread assigned to it, assigned via a gang scheduling/dynamic work stealing-like (a la Cilk) algorithm
  - Tension between creating “just the right number of threads” (static+dynamic adaptivity) versus over-partitioning work: would change some things, but maybe for the better
  - Hard to predict things like IO and blocking
- Developer still has shared memory, can make horrible mistakes, e.g.:
    int s_x = 0;
    var q = from x in A where x == s_x++ select x;
  - Analysis can sometimes catch this, but often not (dynamic function invocation, e.g. where x == side_effecting_func(…))
  - C#, and generally the CLR’s, type system doesn’t support the notion of purity (though some research systems, e.g. Spec#, provide hope)
  - Where is transactional memory when you need it?
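The shared-memory mistake above depends on elements being visited in order by a single thread. One way to express the same intent without shared mutable state is the indexed `Where` overload, sketched here with the shipped System.Linq and PLINQ APIs; that choice is an assumption of this example rather than something the slides propose, and the data is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] A = { 0, 5, 2, 9, 4 };   // hypothetical input

// The slide's query: correct only when run sequentially, because every
// predicate call mutates the shared counter s_x.
int s_x = 0;
List<int> sequential = (from x in A where x == s_x++ select x).ToList();

// Pure alternative: the element's position is passed in as an argument,
// so the predicate has no side effects and can safely run in parallel.
// AsOrdered keeps the output in source order for comparison.
List<int> pure = A.AsParallel()
                  .AsOrdered()
                  .Where((x, i) => x == i)
                  .ToList();

Console.WriteLine($"[{string.Join(",", sequential)}] [{string.Join(",", pure)}]");
```

Running the first query through a parallel engine would let several threads race on `s_x`, so its result would depend on scheduling; the second form gives the engine nothing to race on.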

19 Some Conclusions & Observations
- Results have been encouraging: about what you’d expect given prior related research
  - Good performance, few changes required to the serial programming model
  - Given the upcoming public release of LINQ in VS, we hope reach will be good
- Not a silver bullet – just one tool in a developer’s belt
  - Hard to “catch up” to huge parallelism constants on Windows, particularly given small data inputs and/or inexpensive operators
      var q = Range(0,100).Sum(); // add up #s [0,100)
  - Also easy to run into memory bottlenecks, possible opportunities for architecture-aware optimizations (we already try to maximize spatial+temporal locality)
- Costs are hard to get right
  - Too much dynamism in the platform to arrive at a correct #
  - Even if we did, hard to create heuristics that scale well across platforms
  - Too much decomposition, too little, unexpected IO (paging, …), synchronization
  - But in the end: do costs really matter?
  - Or is it better to represent concurrency using a fixed granule and let another scheduling mechanism apply policy (work stealing)?
- Many queries are candidates for SIMD/vector architectures
  - Targeting other instruction sets (SSEx, GPU) could be profitable

20 The End

