1
PLINQ: A Query Language for Data Parallel Programming
Joe Duffy, Microsoft
Declarative Aspects of Multicore Programming (DAMP) Workshop, POPL'07
© 2007, Microsoft, Corp. All rights reserved.
2
Research Context (1)
- New Microsoft technology: Language Integrated Query (LINQ)
- Primary goals:
  - A data-source-agnostic, type-safe query language
  - Simplify the expression of complex, multi-step operations over sets of data
- Why?
  - Many programs contain textual (untyped) SQL, XPath, XQuery, …
  - Programs must deal with ever-larger quantities of data
    - Hardware: memory and hard disk capacities continue to grow (GB, TB, PB, …)
    - Industry software: rich media, interactive visualizations, AI, NLP
  - Databases are ubiquitous, but aren't always the solution
- Features:
  - SQL-like relational algebra: language syntax and libraries
  - Supports queries over in-memory collections, XML, and RDBMSs
- Developed in Microsoft's "Developer Division", the group responsible for Visual Studio, Visual C#, Visual Basic, and Visual C++
- Releasing as part of Visual Studio 2007
3
Research Context (2)
This talk describes extensions to LINQ that enable parallel query execution, i.e. Parallel LINQ (PLINQ).
- Goals:
  - Apply data parallelism to LINQ query execution
  - Preserve the LINQ programming model, with few or no required interface changes
  - Deal efficiently with composition and nesting of query operators
- Audience: developers on Microsoft's .NET platform (C#, VB, and VC++)
- Architectures: machines running Windows; mostly MIMD (with SIMD/vector extensions) multi-core machines with 2 to 64 processors, typically AMD or Intel
- Mostly an application of existing techniques: RDBMS parallel query execution (Volcano, SQL Server, Oracle), NESL, GpH, and others
4
Syntax: A Query Language
    var q = from x1 in y
            join x2 in z on x1.fA equals x2.fA
            where p(x2.fB)
            orderby x1.fC
            select new { x1.fA, x2.fB, x1.fC };
    int r = q.Sum(a => a.fB * a.fC);
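To make the query concrete, here is a runnable C# sketch. The sources y and z, their contents, and the predicate p are hypothetical stand-ins; the parallel variant uses AsParallel() from PLINQ as it later shipped in .NET Framework 4, which is not necessarily the API of the prototype described in this talk. In a C# join, the outer range variable (x1) must appear on the left of equals.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the slide's sources y and z.
var y = new[] { new { fA = 1, fC = 3 }, new { fA = 2, fC = 1 }, new { fA = 3, fC = 2 } };
var z = new[] { new { fA = 1, fB = 10 }, new { fA = 2, fB = 5 }, new { fA = 3, fB = 7 } };

// Hypothetical predicate p: keep odd values of fB.
static bool p(int b) => b % 2 == 1;

// Sequential LINQ: join, filter, sort, project, then reduce with Sum.
var q = from x1 in y
        join x2 in z on x1.fA equals x2.fA
        where p(x2.fB)
        orderby x1.fC
        select new { x1.fA, x2.fB, x1.fC };
int r = q.Sum(a => a.fB * a.fC);

// The same query in shipped PLINQ: both join inputs opt in with AsParallel().
var pq = from x1 in y.AsParallel()
         join x2 in z.AsParallel() on x1.fA equals x2.fA
         where p(x2.fB)
         orderby x1.fC
         select new { x1.fA, x2.fB, x1.fC };
int rp = pq.Sum(a => a.fB * a.fC);

Console.WriteLine($"{r} {rp}"); // 19 19
```

Only the source expressions change; the rest of the query text is identical, which is the "little to no interface change" goal from the previous slide.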
5
Queries == Trees of Operators
- A query is composed of a tree of operators
- Most operators consume a stream t of type T* and produce a (lazily, on-demand generated) stream u of type U*, i.e. T* → U*
    var q = from x in A where (x % seed) == 0 select x / 0.33f;
- Many operators are unary, forming a linear chain, but others (e.g. join) are binary, so in general a query is a tree
- Some operators "terminate" the stream by reducing it to a non-stream value, i.e. T* → U
    float s = q.Sum();
- As with a program AST, these trees can be analyzed and rewritten
- Because queries are declarative, data-intensive bulk transformations, the execution technique is an implementation detail
  - This is why we can safely introduce parallelism
[Figure: an example operator tree built from Where, Select, Where, and Join nodes]
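A small C# sketch of the deferred, pull-based evaluation described above, assuming A is a List<int> and seed = 3 (both hypothetical). Defining q only builds the Where/Select operator chain; nothing runs until a terminating operator such as Sum or Count pulls elements through it, so an element added after q is defined is still seen.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var A = new List<int> { 3, 4, 6 };
int seed = 3; // hypothetical value for the slide's seed

// Building the query only constructs an operator chain (Where, then Select); nothing runs yet.
var q = from x in A
        where (x % seed) == 0
        select x / 0.33f;

// The equivalent explicit operator chain the query syntax stands for.
var sameTree = A.Where(x => (x % seed) == 0).Select(x => x / 0.33f);

// Evaluation is deferred, so this element is still visible to q.
A.Add(9);

// Sum is a terminating operator (T* -> U): it pulls every element through the chain.
float s = q.Sum();
int count = q.Count(); // re-runs the query from the source

Console.WriteLine($"{count} elements, sum = {s}");
```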
6
Declaring Queries
7
Query Inputs and Outputs
8
C# Query Comprehension Syntax
    expr ::= … | query-expr
    query-expr ::= from-clause query-body
    from-clause ::= 'from' itemNameExpr 'in' srcExpr
    query-body ::= join-clause* (from-clause join-clause* | let-clause | where-clause)* orderby-clause? (select-clause | groupby-clause) query-continuation?
    join-clause ::= 'join' itemNameExpr 'in' srcExpr 'on' keyExpr1 'equals' keyExpr2 ('into' itemNameExpr)?
    let-clause ::= 'let' itemNameExpr '=' selExpr
    where-clause ::= 'where' predExpr
    orderby-clause ::= 'orderby' ordering (',' ordering)*
    ordering ::= keyExpr ('ascending' | 'descending')?
    select-clause ::= 'select' selExpr
    groupby-clause ::= 'group' selExpr 'by' keyExpr
    query-continuation ::= 'into' itemNameExpr query-body
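The compiler translates query comprehensions into ordinary method calls. The sketch below, over a hypothetical word list, compares a query using let, where, orderby, group ... by, and an into continuation with a hand-written method chain that approximates that translation (the real translation of let goes through a compiler-generated anonymous type, which the sketch mimics).

```csharp
using System;
using System.Linq;

string[] words = { "apple", "avocado", "banana", "blueberry", "cherry" }; // hypothetical data

// Query comprehension: from, let, where, orderby, group ... by, and an 'into' continuation.
var viaSyntax =
    from w in words
    let first = w[0]
    where w.Length > 5
    orderby w descending
    group w by first into g
    select $"{g.Key}:{g.Count()}";

// Approximate method-call translation: let becomes a Select into an anonymous type.
var viaMethods = words
    .Select(w => new { w, first = w[0] })
    .Where(t => t.w.Length > 5)
    .OrderByDescending(t => t.w)
    .GroupBy(t => t.first, t => t.w)
    .Select(g => $"{g.Key}:{g.Count()}");

Console.WriteLine(string.Join(", ", viaSyntax)); // c:1, b:2, a:1
Console.WriteLine(viaSyntax.SequenceEqual(viaMethods)); // True
```

Because both forms end up as the same operator tree, a parallel implementation only needs to provide parallel versions of these methods.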
9
Common Query Operators
- Binding operators, used to express operations on abstract elements:
  - Bind: from x in A binds variable x to each element e of data source A, one at a time, so that x may be referenced in the rest of the query
  - Cross-product bind: from x in A from y in B creates the relational cross-product A × B, binding x and y to the members of each resulting pair (x, y)
  - Let bind: let x = e binds variable x to the result of evaluating expression e
- General operators, used to perform relational operations:
  - Selection: where p yields only those elements e of type T for which the selection predicate p(e), of form T → bool, evaluates to true
  - Sort: orderby k (ascending | descending)? orders the elements of type T, ascending or descending, by keys generated with the key-selection function k, of form T → K
  - Map: select p transforms each element e from type T to U via the projection function p(e), of form T → U
  - Equi-join: join y in B on k1 equals k2 (into z)? considers each pair of elements (x, y) in the cross-product of the "left" input A and the "right" input B for which k1(x) == k2(y), and binds y to each matching element (or, if into z is specified, binds z to the sequence of all matching elements: a "group join")
  - Grouping: group p by k yields groupings of data, of type (K, T*), such that k(e), of form T → K, is equal for all e in the group
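A C# sketch contrasting the binding and join operators on hypothetical inputs A and B: a plain equi-join yields one result per matching pair, a group join (into z) binds z to all matches for each left element (including none), and a double from enumerates the full cross-product.

```csharp
using System;
using System.Linq;

// Hypothetical left and right inputs.
var A = new[] { new { Id = 1, Name = "ann" }, new { Id = 2, Name = "bob" }, new { Id = 3, Name = "cy" } };
var B = new[] { new { Owner = 1, Item = "pen" }, new { Owner = 1, Item = "cup" }, new { Owner = 3, Item = "hat" } };

// Equi-join: one result per (x, y) pair with x.Id == y.Owner.
var pairs = from x in A
            join y in B on x.Id equals y.Owner
            select $"{x.Name}-{y.Item}";

// Group join: z is the sequence of all matching y for each x; "bob" appears with an empty z.
var groups = from x in A
             join y in B on x.Id equals y.Owner into z
             select $"{x.Name}:{z.Count()}";

// Cross-product bind: every (x, y) pair in A x B.
int crossCount = (from x in A from y in B select new { x, y }).Count();

Console.WriteLine(string.Join(" ", pairs));  // ann-pen ann-cup cy-hat
Console.WriteLine(string.Join(" ", groups)); // ann:2 bob:0 cy:1
Console.WriteLine(crossCount);               // 9
```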
10
Some Example Queries
- Word counts:
    string doc = …;
    var counts = from w in doc.Split(' ')
                 group w by w into g
                 select new { Word = g.Key, Count = g.Count() };
- Weighted average:
    float[] D = …, W = …;
    float avg = D.Zip(W, (x, y) => x * y).Sum() / W.Sum();
- "Select customers whose billing address is in Washington in the United States, or whose cumulative order total is >= $25 USD; order them by order total, descending; group them by state; and project just their name and total":
    Set custs = …; Set ords = …; Set addrs = …;
    var q = from c in custs
            join o in ords on c.ID equals o.CustomerID into co
            join a in addrs on c.BillingID equals a.AddressID
            let ordTotal = co.Sum(o => o.TotalCost)
            where (a.State == "WA" && a.Country == "United States") || ordTotal >= 25.00m
            orderby ordTotal descending
            group new { c.LastName, c.FirstName, ordTotal } by a.State;
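A runnable C# version of the first two examples, with a hypothetical document and hypothetical weight vectors. It uses Enumerable.Zip and PLINQ's AsParallel() as they shipped in .NET Framework 4 for the pairwise product and the parallel variant; it is a sketch, not the prototype's API.

```csharp
using System;
using System.Linq;

string doc = "the cat and the hat and the bat"; // hypothetical input document

// Word counts: group identical words, then project each key with its group size.
var counts = from w in doc.Split(' ')
             group w by w into g
             select new { Word = g.Key, Count = g.Count() };

// The same grouping with PLINQ: AsParallel() opts the query in to parallel execution.
var parallelCounts = doc.Split(' ')
    .AsParallel()
    .GroupBy(w => w)
    .ToDictionary(g => g.Key, g => g.Count());

// Weighted average: sum(D[i] * W[i]) / sum(W[i]).
float[] D = { 1f, 2f, 3f };
float[] W = { 1f, 1f, 2f };
float avg = D.Zip(W, (x, y) => x * y).Sum() / W.Sum(); // (1 + 2 + 6) / 4

foreach (var c in counts) Console.WriteLine($"{c.Word}: {c.Count}");
Console.WriteLine(avg); // 2.25
```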
11
Additional Query Operators
Some operators have no syntactic representation and must be accessed via library calls:
- ForAll(A, a): invoke the side-effecting operation a(x) for each element x in A
- Concat(A, B): linearly concatenate the data inputs A and B
- Zip(A, B): combine two inputs A and B into pairs, element by element
- Reverse(A): reverse the order of the elements in A
- Range(x, y): generate a stream representing the range [x, y)
- Set operators: Distinct(A), Union(A, B), Intersect(A, B)
- Reductions (a.k.a. aggregations, folds): Aggregate(A, binOp), Count(A), Sum(A), Min(A), Max(A), Average(A), EqualAll(A, B), Any(A, p), All(A, p), Contains(A, e)
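Most of these operators exist in the LINQ and PLINQ libraries that later shipped, sometimes with different signatures. The C# sketch below exercises them on small arrays. Note that the shipped Enumerable.Range takes (start, count) rather than [x, y) bounds, SequenceEqual is used where the slide lists EqualAll (assumed here to be its closest counterpart), and ForAll exists only on parallel queries (ParallelEnumerable.ForAll).

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

int[] A = { 1, 2, 2, 3 };
int[] B = { 3, 4 };

var range = Enumerable.Range(0, 5);                     // (start, count): 0,1,2,3,4
var concat = A.Concat(B).ToArray();                     // 1,2,2,3,3,4
var zipped = A.Zip(B, (a, b) => a * 10 + b).ToArray();  // stops at the shorter input: 13,24
var reversed = Enumerable.Reverse(A).ToArray();         // 3,2,2,1
var distinct = A.Distinct().ToArray();                  // 1,2,3
var union = A.Union(B).ToArray();                       // 1,2,3,4
var inter = A.Intersect(B).ToArray();                   // 3
int product = A.Aggregate((acc, x) => acc * x);         // 1*2*2*3 = 12
bool anyEven = A.Any(x => x % 2 == 0);                  // true
bool allPositive = A.All(x => x > 0);                   // true
bool equal = A.SequenceEqual(new[] { 1, 2, 2, 3 });     // true

// ForAll runs the action on worker threads in no particular order,
// so the action must be thread-safe; ConcurrentBag tolerates concurrent Adds.
var bag = new ConcurrentBag<int>();
range.AsParallel().ForAll(x => bag.Add(x * x));

Console.WriteLine(string.Join(",", zipped)); // 13,24
```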
12
Runtime: Parallel Execution
13
Operator Parallelism
- Intra-operator parallelism, i.e. partitioning:
  - The input to a single operator is split into p pieces, which are processed in parallel
  - Adjacent and nested operators can enjoy fusion
  - Good temporal locality of data: each datum "belongs" to a partition
- Inter-operator parallelism, i.e. pipelining:
  - Operators run concurrently with respect to one another
  - Can avoid "data skew", i.e. imbalanced partitions, which can occur with partitioning
  - Typically incurs more synchronization overhead and yields considerably worse locality than intra-operator parallelism, so it is less attractive
- Partitioning is preferred unless there is no other choice
  - For example, sometimes the programmer wants a single-CPU view, e.g.:
      foreach (var x in q) a(x);
  - The consumption action a might be written to assume no parallelism
  - Pipelining is a poor fit if a(x) costs more than the element production latency: the parallel producer tasks just eat up memory, eventually stalling when the bounded buffer fills
  - But a(x) can be parallel too
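To contrast the two strategies, here is a hand-rolled C# sketch (an illustration, not PLINQ's actual implementation) of where p(x) select x³ followed by a sum, with a hypothetical predicate p. The partitioned version splits the array into contiguous chunks and runs the fused filter and projection per chunk; the pipelined version runs the filter and projection as separate concurrent stages joined by a bounded buffer, so the producer blocks whenever the buffer is full.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

int[] A = Enumerable.Range(0, 1000).ToArray();
static bool p(int x) => x % 3 == 0;  // hypothetical filter
static int f(int x) => x * x * x;    // select x^3

// Intra-operator (partitioning): each task runs the fused where+select+sum over its own chunk.
const int partitions = 4;
long[] partialSums = new long[partitions];
Parallel.For(0, partitions, i =>
{
    int lo = i * A.Length / partitions, hi = (i + 1) * A.Length / partitions;
    long local = 0;
    for (int j = lo; j < hi; j++)
        if (p(A[j])) local += f(A[j]);
    partialSums[i] = local;
});
long partitionedSum = partialSums.Sum();

// Inter-operator (pipelining): the filter stage produces into a bounded buffer
// while the projection stage consumes from it concurrently.
using var buffer = new BlockingCollection<int>(boundedCapacity: 64);
var producer = Task.Run(() =>
{
    foreach (int x in A) if (p(x)) buffer.Add(x); // blocks while the buffer is full
    buffer.CompleteAdding();
});
long pipelinedSum = 0;
foreach (int x in buffer.GetConsumingEnumerable()) pipelinedSum += f(x);
producer.Wait();

Console.WriteLine(partitionedSum == pipelinedSum); // True
```

The partitioned version needs no synchronization until the final combine, while the pipelined one synchronizes on every element that crosses the buffer, which is the overhead the slide refers to.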
14
Parallelism Illustrations
    q = from x in A where p(x) select x³;
[Figure: three panels, "Intra-operator", "Inter-operator", and "Both composed", showing the where p(x) and select x³ operators over input A assigned to Threads 1 through 4]
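With PLINQ as it shipped in .NET Framework 4, the same query is written declaratively and the runtime chooses the partitioning. This C# sketch caps parallelism at four workers with WithDegreeOfParallelism, mirroring the four threads in the figure, and shows that results arrive in unspecified order unless AsOrdered is requested. The predicate p and the input range are hypothetical.

```csharp
using System;
using System.Linq;

int[] A = Enumerable.Range(1, 20).ToArray();
static bool p(int x) => x % 2 == 1; // hypothetical predicate

// PLINQ picks the partitioning; at most 4 worker threads process the query.
var q = A.AsParallel()
         .WithDegreeOfParallelism(4)
         .Where(p)
         .Select(x => x * x * x);

// Without AsOrdered, output order is unspecified, so sort before comparing.
var unordered = q.OrderBy(v => v).ToArray();

// AsOrdered preserves source order, at some extra cost when merging partitions.
var ordered = A.AsParallel().AsOrdered().WithDegreeOfParallelism(4)
               .Where(p).Select(x => x * x * x).ToArray();

Console.WriteLine(string.Join(",", ordered)); // 1,27,125,343,729,1331,2197,3375,4913,6859
```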
15
Deciding Parallel Execution Strategy
- Tree analysis informs decision making:
  - Where should parallelism be introduced? And what kind (partitioning vs. pipelining)?
  - Based on intrinsic query properties and operator costs:
    - Data sizes and selectivity (for a filter f, what % of elements satisfy the predicate?)
    - Intelligent "guesses", code analysis, and adaptive feedback over time
- Not just parallelism: higher-level optimizations too, e.g.
  - Common sub-expression elimination, e.g. from x in X where p(f(x)) select f(x); evaluates f(x) twice per surviving element
  - Reordering operations to:
    - Decrease the cost of query execution, e.g. put a filter before a sort, even if the user wrote it the other way around
    - Achieve better operator fusion, reducing synchronization cost
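A C# sketch of both rewrites, with hypothetical f and p. The counter in f exists only to show how many times f runs, which is safe here because the queries are sequential; the rewrite itself is valid only when f is pure, since it changes how often f is invoked.

```csharp
using System;
using System.Linq;

int calls = 0;
int f(int x) { calls++; return x * 2; } // hypothetical "expensive" function, instrumented
static bool p(int y) => y > 10;         // hypothetical predicate

int[] X = Enumerable.Range(1, 10).ToArray();

// As written: f(x) runs in the filter and again in the projection.
calls = 0;
var naive = (from x in X where p(f(x)) select f(x)).ToArray();
int naiveCalls = calls;

// After common sub-expression elimination: compute f(x) once and reuse it via let.
calls = 0;
var cse = (from x in X let fx = f(x) where p(fx) select fx).ToArray();
int cseCalls = calls;

// Reordering: filtering before sorting gives the same result but sorts fewer elements.
var sortThenFilter = X.OrderByDescending(x => x).Where(x => x % 2 == 0).ToArray();
var filterThenSort = X.Where(x => x % 2 == 0).OrderByDescending(x => x).ToArray();

Console.WriteLine($"{naiveCalls} vs {cseCalls}"); // 15 vs 10
```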
16
Partitioning Techniques
- Partitioning can be data-source sensitive:
  - If the source is a nested query, existing partitions can be fused
  - If an array, calculate strides and contiguous ranges (+ spatial locality)
  - If a (possibly infinite) stream, lazily hand out chunks
- Partitioning can be operator sensitive:
  - E.g. equi-joins employ a hash table to turn an O(nm) "nested loops join" into O(n+m)
    - Build a hash table from one data source, then probe it for matches
    - This only works if all elements of data source A with key k are in the same partition as the elements of data source B with key k
    - "Hash partitioning" accomplishes this: for p partitions, compute the key k of each element e in A and in B, then assign e to a partition based on its key, e.g. (k.GetHashCode() & 0x7FFFFFFF) % p, masking the sign bit because hash codes may be negative
  - Output of a sort: downstream operators can be fused, but must respect the ordering (ordinal- and key-based)
- Existing partitions might need to be repartitioned:
  - Key partitioning information can't be "pushed down" to the leaves when types change during stream data flow, e.g. through a select operator
  - Nesting: a join processing the output of another join
  - Or simply to combat partition skew
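A hand-rolled C# sketch of a hash-partitioned equi-join (an illustration, not PLINQ's internals). Both inputs are split with the same hash function, so matching keys share a partition index; each partition then builds a lookup from its B elements and probes it with its A elements. The inputs and the partition count are hypothetical, and the sign bit of GetHashCode() is masked so that the partition index is never negative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical (key, payload) inputs for the left input A and right input B.
var A = Enumerable.Range(0, 200).Select(i => (Key: i % 37, Val: i)).ToArray();
var B = Enumerable.Range(0, 100).Select(i => (Key: i % 23, Val: -i)).ToArray();
const int partitions = 4;

// Hash partitioning: equal keys always map to the same partition index.
static int PartitionOf(int key, int count) => (key.GetHashCode() & 0x7FFFFFFF) % count;

List<(int Key, int Val)>[] SplitByKey((int Key, int Val)[] src)
{
    var parts = new List<(int Key, int Val)>[partitions];
    for (int i = 0; i < partitions; i++) parts[i] = new List<(int Key, int Val)>();
    foreach (var e in src) parts[PartitionOf(e.Key, partitions)].Add(e);
    return parts;
}
var aParts = SplitByKey(A);
var bParts = SplitByKey(B);

// Each partition is joined independently: build on B's side, probe with A's side.
var results = new List<(int, int)>[partitions];
Parallel.For(0, partitions, i =>
{
    var table = bParts[i].ToLookup(e => e.Key, e => e.Val); // build
    var local = new List<(int, int)>();
    foreach (var a in aParts[i])                            // probe
        foreach (var bVal in table[a.Key])
            local.Add((a.Val, bVal));
    results[i] = local;
});
var joined = results.SelectMany(r => r).ToList();

// Cross-check against the sequential LINQ equi-join.
var expected = A.Join(B, a => a.Key, b => b.Key, (a, b) => (a.Val, b.Val)).ToList();
Console.WriteLine(joined.Count == expected.Count); // True
```

If the keys were not hash-partitioned consistently (for example, round-robin chunks of A and B), a key could land in different partitions on each side and its matches would be silently lost.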
17
Example: Query Nesting and Fusion
- Nesting queries inside of others is common
- We can fuse partitions:
    var q1 = from x in A select x*2;
    var q2 = q1.Sum();
[Figure: three panels, "1. Select (alone)", "2. Sum (alone)", and "3. Select + Sum", showing partitioned select x*2 and partial sums (+) fused into one pass]
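A C# sketch of the fusion idea for select x*2 followed by Sum, on a hypothetical input. The unfused form materializes the Select output before summing; the fused form uses PLINQ's Aggregate overload that takes a per-partition seed factory, so each partition doubles and accumulates its own elements and only the partial sums are combined.

```csharp
using System;
using System.Linq;

int[] A = Enumerable.Range(1, 1000).ToArray(); // hypothetical input

// Unfused: stage 1 materializes the whole Select output, stage 2 sums it.
int[] doubled = A.Select(x => x * 2).ToArray();
long unfused = doubled.Sum(x => (long)x);

// Fused: per-partition accumulators, no intermediate array.
long fused = A.AsParallel().Aggregate(
    () => 0L,                       // seed factory: one accumulator per partition
    (acc, x) => acc + x * 2,        // select x*2 and add, in a single step
    (left, right) => left + right,  // combine the per-partition partial sums
    acc => acc);                    // final result selector

// The user-level query; how it is executed is left to PLINQ.
int q2 = A.AsParallel().Select(x => x * 2).Sum();

Console.WriteLine($"{unfused} {fused} {q2}"); // 1001000 1001000 1001000
```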
18
Execution of Work
- Windows' finest granularity of work is a thread
- Each partition has at most one thread assigned to it, via a gang-scheduling/dynamic work-stealing-like algorithm (à la Cilk)
- There is tension between creating "just the right number of threads" (static + dynamic adaptivity) and over-partitioning the work; over-partitioning would change some things, but maybe for the better
  - Hard to predict things like IO and blocking
- Developers still have shared memory and can make horrible mistakes, e.g.:
    int s_x = 0;
    var q = from x in A where x == s_x++ select x;
  - Analysis can sometimes catch this, but often not (e.g. dynamic function invocation, as in where x == side_effecting_func(…))
  - The type system of C#, and more generally of the CLR, doesn't support the notion of purity (though some research systems, e.g. Spec#, provide hope)
  - Where is transactional memory when you need it?
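A C# sketch of the shared-counter hazard, on a hypothetical array. Sequentially, s_x++ happens once per element in order, so the filter keeps elements equal to their position; run in parallel, the same lambda would race on s_x. One pure alternative, assuming position-based filtering is the intent, is the indexed Where overload, which PLINQ also supports.

```csharp
using System;
using System.Linq;

int[] A = { 0, 1, 5, 3, 4, 9, 6 }; // hypothetical input

// Sequential LINQ: s_x++ runs once per element, in order, so this keeps
// exactly the elements whose value equals their position.
int s_x = 0;
var sequential = (from x in A where x == s_x++ select x).ToArray();

// In parallel, the same lambda would race on s_x (lost updates, out-of-order values),
// so its result would be unpredictable. The indexed Where overload expresses position
// without shared mutable state; AsOrdered keeps results in source order.
var parallel = A.AsParallel().AsOrdered()
                .Where((x, i) => x == i)
                .ToArray();

Console.WriteLine(string.Join(",", sequential)); // 0,1,3,4,6
Console.WriteLine(string.Join(",", parallel));   // 0,1,3,4,6
```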
19
Some Conclusions & Observations
- Results have been encouraging: about what you'd expect given prior related research
  - Good performance, with few changes required to the serial programming model
  - Given the upcoming public release of LINQ in Visual Studio, we hope its reach will be good
- Not a silver bullet: just one tool in a developer's belt
  - Hard to "catch up" to the large constant overheads of parallelism on Windows, particularly with small data inputs and/or inexpensive operators, e.g.:
      var q = Range(0,100).Sum(); // add up #s [0,100)
  - Also easy to run into memory bottlenecks; possible opportunities for architecture-aware optimizations (we already try to maximize spatial and temporal locality)
- Costs are hard to get right
  - Too much dynamism in the platform to arrive at a correct number
  - Even if we did, it's hard to create heuristics that scale well across platforms
  - Too much decomposition, too little, unexpected IO (paging, …), synchronization
  - But in the end: do costs really matter? Or is it better to represent concurrency using a fixed granule and let another scheduling mechanism (e.g. work stealing) apply policy?
- Many queries are candidates for SIMD/vector architectures
  - Targeting other instruction sets (SSEx, GPU) could be profitable
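A C# sketch of the small-input case from the slide: summing [0, 100) with LINQ and with PLINQ's ParallelEnumerable.Range. Both produce the same result; the tick counts are machine-dependent and include first-call JIT costs, so treat them only as a prompt to measure, not as a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Tiny input, trivial per-element work: any parallel gain is easily swamped
// by the fixed cost of scheduling partitions and merging their results.
var sw = Stopwatch.StartNew();
int seqSum = Enumerable.Range(0, 100).Sum();          // (start, count) = [0, 100)
long seqTicks = sw.ElapsedTicks;

sw.Restart();
int parSum = ParallelEnumerable.Range(0, 100).Sum();  // same range, parallel query
long parTicks = sw.ElapsedTicks;

Console.WriteLine($"sequential: {seqSum} in {seqTicks} ticks; parallel: {parSum} in {parTicks} ticks");
```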
20
The End