Published by Luis Wade. Modified over 3 years ago.

1
SoQL – A Language for Querying and Creating Data in Social Networks
Royi Ronen and Oded Shmueli
Technion – Israel Institute of Technology
March 29th, 2009
M3SN, Shanghai, China

2
Introduction
  As social networks become popular:
    A lot of data
      Many participants
      Many connections
      Sizable participant records
    Proliferation to business and organizational cultures
  Many querying scenarios can benefit from a domain-specific language
  SoQL is proposed as a step in this direction

3
Example
  [Figure: social graph over the participants Bob, Charlie, Alice, Dave, Eve, Gloria, and Frank; each edge is labeled with an id (1 to 9) and a weight]

  T_N (name, company, e-mail, position, experience)
    name     company  e-mail         position    experience
    Alice    HAL      alice@hal.com  Manager     4
    Bob      ACME     bob@acme.net   Manager     3
    Charlie  CIA      cha@cia.gov    Engineer    2
    Dave     MTV      dave@mtv.com   Teacher     5
    Eve      ACME     eve@acme.net   Scientist   6
    Frank    HAL      fr@hal.com     Technician  7
    Gloria   ABC      glor@abc.org   Producer    6

  T_F (id, weight): one row per edge of the graph

4
Bob's Information Needs
  Bob works for ACME and is looking for a job at HAL
  Bob is looking for a path that:
    connects him to a manager at HAL,
    is at most 4 nodes long, and
    has no participant, except for Bob, who works for ACME
  Results are to be ordered by the product of the weights along the path, excluding the first edge
    Higher-quality social paths

5
Bob's query
    SELECT COUNT(PATH.nodes.*), PATH
    FROM PATH (Bob TO X AS P1 TO Y AS P2)
    WHERE Y.company = 'HAL' and Y.position = 'manager'
      and ATMOST 0 IN P2.nodes SATISFY (company='ACME')
      and COUNT(P1.nodes.*) = 2
      and COUNT(PATH.nodes.*) <= 4
    ORDER BY MULT(P2.edges.weight)

  Callouts on the slide: the path, path predicates, aggregation, conditions on attributes
  Path diagram: P1 runs from Bob to X; P2 runs from X to Y
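One way to read the query above is as a brute-force enumeration of simple paths followed by filtering and sorting. The Python sketch below illustrates that reading only; it is not the authors' evaluation strategy. The node attributes come from the example table, but the edge list and its weights are hypothetical, so the output is not meant to reproduce the result slide. Comparing the position case-insensitively and sorting in descending order are also assumptions.

```python
import math

# Node attributes from the example table: name -> (company, position).
NODES = {
    "Alice": ("HAL", "Manager"),
    "Bob": ("ACME", "Manager"),
    "Charlie": ("CIA", "Engineer"),
    "Dave": ("MTV", "Teacher"),
    "Eve": ("ACME", "Scientist"),
    "Frank": ("HAL", "Technician"),
    "Gloria": ("ABC", "Producer"),
}

# HYPOTHETICAL undirected edges (node, node, weight); not taken from the slides.
EDGES = [
    ("Bob", "Charlie", 0.7), ("Bob", "Dave", 0.6), ("Charlie", "Alice", 0.3),
    ("Dave", "Gloria", 0.9), ("Gloria", "Alice", 0.85), ("Dave", "Eve", 0.4),
    ("Eve", "Alice", 0.4), ("Frank", "Alice", 0.5),
]

# Symmetric adjacency map: ADJ[a][b] is the weight of the edge between a and b.
ADJ = {}
for a, b, w in EDGES:
    ADJ.setdefault(a, {})[b] = w
    ADJ.setdefault(b, {})[a] = w


def simple_paths(start, max_nodes):
    """Yield every simple path from start that has 2..max_nodes nodes."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) >= 2:
            yield path
        if len(path) < max_nodes:
            for nxt in ADJ[path[-1]]:
                if nxt not in path:
                    stack.append(path + [nxt])


def bobs_query():
    results = []
    for path in simple_paths("Bob", max_nodes=4):  # COUNT(PATH.nodes.*) <= 4
        p2 = path[1:]  # P1 is (Bob, X), so P2 is everything from X to Y
        company, position = NODES[path[-1]]
        if company != "HAL" or position.lower() != "manager":
            continue  # conditions on Y's attributes
        if any(NODES[n][0] == "ACME" for n in p2):
            continue  # ATMOST 0 IN P2.nodes SATISFY (company='ACME')
        score = math.prod(ADJ[u][v] for u, v in zip(p2, p2[1:]))
        results.append((len(path), tuple(path), score))
    # Assumed ordering: highest product of P2 edge weights first.
    results.sort(key=lambda r: r[2], reverse=True)
    return results


for count, path, score in bobs_query():
    print(count, path, round(score, 3))
```

On this hypothetical graph the path through Eve is rejected because Eve works for ACME, which shows the path predicate doing the filtering.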

6
Result
  COUNT  PATH
  4      (Bob, Dave, Gloria, Alice)
  3      (Bob, Charlie, Alice)
  4      (Bob, Dave, Eve, Alice)
  Multiplication values are: 0.765, 0.3, 0.16

7
Model
  Undirected graph
    Reciprocal friends model
  Nodes and edges have attributes
  New data types
    Path: an ordered set of distinct nodes in which every two successive nodes are connected
    Group: a set of nodes

8
Model
  Results are finite
    Social networks are constantly growing
    But they are finite at any point in time

9
Aggregation over Path/Group
  Aggregation over a path or group is possible
  E.g., the number of nodes in path P1:
    SELECT COUNT(*) FROM P1.nodes
  Or, as in the previous example:
    MULT(P2.edges.weight)
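As a rough illustration of the two aggregates named above, the following Python sketch models a path as two lists of attribute dictionaries and applies COUNT and MULT to a projected attribute. The `Path` class, the `project` helper, and the two edge weights are assumptions made for the example; they are not part of SoQL.

```python
import math
from dataclasses import dataclass


@dataclass
class Path:
    nodes: list  # one attribute dict per node, in path order
    edges: list  # one attribute dict per edge, in path order


def project(items, attribute):
    """Mimic expressions such as P2.edges.weight: pull one attribute from each item."""
    return [item[attribute] for item in items]


def count(items):
    """COUNT: number of items."""
    return len(items)


def mult(values):
    """MULT: product of the values."""
    return math.prod(values)


# Hypothetical P2 with three nodes and two weighted edges.
p2 = Path(
    nodes=[{"name": "Dave"}, {"name": "Gloria"}, {"name": "Alice"}],
    edges=[{"weight": 0.9}, {"weight": 0.85}],
)

print(count(p2.nodes))                           # number of nodes in P2
print(round(mult(project(p2.edges, "weight")), 3))  # product of edge weights
```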

10
Path Predicates
  ALL … SATISFY (condition)
  ATMOST n
  ATLEAST n
  ALL EXCEPT UPTO n
  MAJORITY
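The slide lists the quantifiers without defining them. The Python sketch below shows one plausible reading, in which each quantifier compares the number of nodes satisfying the condition against a threshold. Treating MAJORITY as "strictly more than half" and ALL EXCEPT UPTO n as "at most n nodes fail" are assumptions, as are all function names.

```python
def count_satisfying(nodes, condition):
    """Number of nodes for which the condition holds."""
    return sum(1 for node in nodes if condition(node))


def all_satisfy(nodes, condition):
    return count_satisfying(nodes, condition) == len(nodes)


def atmost(n, nodes, condition):
    return count_satisfying(nodes, condition) <= n


def atleast(n, nodes, condition):
    return count_satisfying(nodes, condition) >= n


def all_except_upto(n, nodes, condition):
    # Assumed meaning: at most n nodes fail the condition.
    return len(nodes) - count_satisfying(nodes, condition) <= n


def majority(nodes, condition):
    # Assumed meaning: strictly more than half of the nodes satisfy the condition.
    return count_satisfying(nodes, condition) > len(nodes) / 2


# Example: the nodes of a path, with attributes taken from the example table.
p2_nodes = [
    {"name": "Dave", "company": "MTV", "experience": 5},
    {"name": "Gloria", "company": "ABC", "experience": 6},
    {"name": "Alice", "company": "HAL", "experience": 4},
]

print(atmost(0, p2_nodes, lambda n: n["company"] == "ACME"))
print(majority(p2_nodes, lambda n: n["experience"] >= 5))
```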

11
Another information need
  Bob would like to find a group such that:
    The group contains Bob and three others
    There exists a path of up to three edges from Bob to each of the three
    There exists a path of up to two edges between every two of the three
    All three have experience >= 5

12
SELECT FROM GROUP
    SELECT GROUP
    FROM GROUP (Bob AS G1, DISTINCT(X,Y,Z) AS G2)
    WITH PATH (Bob TO X AS P1),
         PATH (Bob TO Y AS P2),
         PATH (Bob TO Z AS P3)
    WHERE COUNT(P1.edges.*)<=3
      and COUNT(P2.edges.*)<=3
      and COUNT(P3.edges.*)<=3
      and ALL IN G2.nodes SATISFY (experience>=5)
      and ALL SUBGROUPS(U,V) IN G2 SATISFY (PATH(U TO V AS P4) COUNT(P4.edges.*)<=2)

  Callouts on the slide: group with paths, aggregation on paths, group predicate, subgroups
  A further condition shown on the slide: and COUNT(GROUP.nodes.*)<=5
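The group query can be approximated by a brute-force search: compute hop counts with breadth-first search, keep the candidates that meet the node conditions, and test every trio. The Python sketch below follows that approach under stated assumptions. The experience values come from the example table, the edge list is hypothetical, and the helper names are invented for illustration. A path of up to k edges exists exactly when the shortest hop count is at most k, so hop counts are enough here.

```python
from collections import deque
from itertools import combinations

# Experience values from the example table.
EXPERIENCE = {"Alice": 4, "Bob": 3, "Charlie": 2, "Dave": 5,
              "Eve": 6, "Frank": 7, "Gloria": 6}

# HYPOTHETICAL undirected edges; not taken from the slides.
EDGES = [("Bob", "Charlie"), ("Bob", "Dave"), ("Charlie", "Alice"),
         ("Dave", "Gloria"), ("Gloria", "Alice"), ("Dave", "Eve"),
         ("Eve", "Alice"), ("Frank", "Alice")]

ADJ = {name: [] for name in EXPERIENCE}
for a, b in EDGES:
    ADJ[a].append(b)
    ADJ[b].append(a)


def hop_counts(source):
    """Breadth-first search: fewest edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in ADJ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def find_groups(anchor="Bob", size=3, max_to_anchor=3, max_between=2, min_experience=5):
    inf = float("inf")
    dist = {name: hop_counts(name) for name in ADJ}
    # Node conditions: experience >= 5 and a path of up to 3 edges from the anchor.
    candidates = sorted(
        n for n in ADJ
        if n != anchor
        and EXPERIENCE[n] >= min_experience
        and dist[anchor].get(n, inf) <= max_to_anchor
    )
    groups = []
    for trio in combinations(candidates, size):
        # ALL SUBGROUPS(U,V): every pair must be joined by a path of up to 2 edges.
        if all(dist[u].get(v, inf) <= max_between for u, v in combinations(trio, 2)):
            groups.append((anchor,) + trio)
    return groups


print(find_groups())
```

The search is exponential in the group size, which is one reason the deployment limits discussed later in the talk matter.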

13
Group Predicates
  Group predicates refer either to nodes in a group or to paths involving members of the group
  When referring to nodes, the operators are the same as for paths:
    ALL IN G2.nodes SATISFY (experience>=5)
  When referring to paths, as in
    ALL SUBGROUPS(U,V) IN G2 SATISFY (PATH(U TO V AS P4) COUNT(P4.edges.*)<=2)
  the operators are:
    ALL SUBGROUPS
    ATLEAST n SUBGROUPS
    ALL EXCEPT UPTO n SUBGROUPS
    MAJORITY SUBGROUPS

14
CONNECT
  Let R be a one-column relation of paths
  The paths are used for an automated referral process intended to create a connection to the last node in the path
    CONNECT USING PATH FROM R
    WHERE TIMEOUT=36, ATTEMPTS=5, PARALLEL=2, HISTORY=true

15
CONNECT
  Let R be a one-column relation of groups
  An automated process will attempt to either:
    Form a group, as in, e.g., Facebook, or
    Create an edge between each pair in the group
    CONNECT GROUP FROM R
    WHERE TIMEOUT=48, ATTEMPTS=1, PARALLEL=1

16
Implementation Issues
  Path/group sizes are not necessarily predefined or known a priori
  Deployment parameters are needed:
    Maximum number of tuples in a result (Google's 1k)
    Maximal length of any path
    Maximal size of any group
    Time limit

17
Finding paths
  Top-k self joins can be used to avoid large intermediate results
  In, e.g., distributed data, random walks can be used to extract candidate paths for the result
    At any point, if the path cannot satisfy the query, the walk aborts
    Many walking agents can provide a good approximation
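The random-walk idea above can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not the authors' implementation: the graph is hypothetical, the function and parameter names are invented, and each "walking agent" is simply one iteration of a loop. A walk aborts as soon as it steps onto a node that rules out every extension of the path, and it stops successfully when it reaches a target node.

```python
import random

# Company and position from the example table; the edges below are HYPOTHETICAL.
NODES = {
    "Alice": ("HAL", "Manager"), "Bob": ("ACME", "Manager"),
    "Charlie": ("CIA", "Engineer"), "Dave": ("MTV", "Teacher"),
    "Eve": ("ACME", "Scientist"), "Frank": ("HAL", "Technician"),
    "Gloria": ("ABC", "Producer"),
}
EDGES = [("Bob", "Charlie"), ("Bob", "Dave"), ("Charlie", "Alice"),
         ("Dave", "Gloria"), ("Gloria", "Alice"), ("Dave", "Eve"),
         ("Eve", "Alice"), ("Frank", "Alice")]

ADJ = {name: [] for name in NODES}
for a, b in EDGES:
    ADJ[a].append(b)
    ADJ[b].append(a)


def random_walk_paths(start, is_target, can_extend, max_nodes, walks, seed=0):
    """Collect candidate paths found by independent random walks from start."""
    rng = random.Random(seed)
    found = set()
    for _ in range(walks):  # each iteration plays the role of one walking agent
        path = [start]
        while len(path) < max_nodes:
            options = [v for v in ADJ[path[-1]] if v not in path]
            if not options:
                break  # dead end: nothing left to visit
            nxt = rng.choice(options)
            if not can_extend(nxt):
                break  # abort: this path can no longer satisfy the query
            path.append(nxt)
            if is_target(nxt):
                found.add(tuple(path))
                break
    return found


candidates = random_walk_paths(
    start="Bob",
    is_target=lambda n: NODES[n] == ("HAL", "Manager"),
    can_extend=lambda n: NODES[n][0] != "ACME",  # no ACME participant after Bob
    max_nodes=4,
    walks=500,
)
print(sorted(candidates))
```

The result is an approximation: a path is reported only if some walk happens to follow it, so more walks raise the chance of finding every qualifying path.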

18
Conclusions
  SoQL is a domain-specific, SQL-like query language for the social networks domain
  Creation of data is possible using the Path and the Group data types
  Future work
    More expressive predicates, e.g., disjointness of two paths
    Implementation
    Advanced, optimized evaluation techniques for centralized and distributed environments

19
Thank You
