Presented by: Omar Alqahtani, Spring 2016. Authors: Publication: ICDE 2015. Type: Research Paper.

1 Presented by: Omar Alqahtani Spring 2016

2 Authors: Publication: ICDE 2015. Type: Research Paper

3
• Data exploration platforms help users discover interesting objects within large volumes of scientific and business data.
• Data diversification is similar to top-k and skyline queries, but what is it?
• Data diversification extracts from a query result a small set of non-redundant points that are diverse among themselves according to some distance measure.
• The current approach is process-first-diversity-next. Drawback?
• Motivation: the need to efficiently provide users with effective insights during data exploration.

4
• Progressive Data Diversification (pDiverse) scheme.
• The main idea is to detect and prune those data points in the query result that cannot be included in the final diverse set.
• Partial distance computation reduces the amount of CPU and I/O incurred during query diversification.
• Also:
  • The Progressive Greedy (pGreedy) heuristic, which forms the core of the pDiverse scheme.
  • An extension of pGreedy to work with column stores.
  • An integrated model that combines the range query with the diversification.
  • Optimization of pDiverse with novel techniques for ordering dimensions and approximating diversity.
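The idea of partial distance computation can be illustrated with a small sketch. This is not the pruning rule of pDiverse itself; it only shows the general early-abandoning trick, assuming squared Euclidean distance. The function names `partial_sq_dist` and `nearest_sq_dist` are hypothetical.

```python
def partial_sq_dist(a, b, bound):
    """Accumulate the squared Euclidean distance one dimension at a time.

    Returns None as soon as the running sum reaches `bound`, so the
    remaining dimensions are never read (assumed pruning criterion).
    """
    total = 0.0
    for ai, bi in zip(a, b):
        total += (ai - bi) ** 2
        if total >= bound:
            return None  # abandoned early: cannot beat the bound
    return total


def nearest_sq_dist(x, selected):
    """Squared distance from x to its nearest point in `selected`."""
    best = float("inf")
    for s in selected:
        d = partial_sq_dist(x, s, best)
        if d is not None:
            best = d  # a strictly closer selected point was found
    return best


print(nearest_sq_dist((0, 0), [(3, 4), (1, 1), (10, 10)]))
```

In a column store each dimension lives in its own column, so abandoning a distance early would also mean that the later columns need not be touched for that point.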

5
• There are three main categories of diversification: content-based, novelty-based, and semantic-coverage-based.
• Formal definition:
• It is an NP-hard problem, so greedy heuristics are the ones most widely used.
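A generic greedy heuristic for diversification can be sketched as follows. This is a plain MaxMin greedy under assumed choices (Euclidean distance, first point as the seed), not the pGreedy heuristic from the paper; `greedy_max_min` is a hypothetical name.

```python
import math


def greedy_max_min(points, k):
    """Pick up to k points, each time adding the candidate whose distance
    to its nearest already-selected point is largest."""
    if k <= 0 or not points:
        return []
    selected = [points[0]]          # assumption: seed with the first point
    remaining = list(points[1:])
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda p: min(math.dist(p, s) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (20.0,)]
print(greedy_max_min(pts, 3))  # [(0.0,), (20.0,), (10.0,)]
```

Each round scans all remaining candidates against all selected points, which is the cost that pruning and partial distance computation aim to reduce.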

6 Presented by: Omar Alqahtani Spring 2016

7 Authors: Publication: ICDE 2015. Type: Research Paper

8
• Query execution performance of database systems depends heavily on query optimization decisions.
• Finding the best possible plan usually requires a cost model to estimate the performance of viable alternatives.
• Cost models rely on statistics about the data. But?
• As a result, commercial DBMSs often assume uniform data distributions and attribute value independence, which is rarely the case in reality.
• Suboptimal plans
• Subpar performance

9

10
• They define robustness in the context of query processing as: the ability of a system to efficiently cope with unexpected and adverse conditions, and deliver near-optimal performance for all query inputs.

11
Based on:
• Understanding the data distribution is a continuous process.
• The distribution may also evolve throughout the execution of a query plan.
• One execution strategy might therefore not be optimal over the entire data set.
They propose:
• A new class of morphable operators that continuously and seamlessly adjust their execution strategy as the understanding of the data evolves.
• The Smooth Scan operator, which morphs between an index look-up and a full table scan and which:
  • achieves near-optimal performance regardless of the operator’s selectivity
  • is oblivious to the existing data statistics.
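The notion of an access path that changes strategy at run time can be illustrated with a deliberately crude toy. It is not the Smooth Scan operator: it only switches once, from index look-ups to a sequential pass, after an assumed number of results has been produced. The data layout, the `switch_after` threshold, and the name `morphing_scan` are all assumptions made for this sketch.

```python
def morphing_scan(pages, index, lo, hi, switch_after):
    """Return rows with lo <= key <= hi, plus the strategy that finished.

    pages: list of pages, each a list of (key, payload) rows.
    index: list of (key, page_no, slot) entries sorted by key.
    """
    results = []
    seen = set()  # (page_no, slot) pairs already emitted via the index
    for key, page_no, slot in index:
        if key < lo:
            continue
        if key > hi:
            break
        if len(results) >= switch_after:
            # More matches than expected: morph into one sequential pass
            # over all pages, skipping rows the index already produced.
            for p, page in enumerate(pages):
                for s, row in enumerate(page):
                    if (p, s) not in seen and lo <= row[0] <= hi:
                        results.append(row)
            return results, "full scan"
        results.append(pages[page_no][slot])
        seen.add((page_no, slot))
    return results, "index look-up"


pages = [[(5, "a"), (1, "b")], [(3, "c"), (9, "d")], [(7, "e"), (2, "f")]]
index = sorted((row[0], p, s) for p, page in enumerate(pages)
               for s, row in enumerate(page))
print(morphing_scan(pages, index, 1, 3, switch_after=10))
print(morphing_scan(pages, index, 1, 9, switch_after=2))
```

The point of the toy is only that the decision is driven by what the operator observes while running, not by statistics collected beforehand.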

12
• Some works deal with the problem at the optimizer level, but:
  • in dynamic environments they bring only partial benefits, because the environment keeps changing even after optimization.
• Orthogonal approaches address run-time adaptivity; however:
  • they lack flexibility at the level of access paths.
  • they remain sensitive to the accuracy of statistics.

13 Presented by: Zohreh Raghebi Spring 2016

14 Authors: Publication: ICDE 2015. Type: Research Paper

15
• Rapid growth of event-based social network services
  • Meetup and Plancast
• They connect people through events
• They allow users to form online groups
• Users publish and announce events to other group members

16
• 1) Which groups would a particular user like to join?
• 2) Which tags might a group choose when constructing its profile?
• 3) Who will attend an upcoming event?
• Goal: to design recommendation systems for three specific tasks: groups to users, tags to groups, and events to users.

17
• [1] proposed a factorization model
  • that exploits social and location features for event-based group recommendation
• [2] introduced a topic model
  • to solve the tag recommendation problem for groups
• [3] used a simple graph-based approach
  • to recommend users for an event
  • it performs information diffusion over the user network
• Lack of a general solution

18
• Model the interactions between multiple entities
  • Users, events, groups, and tags
• Analyze the data to extract useful temporal patterns of user behavior
• Convert the recommendation problem into a node proximity calculation problem

19
• Goal: to evaluate the node proximity
• A heterogeneous graph contains multiple types of entities, which influence each other via different types of interactions
• The importance of these influences has to be balanced for proximity calculation
• Their importance may vary from one recommendation problem to another

20
• Random Walk with Restart (RWR) is used to calculate node proximity for recommendations
• RWR is developed on a univariate Markov chain for homogeneous graphs
• As a generalization, a multivariate Markov chain (MMC) is used to model the random walk process in a heterogeneous graph
• MMC is able to explicitly model the influences between different entities
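As a reminder of how RWR yields proximity scores, here is a minimal sketch for the homogeneous (univariate) case only; the multivariate extension from the paper is not shown. The restart probability, the adjacency-list input, the handling of nodes without neighbors, and the name `random_walk_with_restart` are assumptions of this sketch.

```python
def random_walk_with_restart(adj, start, restart=0.5, tol=1e-12, max_iter=1000):
    """Proximity of every node to `start` on an unweighted graph.

    adj maps each node to the list of its neighbors. At every step the
    walker jumps back to `start` with probability `restart`, otherwise it
    moves to a uniformly chosen neighbor.
    """
    nodes = list(adj)
    score = {v: 1.0 if v == start else 0.0 for v in nodes}
    for _ in range(max_iter):
        nxt = {v: (restart if v == start else 0.0) for v in nodes}
        for u in nodes:
            moving = (1.0 - restart) * score[u]
            if not adj[u]:
                nxt[start] += moving  # assumption: dead ends restart
                continue
            share = moving / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        diff = sum(abs(nxt[v] - score[v]) for v in nodes)
        score = nxt
        if diff < tol:
            break
    return score


# Path graph 0 - 1 - 2, walker restarts at node 0.
print(random_walk_with_restart({0: [1], 1: [0, 2], 2: [1]}, start=0))
```

For this path graph the scores converge to 7/12, 1/3 and 1/12, so nodes closer to the start node rank higher; a recommender would rank candidate nodes by such scores.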

21
• Existing MMC-based methods require the influence weights between different types of entities to be set manually
• Multiple types of entities exist
• A learning scheme tries to find the optimal set of weights

22
• A general model to handle multiple recommendation problems in an event-based social network
• To avoid the issue of manual parameter assignment, a learning framework is proposed to find appropriate parameters for the model
• The values of the learned parameters indicate the importance of different types of entities in different recommendation tasks
• Better understanding of user behavior in an event-based social network

23 Presented by: Zohreh Raghebi Spring 2016

24 Authors: Publication: ICDE 2015. Type: Research Paper

25
• Knowledge is represented as a graph
• There is uncertainty in the presence of each edge in the graph
• Uncertain graphs have been used extensively
  • Communication networks
  • Social networks
  • Protein interaction networks

26
• Identification of dense substructures within a graph
• A clique is a completely connected subgraph
• A maximal clique is a clique that is not contained within any other clique
• Applications of enumerating all maximal cliques:
  • Finding overlapping communities in social networks
  • Finding overlapping multiple protein complexes
  • Analysis of email networks

27
• Clique in an uncertain graph
  • A set of vertices that has a high probability of being a completely connected subgraph
• Applications
  • Finding such sets of vertices helps to unearth robust communities within an uncertain graph
  • A group of proteins such that each protein is likely to interact with every other protein

28
• A set of vertices U is an α-maximal clique if U is a clique with probability at least α and there does not exist a vertex set S such that U ⊂ S and S is a clique with probability at least α
• When α = 1, we have the notion of a maximal clique in a deterministic graph
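The definition can be checked directly on a small example. The sketch below assumes that edges exist independently, so that the clique probability of a vertex set is the product of its pairwise edge probabilities; the dictionary layout and the names `clique_probability` and `is_alpha_maximal` are hypothetical.

```python
from itertools import combinations


def clique_probability(vertices, edge_prob):
    """Probability that `vertices` form a clique (independent edges assumed).

    edge_prob maps frozenset({u, v}) to the probability of edge u-v;
    a missing entry means the edge cannot exist (probability 0).
    """
    prob = 1.0
    for u, v in combinations(sorted(vertices), 2):
        prob *= edge_prob.get(frozenset((u, v)), 0.0)
    return prob


def is_alpha_maximal(U, all_vertices, edge_prob, alpha):
    """True if U is an alpha-clique and no superset is one.

    Adding vertices can only multiply in factors <= 1, so it is enough
    to test every extension of U by a single vertex.
    """
    U = set(U)
    if clique_probability(U, edge_prob) < alpha:
        return False
    return all(
        clique_probability(U | {v}, edge_prob) < alpha
        for v in set(all_vertices) - U
    )


edge_prob = {
    frozenset((0, 1)): 0.9, frozenset((0, 2)): 0.9,
    frozenset((1, 2)): 0.9, frozenset((2, 3)): 0.5,
}
print(is_alpha_maximal({0, 1, 2}, range(4), edge_prob, 0.7))  # True
print(is_alpha_maximal({0, 1}, range(4), edge_prob, 0.7))     # False
```

With α = 0.8 the triangle {0, 1, 2} no longer qualifies (0.9³ ≈ 0.729), so {0, 1} becomes α-maximal.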

29
• The problem of finding reliable subgraphs
  • Finding subgraphs that are connected with a high probability
• In contrast, the interest here is in subgraphs that are not just connected but fully connected with a high probability
• Enumerating the k cliques with the highest probability of existence
• The focus here is on enumerating all α-maximal cliques in a graph

30
• Let f(n, α) be the maximum number of α-maximal cliques
• Proofs…

31
• Uses depth-first search (DFS) with backtracking
• Starts with a set of vertices C that is an α-clique
• Incrementally adds vertices to C while retaining the property of C being an α-clique
• The algorithm backtracks to explore other possible vertices until all possible search paths have been explored

32
• First, to save the effort of checking whether a new vertex v can be used to extend C, consider only those vertices that are already connected to every vertex within C
• This leads us to incrementally track the vertices that can still be used to extend C

33
• Second, not all vertices that extend C into a clique preserve the property of C being an α-clique.
• Adding a new vertex v to C decreases the clique probability by a factor equal to the product of the edge probabilities between v and every vertex in C.
• This factor is maintained incrementally for each vertex v
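The two observations above can be put together in a short sketch: a DFS with backtracking that carries, for every candidate vertex, the factor by which it would shrink the current clique probability. This is an illustrative reconstruction from the slides, not the authors' code; the ascending vertex order, the exclusion list `X` used for the maximality check, and all function names are assumptions.

```python
def enumerate_alpha_maximal_cliques(vertices, edge_prob, alpha):
    """List all alpha-maximal cliques (independent edge probabilities assumed).

    edge_prob maps frozenset({u, v}) to the probability of edge u-v.
    """
    found = []

    def extend(cands, u, q):
        # Keep candidates adjacent to u that still give an alpha-clique;
        # update each factor with the probability of the new edge to u.
        out = []
        for v, r in cands:
            p = edge_prob.get(frozenset((u, v)))
            if p is not None and q * r * p >= alpha:
                out.append((v, r * p))
        return out

    def search(C, q, I, X):
        # C: current alpha-clique, q: its clique probability.
        # I: (vertex, factor) pairs that may still be added to C.
        # X: pairs already explored that could also extend C.
        if not I and not X:
            found.append(set(C))  # nothing can extend C: it is maximal
            return
        X = list(X)
        for i, (u, r) in enumerate(I):
            q_new = q * r
            search(C + [u], q_new,
                   extend(I[i + 1:], u, q_new), extend(X, u, q_new))
            X.append((u, r))  # backtrack: u is now an explored vertex

    search([], 1.0, [(v, 1.0) for v in sorted(vertices)], [])
    return found


edge_prob = {
    frozenset((0, 1)): 0.9, frozenset((0, 2)): 0.9,
    frozenset((1, 2)): 0.9, frozenset((2, 3)): 0.5,
}
print(enumerate_alpha_maximal_cliques(range(4), edge_prob, 0.7))
print(enumerate_alpha_maximal_cliques(range(4), edge_prob, 0.8))
```

With α = 0.7 the sketch reports {0, 1, 2} and {3}; with α = 0.8 the triangle breaks up into its three edges, plus {3}.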

34

