Querying, Exploration, and Analytics of Entity Data Graphs


1 Querying, Exploration, and Analytics of Entity Data Graphs
Chengkai Li
Dept. of Computer Science and Engineering, University of Texas at Arlington
Nanjing University, Aug. 20th, 2012

2 Chengkai Li
Assistant Professor, Department of Computer Science and Engineering, University of Texas at Arlington
Research Areas: Databases, Web Data Management, Data Mining, Information Retrieval
Specific Topics: computational journalism, database exploration, database testing, entity search and query, OLAP and data warehousing, query processing and optimization, ranking and skyline queries, and Web search/mining/integration

3 Dallas-Fort Worth Metroplex, Texas

4 The City of Arlington, Texas

5 The University of Texas at Arlington

6 Ph.D. Work
University of Illinois at Urbana-Champaign, 2007 (advisor: Kevin Chang)
Ranking and Top-k Queries: RankSQL system and ranking algebra, ranking aggregates, integration of ranking and clustering
Deep Web Data Integration
XML Query Processing

7 The Innovative Database and Information Systems Research (IDIR) Lab
Faculty: Chengkai Li
PhD students: Naeemul Hassan, Nandish Jayaram, Afroza Sultana, Ning Yan, Gensheng Zhang
MS students: Mahesh Gupta, Jijo Philip
BS students: Raju Karki, Feifan Meng, Khuong Nguyen
Alumni: Mahbubur Rahman, Xiaonan Li, Avinash Bharadwaj (MS, 2011, Copper Labs), Aditya Telang (Ph.D., 2011, co-advised with Sharma Chakravarthy, IBM Research India), Jared Ashman (MS, 2010, Ambit Energy), Ebrahim Cutlerywala (MS, 2010, Google), Quazi (Sunny) Hasan (MS, 2010, Dematic), Angus Helm (BS, 2010), Rakesh Ramegowda (MS, 2010), Aakash Tuli (BS, 2010), Muhammad Safiullah (MS, 2008, Microsoft)
Collaborators: Pankaj K. Agarwal (Duke), Sharma Chakravarthy (UTA), Sarah Cohen (Duke), Christoph Csallner (UTA), Gautam Das (UTA), Chris Ding (UTA), Ramez Elmasri (UTA), Leonidas Fegaras (UTA), Bin He (IBM Almaden), Ping Luo (HP Labs), Min Wang (HP Labs), Xifeng Yan (UCSB), Jun Yang (Duke), Cong Yu (Google), Nan Zhang (George Washington U.)

8 Projects
WebEQ: Querying and Exploration of Web Text: Entity-Relationship Queries, Faceted Search
Usability of Query Systems over Entity Data Graphs: Graph Query by Example, Faceted Exploration of Data Graphs, Graph Query Algebra
Computational Journalism: Prominent Streak Discovery, One-of-the-Few Objects, Significant Fact Finding
Database Testing: Dynamic Symbolic Testing of Database Applications, Testing MapReduce Programs
Misc.: Skyline Groups, Set Queries, Ranking in Web Databases

9 WebEQ: Entity-Centric Querying and Exploration of Web Text

10 Entity-Relationship Queries (ERQ) http://idir.uta.edu/erq

16 Facetedpedia http://idir.uta.edu/facetedpedia

25 Demo

26 Graph Query by Example

27 Afroza Sultana, Quazi Hasan, Ashis Biswas, Soumyava Das, Habibur Rahman, Chris Ding, Chengkai Li: Infobox Suggestion for Wikipedia Entities. CIKM 2012, poster paper.
Xiaonan Li, Chengkai Li, Cong Yu: Entity-Relationship Queries over Wikipedia. ACM Transactions on Intelligent Systems and Technology (TIST), 2012.
Chengkai Li, Ning Yan, Senjuti Basu Roy, Lekhendro Lisham, Gautam Das: Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia. WWW 2010.
Ning Yan, Chengkai Li, Senjuti B. Roy, Rakesh Ramegowda, Gautam Das: Facetedpedia: Enabling Query-Dependent Faceted Search for Wikipedia. CIKM 2010, demonstration description.
Xiaonan Li, Chengkai Li, Cong Yu: EntityEngine: Answering Entity-Relationship Queries using Shallow Semantics. CIKM 2010, demonstration description.

28 Demandata: Data-Driven Computational Investigative Journalism

29 Demandata: Data-Driven Computational Investigative Journalism
Explore the young field of computational journalism.
Build sites/apps/systems with societal impact.
Apply and invent techniques of database systems, text/data mining, Web database, visualization, social computational systems, and cloud computing.
"When I was mayor of New York City, I encouraged adoptions. Adoptions went up 65 to 70 percent..."
We will build systems to automatically check if his statement is reliable.
"… the Kings became just the third of those 96,000-plus teams to have a game in which they produced both so few points (60) and such a low shooting percentage (25.6%)…"
We will build systems to automatically discover such fascinating facts and narrate them.

30 Prominent Streak Discovery

31 Prominent Streaks
“This month the Chinese capital has experienced 10 days with a maximum temperature in around 35 degrees Celsius – the most for the month of July in a decade.”
“The Nikkei 225 closed below for the 12th consecutive week, the longest such streak since June 2009.”
“He (LeBron James) scored 35 or more points in nine consecutive games and joined Michael Jordan and Kobe Bryant as the only players since 1970 to accomplish the feat.”
“Deron Williams was the first player in NBA history that achieved 20+ points 10+ assists in the first 5 games of a series.”
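
The statements quoted on this slide share a pattern that can be sketched in code. This is a minimal sketch, assuming the formulation used in the Prominent Streak Discovery paper cited later in this talk: a streak is a contiguous interval of a sequence, described by its length and its minimum value, and a streak is prominent when no other streak is at least as long with at least as high a minimum while being strictly better in one of the two. The function names and the sample sequence are hypothetical, and the brute-force enumeration is for illustration only; it is not the algorithm whose execution time is reported in the talk.

```python
from typing import List, Sequence, Tuple

# A streak is (start index, end index inclusive, minimum value in the interval).
Streak = Tuple[int, int, float]


def all_streaks(seq: Sequence[float]) -> List[Streak]:
    """Enumerate every contiguous interval with its running minimum (O(n^2))."""
    streaks = []
    for i in range(len(seq)):
        lowest = seq[i]
        for j in range(i, len(seq)):
            lowest = min(lowest, seq[j])
            streaks.append((i, j, lowest))
    return streaks


def dominates(a: Streak, b: Streak) -> bool:
    """a dominates b if it is no shorter and no lower, and strictly better in one."""
    len_a, len_b = a[1] - a[0] + 1, b[1] - b[0] + 1
    no_worse = len_a >= len_b and a[2] >= b[2]
    strictly_better = len_a > len_b or a[2] > b[2]
    return no_worse and strictly_better


def prominent_streaks(seq: Sequence[float]) -> List[Streak]:
    """Keep the streaks that no other streak dominates (a two-dimensional skyline)."""
    streaks = all_streaks(seq)
    return [s for s in streaks if not any(dominates(t, s) for t in streaks)]


if __name__ == "__main__":
    # Hypothetical sequence, e.g. points scored in six consecutive games.
    print(prominent_streaks([3, 1, 7, 7, 2, 5]))
```

On the hypothetical sequence, three streaks survive: the whole sequence (length 6, minimum 1), the last four values (length 4, minimum 2), and the two 7s (length 2, minimum 7). Each one is the kind of fact a sentence such as "at least 7 in two consecutive games" would report.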

32 Prominent Streaks

33 Data Sequences

34 Algorithm Execution Time

35 Example Prominent Streaks

36 Example Prominent Streaks
“In Melbourne, Australia, during the years between 1981 and 1990, the weather had been pleasant. There had been more than two thousand days with minimum temperature above the zero point, and the streak was not ending. (We do not have data beyond 1990.) The longest streak during which the temperature hit above 35 degrees Celsius is six days. It was in the summer of the year 1981.”
“More than half of the prominent streaks we found in the traffic data of the Lady Gaga Wikipedia page were around September 12th, when she became a big winner in the MTV Video Music Awards (VMA). During that time, the page had been visited by at least 2000 people in every hour for almost four days.”
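
A statement such as "the longest streak during which the temperature hit above 35 degrees Celsius is six days" fixes the value threshold and asks for the longest qualifying run. A minimal sketch of that single-threshold case follows; the function name and the temperature readings are hypothetical and are not the Melbourne data.

```python
from typing import Optional, Sequence, Tuple


def longest_run_above(values: Sequence[float], threshold: float) -> Tuple[int, Optional[int]]:
    """Return (length, start index) of the longest run of values strictly above threshold.

    Returns (0, None) when no value exceeds the threshold. Ties keep the earliest run.
    """
    best_len, best_start = 0, None
    run_len, run_start = 0, 0
    for i, v in enumerate(values):
        if v > threshold:
            if run_len == 0:
                run_start = i  # a new run begins here
            run_len += 1
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:
            run_len = 0  # the run is broken
    return best_len, best_start


if __name__ == "__main__":
    # Hypothetical daily maximum temperatures in degrees Celsius.
    temps = [30, 36, 37, 20, 36, 36, 38, 40, 10]
    print(longest_run_above(temps, 35))
```

For the hypothetical readings, the longest run above 35 has length 4 and starts at index 4.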

37 One-of-the-Few Objects

38 “One of the Few” Claims
Sports: Karl Malone is ONE OF THE ONLY TWO players in NBA history with 25,000 points, 12,000 rebounds, and 5,000 assists in one’s career
Politics: He is ONE OF THE ONLY THREE candidates who have raised more than 25% from PAC contributions and 25% from self-financing
Do these claims really hold water?
How do we find truly interesting claims or individuals?
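
Checking whether such a claim holds amounts to counting the records that meet every threshold in it. A minimal sketch, assuming each record is a mapping from attribute name to a career total; the record names, attribute names, and numbers are made up for illustration and are not real statistics.

```python
from typing import Dict, List

Record = Dict[str, float]


def meets(record: Record, thresholds: Dict[str, float]) -> bool:
    """True if the record reaches every threshold named in the claim."""
    return all(record[attr] >= bound for attr, bound in thresholds.items())


def one_of_the_few(records: Dict[str, Record], thresholds: Dict[str, float]) -> List[str]:
    """Names of all records satisfying the claim; the claim's 'k' is the list length."""
    return sorted(name for name, rec in records.items() if meets(rec, thresholds))


if __name__ == "__main__":
    # Hypothetical career totals (NOT real data).
    players = {
        "P1": {"points": 30000, "rebounds": 14000, "assists": 5200},
        "P2": {"points": 26000, "rebounds": 12500, "assists": 5000},
        "P3": {"points": 32000, "rebounds": 6000, "assists": 5600},
        "P4": {"points": 24000, "rebounds": 15000, "assists": 7000},
    }
    claim = {"points": 25000, "rebounds": 12000, "assists": 5000}
    print(one_of_the_few(players, claim))
```

With the made-up data, two records satisfy the claim, so "one of the only two" would hold. The check says nothing about whether the claim is interesting, since thresholds can be chosen after the fact to make almost any record look rare.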

39 One-of-the-Few => k-Skyband
He is ONE OF THE ONLY TWO players with 25,000 points, 12,000 rebounds, and 5,000 assists

40 One-of-the-Few => k-Skyband
1-skyband 2-skyband
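
The k-skyband can be sketched directly from its standard definition: it contains the objects dominated by fewer than k other objects, so the 1-skyband is the ordinary skyline. The sketch below assumes larger values are better on every attribute; the object names and coordinates are hypothetical, and the quadratic scan is for illustration only.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is at least as good on every attribute and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def k_skyband(points: Dict[str, Point], k: int) -> Set[str]:
    """Names of the objects dominated by fewer than k other objects."""
    result = set()
    for name, p in points.items():
        dominators = sum(
            1 for other, q in points.items() if other != name and dominates(q, p)
        )
        if dominators < k:
            result.add(name)
    return result


if __name__ == "__main__":
    # Hypothetical two-attribute objects (NOT real data).
    objects = {"A": (5, 5), "B": (4, 6), "C": (4, 4), "E": (1, 7), "F": (5, 4)}
    print(sorted(k_skyband(objects, 1)))
    print(sorted(k_skyband(objects, 2)))
```

In the hypothetical data, A, B, and E are undominated and form the 1-skyband; F is dominated only by A, so it joins them in the 2-skyband; C is dominated by three objects and appears in neither.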

41 may sound impressive because 𝑘=1, but there can be many objects like 𝑋

42 Chamberlain Jordan Baylor Robertson Pettit Abdul-Jabbar Bird Johnson
Chamberlain Jordan Baylor Robertson Pettit Abdul-Jabbar Bird Johnson Stockton James

44 You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, Cong Yu: On “One of the Few” Objects. KDD 2012.
Xiao Jiang, Chengkai Li, Ping Luo, Min Wang, Yong Yu: Prominent Streak Discovery in Sequence Data. KDD 2011.
Sarah Cohen, Chengkai Li, Jun Yang, Cong Yu: Computational Journalism: A Call to Arms to Database Researchers. CIDR 2011.

45 Skyline Groups

46 Chengkai Li, Nan Zhang, Naeemul Hassan, Sundaresan Rajasekaran, Gautam Das: On Skyline Groups. CIKM 2012, short paper.

