SFU, CMPT 741, Fall 2009, Martin Ester
Outlook


1 Outlook
Outline
  o Trends in KDD research
  o Graph mining and social network analysis
  o Recommender systems
  o Information extraction

2 Trends in KDD Research
KDD 2000 Conference
  o New Data Mining Algorithms
  o Efficiency and Scalability of Data Mining Algorithms
  o Interactive Data Exploration
  o Visualization
  o Constraints and Evaluation in the KDD Process

3 Trends in KDD Research
KDD 2002 Conference
  o Statistical Methods
  o Frequent Patterns
  o Streams and Time Series
  o Visualization
  o Web Search and Navigation
  o Text and Web Page Classification
  o Intrusion and Privacy
  o Applications

4 Trends in KDD Research
KDD 2004 Conference
  o Frequent Patterns / Association Rules
  o Clustering
  o Mining Spatio-Temporal Data
  o Mining Data Streams
  o Dimensionality Reduction
  o Privacy-Preserving Data Mining
  o Mining Biological Data
  o Applications (Web, biological data, security, ...)

5 Trends in KDD Research
KDD 2006 Conference
  o Clustering
  o Classification / supervised ML
  o Privacy
  o Web / Graph Mining
  o Web / Text Mining
  o Frequent Pattern Mining
  o Structured Data

6 Trends in KDD Research
KDD 2008 Conference
  o Text Mining
  o Data Integration
  o Social Networks
  o Graph Mining
  o Distance Functions and Metric Learning
  o Active and Semi-supervised Learning
  o Pattern Mining
  o Collaborative Filtering

7 Trends in KDD Research
Some Hot Topics
  o Social Networks: THE hot topic of KDD 08 → topic of the only panel
  o Graph mining
  o Text mining and information extraction / integration
  o Collaborative Filtering; more generally, recommender systems → $1M Netflix Prize

8 Graph Mining and Social Network Analysis
Motivating Applications
Social network analysis
  o What communities exist?
  o How does information about a new product spread?
  o Which customers should be targeted to maximize the profit of a marketing campaign?
Analysis of biological networks
  o What are the functional modules of an organism?
  o How do biological networks evolve over time?
  o Which protein should be targeted to inhibit some virulent bacteria?

9 Graph Mining and Social Network Analysis
Methods
Frequent subgraph mining
  o Frequent pattern mining approach.
Graph clustering
  o E.g., normalized cut, i.e., minimize the number of edges between graph components / clusters relative to the size of the clusters.
Graph generative models
  o Probabilistic models that generate graphs similar to real graphs / networks.
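
The normalized cut criterion named above can be illustrated on a toy graph. This is a minimal sketch, assuming an undirected, unweighted graph given as an edge list; the function name `normalized_cut` and the example graph are hypothetical and not from the course material. It uses the common two-way formulation, in which the number of cut edges is divided by the volume (sum of node degrees) of each side, so a partition that merely splits off a single node scores badly.

```python
from collections import defaultdict

def normalized_cut(edges, part_a):
    """Normalized cut of the bipartition (part_a, rest) of an undirected graph.

    Ncut(A, B) = cut(A, B) / vol(A) + cut(A, B) / vol(B),
    where cut counts edges crossing the partition and vol sums node degrees.
    """
    part_a = set(part_a)
    degree = defaultdict(int)
    cut = 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if (u in part_a) != (v in part_a):
            cut += 1
    vol_a = sum(d for node, d in degree.items() if node in part_a)
    vol_b = sum(d for node, d in degree.items() if node not in part_a)
    if vol_a == 0 or vol_b == 0:
        raise ValueError("both sides need at least one edge endpoint")
    return cut / vol_a + cut / vol_b

# Hypothetical example: two triangles joined by a single bridge edge (3, 4).
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

balanced = normalized_cut(EDGES, {1, 2, 3})  # cuts only the bridge
lopsided = normalized_cut(EDGES, {1})        # cuts the two edges at node 1
```

The balanced split along the bridge gets the lower score, which matches the intuition that the two triangles are the natural clusters. Finding the partition that minimizes this score is the hard part and is not attempted here.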

10 Graph Mining and Social Network Analysis
Challenges
Complexity of graph algorithms
  o Many graph mining problems are NP-hard.
  o Real graphs tend to be extremely large.
  → need efficient algorithms
Attribute data
  o Many graphs have attributes associated with the nodes.
  o Transformation into a weighted graph loses a lot of information.
  → need new models / algorithms considering relationship and attribute data

11 Graph Mining and Social Network Analysis
Challenges
Dynamics of social networks
  o Social networks tend to be very dynamic.
  o What are the stable communities?
  o What are the significant changes over time?
Rich types of relationships
  o Multi-graphs.
  o Hypergraphs.
  o Uncertain graphs.
  → new data mining problems and algorithms

12 Recommender Systems
Motivating Applications
Motivation
  o The internet provides a flood of information on all kinds of items.
  o There is a great need for personalized recommendations.
  o The internet also provides a wealth of item ratings / reviews.
Typical applications
  o Movie recommendation
  o Product recommendation
  o Keyword recommendation

13 Recommender Systems
Methods
Collaborative filtering
  o Uses only a database of user-item ratings.
  o Recommendation based on ratings by users with similar rating patterns.
Content-based recommender systems
  o Use information about the content of items and / or the properties of users.
  o Recommend items whose content is similar to items liked by the user.
Trust-based recommender systems
  o Assume a social network / trust network. Trust can be defined explicitly or implicitly.
  o Recommendation based on ratings by trusted neighbors.
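
The collaborative filtering idea above (predict from users with similar rating patterns) can be sketched in a few lines. This is an illustrative memory-based variant only, assuming cosine similarity over co-rated items and a similarity-weighted average; the slides do not prescribe a particular similarity measure, and the users, items, and ratings below are made up.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 4, "m2": 5, "m3": 2, "m4": 5},
    "carol": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated (0.0 if none).

    Assumes strictly positive ratings, so the norms are never zero.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        num += weight * other_ratings[item]
        den += weight
    return num / den if den > 0 else None

prediction = predict(RATINGS, "alice", "m4")
```

Because alice's ratings resemble bob's far more than carol's, the prediction for `m4` is pulled toward bob's rating. A trust-based variant would keep the same weighted average but take the weights from a trust network instead of from rating similarity.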

14 Recommender Systems
Challenges
High dimensionality and sparsity of data
  o The overwhelming majority (> 99%) of user-item ratings are unknown.
  o Recommendation is especially hard for cold-start users and controversial items.
  → dimensionality reduction, model-based methods, trust-based approach
Fraud
  o Memory-based collaborative filtering can be easily manipulated by adding fraudulent ratings.
  → trust-based approach more robust to fraud
Privacy issues with trust network data
  o Only very few trust networks are in the public domain.

15 Information Extraction
Motivating Applications
Importance of unstructured text data
  o The overwhelming majority (>= 80%) of human-generated information is not in structured form, but in unstructured text.
Biomedical literature
  o Contains a wealth of valuable information that cannot be processed / searched automatically.
  o Extraction of entities and relationships such as proteins and their localizations.
Online product reviews
  o A lot of product "reviews" are available online in community databases or blogs.
  o Companies want to know what customers think of their products.

16 Information Extraction
Methods
Basic NLP methods
  o Part-of-speech tagging
  o Lexica, ontologies, ...
Machine learning methods
  o Typically, supervised classification.
  o CRFs and similar methods are state-of-the-art.
Bootstrapping approach
  o Using a small labeled training dataset, find textual extraction patterns.
  o Using these patterns, extract further entities / relationships and continue.
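
The bootstrapping loop described above (seeds give patterns, patterns give new entities, repeat) can be sketched as follows. This is a deliberately naive illustration: it assumes a pattern is simply the literal text between two known entities, and the corpus, seed pair, and function names are hypothetical. Practical systems typically also score patterns and extractions to limit drift, which is omitted here.

```python
import re

# Hypothetical toy corpus and seed pair; not taken from any real dataset.
CORPUS = [
    "ProtA is localized in the nucleus.",
    "ProtB is localized in the membrane.",
    "ProtB is found in the membrane.",
    "ProtC is found in the cytoplasm.",
]
SEEDS = {("ProtA", "nucleus")}

def learn_patterns(corpus, pairs):
    """Collect the literal text between the two entities of every known pair."""
    patterns = set()
    for sentence in corpus:
        for first, second in pairs:
            match = re.search(
                re.escape(first) + r"(.+?)" + re.escape(second), sentence
            )
            if match:
                patterns.add(match.group(1))
    return patterns

def apply_patterns(corpus, patterns):
    """Extract a (word, word) pair wherever a learned pattern matches."""
    pairs = set()
    for sentence in corpus:
        for middle in patterns:
            match = re.search(r"(\w+)" + re.escape(middle) + r"(\w+)", sentence)
            if match:
                pairs.add((match.group(1), match.group(2)))
    return pairs

def bootstrap(corpus, seeds, max_rounds=5):
    """Alternate pattern learning and extraction until nothing new is found."""
    pairs = set(seeds)
    for _ in range(max_rounds):
        new_pairs = pairs | apply_patterns(corpus, learn_patterns(corpus, pairs))
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return pairs

extracted = bootstrap(CORPUS, SEEDS)
```

In this toy run the seed yields the pattern " is localized in the ", which finds the ProtB pair; that pair in turn yields " is found in the ", which finds the ProtC pair.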

17 Information Extraction
Challenges
Text data is hard to understand
  o Many NLP problems are still essentially unsolved.
  → relatively simple NLP methods are often sufficient for information extraction
Portability across domains
  o Extraction methods need to be portable from one domain to another.
  o The knowledge engineering approach (a domain expert defines rules) is labor-intensive and expensive.
  → machine learning methods
Entity mentions need to be resolved
  o Information extraction produces strings referencing an entity of a given type.
  o Without a mapping to known real-world entities, extracted information is of limited usefulness.
  → need to integrate extracted information with existing databases
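
The last challenge above, mapping extracted strings to known entities, can be illustrated with a small lookup. This is only a sketch under strong assumptions: it treats resolution as string normalization plus fuzzy matching with the standard library's `difflib.get_close_matches`, and the reference table, identifiers, and `resolve` function are hypothetical. Real entity resolution usually also needs context, synonyms, and type information.

```python
from difflib import get_close_matches

# Hypothetical reference database: canonical name -> made-up identifier.
KNOWN_ENTITIES = {
    "tumor protein p53": "E001",
    "insulin receptor": "E002",
    "hemoglobin subunit alpha": "E003",
}

def normalize(mention):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(mention.lower().split())

def resolve(mention, known=KNOWN_ENTITIES, cutoff=0.8):
    """Map an extracted string to a known entity id, or None if nothing is close."""
    key = normalize(mention)
    if key in known:
        return known[key]
    candidates = get_close_matches(key, list(known), n=1, cutoff=cutoff)
    return known[candidates[0]] if candidates else None

exact = resolve("Insulin   Receptor")   # matches after normalization
fuzzy = resolve("insulin receptors")    # matches via string similarity
unknown = resolve("kinase")             # too dissimilar to any known name
```

Returning `None` for unmatched mentions keeps unresolved strings visible instead of forcing them onto the nearest database entry.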

18 References
Graph mining
  - X Yan & Karsten Borgwardt, "Graph Mining and Graph Kernels", Tutorial KDD 2008
  - Jure Leskovec & Christos Faloutsos, "Mining Large Graphs: Models, Diffusion and Case Studies", Tutorial ECML/PKDD 2007
Recommender systems
  - Joseph Konstan, "Introduction to Recommender Systems", Tutorial SIGMOD 2008
Information extraction
  - Eugene Agichtein & Sunita Sarawagi, "Scalable Information Extraction and Integration", Tutorial KDD 2006
  - AnHai Doan & Raghu Ramakrishnan & Shiv Vaithyanathan, "Managing Information Extraction", Tutorial SIGMOD 2006

