SFU, CMPT 741, Fall 2009, Martin Ester

Presentation transcript:

Outlook: Outline
- Trends in KDD research
- Graph mining and social network analysis
- Recommender systems
- Information extraction

Trends in KDD Research: KDD 2000 Conference
- New Data Mining Algorithms
- Efficiency and Scalability of Data Mining Algorithms
- Interactive Data Exploration
- Visualization
- Constraints and Evaluation in the KDD Process

Trends in KDD Research: KDD 2002 Conference
- Statistical Methods
- Frequent Patterns
- Streams and Time Series
- Visualization
- Web Search and Navigation
- Text and Web Page Classification
- Intrusion and Privacy
- Applications

Trends in KDD Research: KDD 2004 Conference
- Frequent Patterns / Association Rules
- Clustering
- Mining Spatio-Temporal Data
- Mining Data Streams
- Dimensionality Reduction
- Privacy-Preserving Data Mining
- Mining Biological Data
- Applications (Web, biological data, security, ...)

Trends in KDD Research: KDD 2006 Conference
- Clustering
- Classification / supervised ML
- Privacy
- Web / Graph Mining
- Web / Text Mining
- Frequent Pattern Mining
- Structured Data

Trends in KDD Research: KDD 2008 Conference
- Text Mining
- Data Integration
- Social Networks
- Graph Mining
- Distance Functions and Metric Learning
- Active and Semi-supervised Learning
- Pattern Mining
- Collaborative Filtering

Trends in KDD Research: Some Hot Topics
- Social networks
  o THE hot topic of KDD 08
  → topic of the only panel
- Graph mining
- Text mining and information extraction / integration
- Collaborative filtering
  o more generally, recommender systems
  → $1M Netflix Prize

Graph Mining and Social Network Analysis: Motivating Applications
- Social network analysis
  o What communities exist?
  o How does information about a new product spread?
  o What customers should be targeted to maximize the profit of a marketing campaign?
- Analysis of biological networks
  o What are the functional modules of an organism?
  o How do biological networks evolve in the course of time?
  o What protein should be targeted to inhibit some virulent bacteria?

Graph Mining and Social Network Analysis: Methods
- Frequent subgraph mining
  o frequent pattern mining approach
- Graph clustering
  o e.g., normalized cut, i.e., minimize the number of edges between graph components / clusters
- Graph generative models
  o probabilistic models that generate graphs similar to real graphs / networks
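To make the graph clustering objective concrete, here is a minimal sketch that only evaluates the normalized cut of a given two-way partition; it does not search for the best partition. It assumes the common definition Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B) for an undirected, unweighted graph, where vol is the sum of node degrees in a component. The function name `normalized_cut` and the toy graph are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict


def normalized_cut(edges, part_a):
    """Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B).

    `edges` is a list of undirected, unweighted (u, v) pairs and `part_a`
    is the node set A; every other node belongs to B.
    """
    part_a = set(part_a)
    degree = defaultdict(int)
    cut = 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        # An edge is cut when its endpoints fall on different sides.
        if (u in part_a) != (v in part_a):
            cut += 1
    vol_a = sum(d for node, d in degree.items() if node in part_a)
    vol_b = sum(d for node, d in degree.items() if node not in part_a)
    if vol_a == 0 or vol_b == 0:
        raise ValueError("both sides of the partition must contain edge endpoints")
    return cut / vol_a + cut / vol_b


# Hypothetical toy graph: two triangles joined by a single bridge edge (3, 4).
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

balanced = normalized_cut(EDGES, {1, 2, 3})  # cuts only the bridge
lopsided = normalized_cut(EDGES, {1, 2})     # cuts two triangle edges
print(balanced, lopsided)
```

Splitting at the bridge gives the lower value, which is why minimizing this quantity tends to separate densely connected groups. Dividing by the volume is what discourages tiny, unbalanced clusters compared with counting cut edges alone.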

Graph Mining and Social Network Analysis: Challenges
- Complexity of graph algorithms
  o Many graph mining problems are NP-hard.
  o Real graphs tend to be extremely large.
  → need efficient algorithms
- Attribute data
  o Many graphs have attributes associated with the nodes.
  o Transformation into a weighted graph loses a lot of information.
  → need new models / algorithms considering relationship and attribute data

Graph Mining and Social Network Analysis: Challenges
- Dynamics of social networks
  o Social networks tend to be very dynamic.
  o What are the stable communities?
  o What are the significant changes over time?
- Rich types of relationships
  o Multi-graphs.
  o Hypergraphs.
  o Uncertain graphs.
  → new data mining problems and algorithms

Recommender Systems: Motivating Applications
- Motivation
  o The internet provides a flood of information on all kinds of items.
  o There is a great need for personalized recommendations.
  o The internet also provides a wealth of item ratings / reviews.
- Typical applications
  o Movie recommendation
  o Product recommendation
  o Keyword recommendation

Recommender Systems: Methods
- Collaborative filtering
  o Uses only a database of user-item ratings.
  o Recommendation based on ratings by users with similar rating patterns.
- Content-based recommender systems
  o Use information about the content of items and / or the properties of users.
  o Recommend items that have content similar to items liked by the user.
- Trust-based recommender systems
  o Assume a social network / trust network. Trust can be defined explicitly or implicitly.
  o Recommendation based on ratings by trusted neighbors.
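The collaborative filtering idea can be sketched as a memory-based, user-based predictor. This is one simple variant under stated assumptions, not the method of any particular system: similarity is the cosine over co-rated items, and the prediction is the similarity-weighted average of the other users' ratings. The names `cosine_similarity`, `predict_rating`, and the `RATINGS` data are hypothetical.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two users, computed over co-rated items only."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`.

    Returns None when no sufficiently similar user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total > 0 else None


# Hypothetical user-item ratings on a 1-5 scale.
RATINGS = {
    "alice": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 5},
    "carol": {"A": 1, "B": 5, "C": 1},
}

prediction = predict_rating(RATINGS, "alice", "C")
print(prediction)
```

Because alice's ratings match bob's far better than carol's, the predicted rating for item C is pulled toward bob's 5. Only the rating database is consulted; no item content or user attributes are needed.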

Recommender Systems: Challenges
- High dimensionality and sparsity of data
  o The overwhelming majority (> 99%) of user-item ratings are unknown.
  o Recommendation is especially hard for cold-start users and controversial items.
  → dimensionality reduction, model-based methods, trust-based approach
- Fraud
  o Memory-based collaborative filtering can be easily manipulated by adding fraudulent ratings.
  → trust-based approach more robust to fraud
- Privacy issues with trust network data
  o Only very few trust networks are in the public domain.
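The fraud point can be illustrated with a deliberately simplified sketch. It assumes plain averaging instead of similarity weighting, and an explicit trust list per user; the helper `mean_rating`, the user names, and all ratings are hypothetical. Injected ratings shift an estimate built from all raters but leave an estimate built from trusted neighbors unchanged.

```python
def mean_rating(ratings, item, raters):
    """Average rating of `item` over the given raters, or None if none rated it."""
    values = [ratings[u][item] for u in raters if item in ratings.get(u, {})]
    return sum(values) / len(values) if values else None


# Hypothetical ratings for item "X": two genuine users plus three shill accounts.
ratings = {
    "bob": {"X": 2},
    "carol": {"X": 1},
    "shill1": {"X": 5},
    "shill2": {"X": 5},
    "shill3": {"X": 5},
}

# Hypothetical explicit trust network: alice trusts only bob and carol.
trust = {"alice": {"bob", "carol"}}

naive = mean_rating(ratings, "X", ratings.keys())   # all raters, shills included
trusted = mean_rating(ratings, "X", trust["alice"])  # trusted neighbors only
print(naive, trusted)
```

The shill accounts raise the all-raters estimate well above what the genuine users reported, while the trust-restricted estimate is unaffected, because an attacker would additionally have to gain alice's trust.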

Information Extraction: Motivating Applications
- Importance of unstructured text data
  o The overwhelming majority (>= 80%) of human-generated information is not in structured form, but in unstructured text.
- Biomedical literature
  o Contains a wealth of valuable information that cannot be processed / searched automatically.
  o Extraction of entities and relationships such as proteins and their localizations.
- Online product reviews
  o A lot of product "reviews" are available online in community databases or blogs.
  o Companies want to know what customers think of their products.

Information Extraction: Methods
- Basic NLP methods
  o Part-of-speech tagging
  o Lexica, ontologies, ...
- Machine learning methods
  o Typically, supervised classification.
  o CRFs and similar methods are state-of-the-art.
- Bootstrapping approach
  o Using a small labeled training dataset, find textual extraction patterns.
  o Using these patterns, extract further entities / relationships and continue.
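The bootstrapping loop can be sketched in a few lines. This toy version makes strong simplifying assumptions: a pattern is just the literal text between two known entities, entities are single words, and there is no pattern scoring or filtering, which a realistic system would need to avoid drifting. The function `bootstrap`, the sentences, and the protein names are hypothetical.

```python
import re


def bootstrap(sentences, seeds, rounds=2):
    """Alternate between learning patterns from known pairs and applying them."""
    pairs = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Step 1: learn patterns as the literal text between two known entities.
        for sentence in sentences:
            for left, right in pairs:
                found = re.search(
                    re.escape(left) + r"(.+?)" + re.escape(right), sentence
                )
                if found:
                    patterns.add(found.group(1))
        # Step 2: apply every pattern to extract further entity pairs.
        for sentence in sentences:
            for middle in patterns:
                regex = r"(\w+)" + re.escape(middle) + r"(\w+)"
                for match in re.finditer(regex, sentence):
                    pairs.add((match.group(1), match.group(2)))
    return pairs, patterns


# Hypothetical mini-corpus of protein localization statements.
SENTENCES = [
    "ProtA is localized in the nucleus.",
    "ProtB is localized in the membrane.",
    "ProtB was found in the membrane.",
    "ProtC was found in the cytoplasm.",
]
SEEDS = {("ProtA", "nucleus")}

pairs, patterns = bootstrap(SENTENCES, SEEDS, rounds=2)
print(sorted(pairs))
print(sorted(patterns))
```

In the first round the single seed yields one pattern, which extracts the ProtB pair. In the second round that new pair yields a second pattern, which in turn reaches ProtC, a pair that no seed-derived pattern could find directly.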

Information Extraction: Challenges
- Text data is hard to understand
  o Many of the NLP problems are still essentially unsolved.
  → relatively simple NLP methods are often sufficient for information extraction
- Portability across domains
  o Extraction methods need to be portable from one domain to another.
  o Knowledge engineering approach (domain expert defines rules) is labor-intensive and expensive.
  → machine learning methods
- Entity mentions need to be resolved
  o Information extraction produces strings referencing an entity of a given type.
  o Without mapping to known real-world entities, extracted information is of limited usefulness.
  → need to integrate extracted information with existing databases
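Resolving entity mentions against an existing database can be sketched, in its simplest form, as a lookup on normalized names and aliases. This is only an assumed baseline (exact match after lowercasing and stripping punctuation and whitespace); real systems typically add approximate matching and context. The identifiers, names, and helper functions below are hypothetical.

```python
def normalize(mention):
    """Lowercase the mention and keep only letters and digits."""
    return "".join(ch for ch in mention.lower() if ch.isalnum())


def build_index(database):
    """Map every normalized name or alias to its entity identifier."""
    index = {}
    for entity_id, names in database.items():
        for name in names:
            index[normalize(name)] = entity_id
    return index


def resolve(mention, index):
    """Return the entity identifier for an extracted string, or None if unknown."""
    return index.get(normalize(mention))


# Hypothetical entity database: identifier -> known names and aliases.
DATABASE = {
    "E1": ["Protein Alpha-1", "PA1"],
    "E2": ["Protein Beta", "PB"],
}

index = build_index(DATABASE)
print(resolve("protein alpha 1", index))  # spelling variant of an E1 name
print(resolve("pb", index))               # alias of E2 in different case
print(resolve("Protein Gamma", index))    # not in the database
```

Mentions that resolve to the same identifier can then be merged with the existing database record, while unresolved strings remain candidates for new entities or for manual review.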
