Mining and Querying Multimedia Data Fan Guo Sep 19, 2011 Committee Members: Christos Faloutsos, Chair Eric P. Xing William W. Cohen Ambuj K. Singh, University.

Slides:



Advertisements
Similar presentations
Beyond Streams and Graphs: Dynamic Tensor Analysis
Advertisements

1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Modelling Relevance and User Behaviour in Sponsored Search using Click-Data Adarsh Prasad, IIT Delhi Advisors: Dinesh Govindaraj SVN Vishwanathan* Group:
1.Accuracy of Agree/Disagree relation classification. 2.Accuracy of user opinion prediction. 1.Task extraction performance on Bing web search log with.
Marios Iliofotou (UC Riverside) Brian Gallagher (LLNL)Tina Eliassi-Rad (Rutgers University) Guowu Xi (UC Riverside)Michalis Faloutsos (UC Riverside) ACM.
GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
BBM: Bayesian Browsing Model from Petabyte-scale Data Chao Liu, MSR-Redmond Fan Guo, Carnegie Mellon University Christos Faloutsos, Carnegie Mellon University.
Evaluating Search Engine
WebMiningResearch ASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Traffic Engineering With Traditional IP Routing Protocols
Ph.D. DefenceUniversity of Alberta1 Approximation Algorithms for Frequency Related Query Processing on Streaming Data Presented by Fan Deng Supervisor:
Web Mining Research: A Survey
Mobile Web Search Personalization Kapil Goenka. Outline Introduction & Background Methodology Evaluation Future Work Conclusion.
Retrieval Evaluation. Brief Review Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.
Web Mining Research: A Survey
WebMiningResearch ASurvey Web Mining Research: A Survey By Raymond Kosala & Hendrik Blockeel, Katholieke Universitat Leuven, July 2000 Presented 4/18/2002.
Heterogeneous Consensus Learning via Decision Propagation and Negotiation Jing Gao † Wei Fan ‡ Yizhou Sun † Jiawei Han † †University of Illinois at Urbana-Champaign.
Heterogeneous Consensus Learning via Decision Propagation and Negotiation Jing Gao† Wei Fan‡ Yizhou Sun†Jiawei Han† †University of Illinois at Urbana-Champaign.
C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases Fan Guo, Lei Li, Eric Xing, Christos Faloutsos Carnegie Mellon University {fanguo, leili,
Web Mining Research: A Survey
WebMiningResearchASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007 Revised.
Mining Behavior Models Wenke Lee College of Computing Georgia Institute of Technology.
Fast Random Walk with Restart and Its Applications
Data Mining – Intro.
Oral Defense by Sunny Tang 15 Aug 2003
Cohort Modeling for Enhanced Personalized Search Jinyun YanWei ChuRyen White Rutgers University Microsoft BingMicrosoft Research.
A fast identification method for P2P flow based on nodes connection degree LING XING, WEI-WEI ZHENG, JIAN-GUO MA, WEI- DONG MA Apperceiving Computing and.
Introduction The large amount of traffic nowadays in Internet comes from social video streams. Internet Service Providers can significantly enhance local.
From Devices to People: Attribution of Search Activity in Multi-User Settings Ryen White, Ahmed Hassan, Adish Singla, Eric Horvitz Microsoft Research,
Page 1 WEB MINING by NINI P SURESH PROJECT CO-ORDINATOR Kavitha Murugeshan.
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
Fan Guo 1, Chao Liu 2 and Yi-Min Wang 2 1 Carnegie Mellon University 2 Microsoft Research Feb 11, 2009.
Using Neural Networks to Predict Claim Duration in the Presence of Right Censoring and Covariates David Speights Senior Research Statistician HNC Insurance.
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
Glasgow 02/02/04 NN k networks for content-based image retrieval Daniel Heesch.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Presenter: Lung-Hao Lee ( 李龍豪 ) January 7, 309.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
Fast Random Walk with Restart and Its Applications Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan ICDM 2006 Dec , HongKong.
WEB MINING. In recent years the growth of the World Wide Web exceeded all expectations. Today there are several billions of HTML documents, pictures and.
Automatic Video Tagging using Content Redundancy Stefan Siersdorfer 1, Jose San Pedro 2, Mark Sanderson 2 1 L3S Research Center, Germany 2 University of.
Computational Tools for Population Biology Tanya Berger-Wolf, Computer Science, UIC; Daniel Rubenstein, Ecology and Evolutionary Biology, Princeton; Jared.
Search Engine using Web Mining COMS E Web Enhanced Information Mgmt Prof. Gail Kaiser Presented By: Rupal Shah (UNI: rrs2146)
Yaping Zhu with: Jennifer Rexford (Princeton University) Aman Shaikh and Subhabrata Sen (ATT Research) Route Oracle: Where Have.
Digital Video Library Network Supervisor: Prof. Michael Lyu Student: Ma Chak Kei, Jacky.
1 Click Chain Model in Web Search Fan Guo Carnegie Mellon University PPT Revised and Presented by Xin Xin.
Kijung Shin Jinhong Jung Lee Sael U Kang
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Facets: Fast Comprehensive Mining of Coevolving High-order Time Series Hanghang TongPing JiYongjie CaiWei FanQing He Joint Work by Presenter:Wei Fan.
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
CIS750 – Seminar in Advanced Topics in Computer Science Advanced topics in databases – Multimedia Databases V. Megalooikonomou Link mining ( based on slides.
Developing GRID Applications GRACE Project
哈工大信息检索研究室 HITIR ’ s Update Summary at TAC2008 Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Clustering Reporter: Ph.d.
Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou.
Data Mining – Intro.
What Is Cluster Analysis?
Evaluation Anisio Lacerda.
NetMine: Mining Tools for Large Graphs
Latent Space Model for Road Networks to Predict Time-Varying Traffic
On Efficient Graph Substructure Selection
Web Mining Department of Computer Science and Engg.
Ryen White, Ahmed Hassan, Adish Singla, Eric Horvitz
Click Chain Model in Web Search
Efficient Multiple-Click Models in Web Search
DATA-Intensive systems Department of computer science
Presentation transcript:

Mining and Querying Multimedia Data Fan Guo Sep 19, 2011 Committee Members: Christos Faloutsos, Chair Eric P. Xing William W. Cohen Ambuj K. Singh, University of California at Santa Babara

What this talk is about? 2

3

Going Multimedia 4

Beyond Text and Images 5

Thesis Outline Mining M1: MultiAspectForensics M2: QMAS Querying Q1: Click Models Q2: C-DEM Q3: BEFH 6

Thesis Outline Mining M1: MultiAspectForensics M2: QMAS Querying Q1: Click Models Q2: C-DEM Q3: BEFH 7

Mining Multimedia Data (1) Labeling Satellite Imagery 8 Input Output

Mining Multimedia Data (2) Network Traffic Log Analysis 9

Mining Multimedia Data (3) Web Knowledge Base 10

Mining Multimedia Data Data-driven problem solving over multiple modes at a non-trivial scale. 11

Thesis Outline Mining M1: MultiAspectForensics M2: QMAS Querying Q1: Click Models Q2: C-DEM Q3: BEFH 12

Querying Multimedia Data (1) A querying system provides an interface to retrieve records that best match users’ information need. 13

Querying Multimedia Data (1) Here is another example: 14

Querying Multimedia Data (1) May be transformed into a graph search problem 15

Querying Multimedia Data (2) Calibrate ranking from user feedback 16

Querying Multimedia Data (2) Calibrate ranking from user feedback 17

Thesis Outline Mining M1: MultiAspectForensics M2: QMAS Querying Q1: Click Models Q2: C-DEM Q3: BEFH 18

Data Large-Scale Heterogeneous Networks 19 Port IP-source IP-destination 80 (HTTP ) 993 (IMAP)

Goal How can we automatically detect and visualize patterns within a local community of nodes? 20

Preliminary Tensor for high-order data representation ▫3 data modes: source IP, destination IP, port # 21

Approach 22

Data Decomposition The canonical polyadic (CP) decomposition can factor tensor into a sum of rank-1 tensors 23

Data Decomposition A special case is Singular Value Decomposition 24

Attribute Plot 25 How to compute?

Spike Detection Iteratively search for spikes in the histogram plot along each data mode. 26 “ “” ”

Substructure Discovery Focus on part of the data within the spike Categorize into a few subgraph patterns 27

Pattern 1: Generalized Star (1) 28 IP-src’s sending packets to the same IP-dst & the same port Typical client/server system

Pattern 1: Generalized Star (1) 29 A ‘bar’ in a carefully reordered tensor

Pattern 1: Generalized Star (2) 30 Extending along “Port-Number”

Pattern 1: Generalized Star (2) 31 Port scanning or P2P Port numbers used in packets from the same IP-src to the same IP-dst

Pattern 2: Generalized Bipartite-Core (1) 32 A ‘plane’ in a carefully reordered tensor

Pattern 2: Generalized Bipartite-Core (1) 33 IP-src’s sending packets to the same IP-dst’s & the same port Clients talking to a shared server pool

Pattern 2: Generalized Bipartite-Core (2) 34 A ‘plane’ in a carefully reordered tensor

Pattern 2: Generalized Bipartite-Core (2) 35 IP-src’s sending packets over multiple ports to one IP-dst A multi- purpose windows server

M1: MultiAspectForensics Automatically detects novel patterns in heterogenous networks 36

Thesis Outline Mining M1: MultiAspectForensics M2: QMAS Querying Q1: Click Models Q2: C-DEM Q3: BEFH 37

QMAS: Mining Satellite Imagery (1) Low-labor labeling 38 Input Output

QMAS: Mining Satellite Imagery (2) Low-labor labeling Identification of Representatives 39

QMAS: Mining Satellite Imagery (2) Low-labor labeling Identification of Representatives and Outliers 40

QMAS: Mining Satellite Imagery (2) Low-labor labeling Identification of Representatives and Outliers 41

QMAS: Mining Satellite Imagery (3) Low-labor labeling Identification of Representatives and Outliers Linear in time & space 42

Thesis Outline Mining M1: MultiAspectForensics M2: QMAS Querying Q1: Click Models Q2: C-DEM Q3: BEFH 43

Web Search 44

User Clicks as Quality Feedback 45 # of total clicks

Motivation Leverage the signal from click data to improve search ranking. 46

Click Through Rate (CTR) CTR = # of Clicks / # of Impressions 47

Position Bias 48

Relevance of Web Document Relevance = Position 1 49 # Position 1 # Position 1 =

Problem Definition Estimate the relevance of web documents given clicks and their positions. 50

Design Goals / Constraints Scalable: single-pass, easy to parallel. Incremental: real-time updates possible. Accurate: consistent with past and future observations. 51

Approach 52

User Behavior Model 53

Last Clicked Position 54

Empirical Results Click data after pre-processing ▫110K distinct queries, 8.8M query sessions. Training time: <6 mins Online update: ▫Bump impression and click counters ▫No data retention required 55

Empirical Results Higher log-likelihood indicates better quality % accuracy in prediction 2% improvement over ICM, the baseline model

Empirical Results Position-bias visualized 57 Ground Truth DCM

Scaling to Terabytes 265TB data, 1.15B document relevance results, running time on wall clock ~ 3 hours 58

Q1: Click Models A statistical approach to leveraging click data for better ranking aware of position-bias. They are incremental, more accurate than the baseline, scaling to almost petabyte-scale data. 59

Thesis Outline Mining M1: MultiAspectForensics M2: QMAS Querying Q1: Click Models Q2: C-DEM Q3: BEFH 60

Q2: C-DEM A flexible query interface for 3-mode data: images, genes, annotation terms. 61

Q2: C-DEM 62 Images TermsGenes

Q2: C-DEM Solution: random walk with restart on graphs. 63

Thesis Outline Mining M1: MultiAspectForensics M2: QMAS Querying Q1: Click Models Q2: C-DEM Q3: BEFH 64

Q3: BEFH (1) Bayesian exponential family harmonium Deriving topical representations for multimedia corpora (e.g., video snapshots and captions) 65 InputModel

Q3: BEFH (2) Bayesian exponential family harmonium Deriving topical representations for multimedia corpora (e.g., video snapshots and captions) 66 Validation – Synthetic DataValidation – TRECVID Data Better Quality

Thesis Outline Mining M1: MultiAspectForensics M2: QMAS Querying Q1: Click Models Q2: C-DEM Q3: BEFH 67

Conclusion Data-driven research under the theme of pattern mining and similarity querying. 68

Conclusion Data-driven research under the theme of pattern mining and similarity querying. An array of practical tasks addressed: ▫Internet traffic surveillance (M1) 69

Conclusion Data-driven research under the theme of pattern mining and similarity querying. An array of practical tasks addressed: ▫Internet traffic surveillance (M1) ▫Satellite image analysis (M2) 70

Conclusion Data-driven research under the theme of pattern mining and similarity querying. An array of practical tasks addressed: ▫Internet traffic surveillance (M1) ▫Satellite image analysis (M2) ▫Web search (Q1) 71

Conclusion Data-driven research under the theme of pattern mining and similarity querying. An array of practical tasks addressed: ▫Internet traffic surveillance (M1) ▫Satellite image analysis (M2) ▫Web search (Q1) ▫… 72

Thank You! 73

74