Computing & Information Sciences Kansas State University IJCAI HINA 2015: 3rd Workshop on Heterogeneous Information Network Analysis KSU Laboratory for.

Presentation transcript:

Computing & Information Sciences, Kansas State University
IJCAI HINA 2015: 3rd Workshop on Heterogeneous Information Network Analysis
KSU Laboratory for Knowledge Discovery in Databases

Link-Augmented Dynamic Topic Modeling
3rd IJCAI Workshop on HINA, Saturday, 25 July 2015
William H. Hsu, Laboratory for Knowledge Discovery in Databases, Kansas State University
Acknowledgements
- Northeastern University: Yizhou Sun
- Stony Brook University: Leman Akoglu
- Illinois: Jiawei Han
- Kansas State: Wesam Elshamy, Surya Teja Kallumadi

Previous Work [1]: From Text to Spatiotemporal Events
Based on NLP Group NER Toolkit © Stanford University; Simile © Massachusetts Institute of Technology; Google Maps © Tele Atlas, Inc. and Google, Inc.

Previous Work [2]: Timeline Formation
Elshamy (2012)

Previous Work [3]: Simultaneous Topic Enumeration & Tracking (STET)
Adapted from Elshamy (2012)
Time t: 3 extant topics
Time t + k: 2 extant topics

Outline
- Topic Modeling: Static (STM) vs. Dynamic (DTM)
  - Generative Model Concepts
  - Using Time in DTM
- Heterogeneous Information Network Analysis (HINA)
- From Link Analysis to Link-Augmented DTM
  - Community Detection: Gibson et al., Lu & Getoor
  - From RankClus to NetClus: Sun et al.
  - LIMTopic (Duan et al.) & Other Approaches

Topic Modeling [1]: Basic Task (Static/Atemporal)
Elshamy (2012)

Topic Modeling [2]: Understanding Plate Notation
Adapted from Elshamy (2012)

Dynamic Topic Modeling: HMM for Topic Detection & Tracking (TDT)
Adapted from Elshamy (2012)
Hidden Markov Model (HMM)
- Observable Transition: Topic Arrival (Detected by Topic Model)
- Hidden State: Topic Activity Level
- Problem: Topic Drift (Similar to Concept Drift)
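The HMM view on this slide can be sketched with Viterbi decoding. This is a minimal illustration, assuming two hypothetical hidden activity levels ("dormant", "active") and a binary observation per time slice (1 if the topic model detected an arrival, 0 otherwise). All state names and probabilities are invented for the example and do not come from the talk.

```python
import math

# Hypothetical two-state HMM: hidden topic activity level, observed arrivals.
STATES = ("dormant", "active")
START = {"dormant": 0.7, "active": 0.3}
TRANS = {
    "dormant": {"dormant": 0.8, "active": 0.2},
    "active": {"dormant": 0.3, "active": 0.7},
}
# Observation 1 = an arrival was detected in this time slice, 0 = none.
EMIT = {
    "dormant": {0: 0.9, 1: 0.1},
    "active": {0: 0.4, 1: 0.6},
}


def viterbi(observations):
    """Return the most likely hidden activity sequence for the observations."""
    # scores[s] = best log-probability of any state path ending in state s.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


path = viterbi([0, 0, 1, 1, 1, 0])
```

Working in log space avoids underflow on long observation sequences. Topic drift is not modeled here, since the emission probabilities are held fixed.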

Continuous Time vs. Variable Number of Topics
Elshamy (2012)
State of the Field
- cDTM: Continuous Time, Fixed Number of Topics
- oHDP: Discrete Time, Variable Number of Topics
Goal: Hybrid Model
- Continuous Time, Variable Number of Topics

Hierarchical Dirichlet Process (HDP) Model
Fan et al. (2013)

Retrieving Documents
Query types: Query Document, Short Text Query, Author Query
Components: Search Interface, Word-Topic Inverted Index, Document-Topic Inverted Index, Corpus representation from topic model
Output: Ranked Results
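The two inverted indexes named on this slide can be sketched for the short text query case as follows. The index contents, the weights, and the dot-product scoring rule are all assumptions made for illustration. In practice the indexes would be derived from a fitted topic model, and the talk may score documents differently.

```python
from collections import defaultdict

# Hypothetical toy indexes (word -> topic weights, topic -> document weights).
WORD_TOPIC = {
    "network": {0: 0.6, 1: 0.1},
    "link": {0: 0.5},
    "topic": {1: 0.7},
    "dirichlet": {1: 0.8},
}
TOPIC_DOC = {
    0: {"doc_a": 0.9, "doc_b": 0.2},
    1: {"doc_b": 0.8, "doc_c": 0.95},
}


def query_topics(words):
    """Map a short text query to a normalized topic weight vector."""
    weights = defaultdict(float)
    for word in words:
        for topic, w in WORD_TOPIC.get(word, {}).items():
            weights[topic] += w
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total else {}


def rank(words):
    """Score documents by the dot product of query and document topic weights."""
    scores = defaultdict(float)
    for topic, q_weight in query_topics(words).items():
        for doc, d_weight in TOPIC_DOC.get(topic, {}).items():
            scores[doc] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


results = rank(["topic", "dirichlet"])
```

Only documents that share at least one topic with the query are ever touched, which is the usual reason for organizing the corpus representation as inverted indexes.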

Information Retrieval Task [1]: Overview

Information Retrieval Task [2]: Static Setting

Information Retrieval Task [3]: Dynamic Setting

Heterogeneous Information Network Analysis (HINA)
Han, Wang, & El-Kishky (KDD 2014)
Real-world data: Multiple object types and/or multiple link types
- DBLP Bibliographic Network: Venue, Paper, Author
- The IMDB Movie Network: Actor, Movie, Director, Studio
- The Facebook Network
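A network with multiple object types and link types can be represented as a set of typed edges. The sketch below assumes a tiny DBLP-style network. The object identifiers, the link type names (`writes`, `published_in`), and the `inv_` prefix for reverse links are hypothetical choices for this example.

```python
from collections import defaultdict

# Hypothetical toy bibliographic network in the style of DBLP.
# Each edge is (source type, source id, link type, target type, target id).
EDGES = [
    ("Author", "a1", "writes", "Paper", "p1"),
    ("Author", "a2", "writes", "Paper", "p1"),
    ("Author", "a2", "writes", "Paper", "p2"),
    ("Paper", "p1", "published_in", "Venue", "v1"),
    ("Paper", "p2", "published_in", "Venue", "v1"),
]


def build_adjacency(edges):
    """Index edges by (object type, object id, link type)."""
    adjacency = defaultdict(set)
    for src_type, src, link, dst_type, dst in edges:
        adjacency[(src_type, src, link)].add((dst_type, dst))
        # Store the reverse direction too so paths can be walked both ways.
        adjacency[(dst_type, dst, "inv_" + link)].add((src_type, src))
    return adjacency


def follow(adjacency, start_type, start_id, links):
    """Return the objects reachable from a start object along a sequence of link types."""
    frontier = {(start_type, start_id)}
    for link in links:
        frontier = {
            nbr
            for (obj_type, obj_id) in frontier
            for nbr in adjacency.get((obj_type, obj_id, link), set())
        }
    return frontier


ADJ = build_adjacency(EDGES)
coauthors = follow(ADJ, "Author", "a1", ["writes", "inv_writes"])
venues = follow(ADJ, "Author", "a2", ["writes", "published_in"])
```

Following Author-Paper-Author yields a co-author set (including the start author), and Author-Paper-Venue yields the venues an author has published in. A homogeneous graph with a single node and edge type cannot distinguish these two relations.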

Link Analysis [1]: Instance-based Dependencies
Getoor, Bhattacharya, Lu, & Sen (MRDM 2004)
Figure nodes: A1, P3, I1
Papers that cite P3 are likely to be [category shown in figure]

Link Analysis [2]: Class-based Dependencies
Getoor, Bhattacharya, Lu, & Sen (MRDM 2004)
Figure nodes: A1, ?, I1
Papers that cite [category shown in figure] are likely to be ?

Link Descriptions
Getoor, Bhattacharya, Lu, & Sen (MRDM 2004)
[Figure: for an object X, the categories of its In-Links(X), Out-Links(X), CO(X), and CI(X) neighbors are summarized as mode, binary, and count vectors. Vectors legible in the figure: binary (1,1,1), (1,1,0), (1,0,0); count (1,2,0), (2,1,0), (2,0,0), (3,1,1).]
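The three summaries named on this slide (mode, binary, count) can be computed from the categories of an object's linked neighbors. This is a sketch under assumptions: the category names are hypothetical, and breaking mode ties by category order is a choice made here, not something stated in the slides.

```python
from collections import Counter

CATEGORIES = ("AI", "DB", "IR")  # hypothetical category set


def link_features(neighbor_categories, categories=CATEGORIES):
    """Summarize the categories of linked objects as mode, binary, and count features."""
    counts = Counter(neighbor_categories)
    # Count: number of linked objects in each category.
    count_vec = tuple(counts.get(c, 0) for c in categories)
    # Binary: whether each category occurs at all among the linked objects.
    binary_vec = tuple(1 if n > 0 else 0 for n in count_vec)
    # Mode: most frequent category; ties go to the earliest category in
    # `categories`, and the mode is None when there are no links.
    mode = (
        max(categories, key=lambda c: counts.get(c, 0))
        if neighbor_categories
        else None
    )
    return {"mode": mode, "binary": binary_vec, "count": count_vec}


features = link_features(["AI", "AI", "AI", "DB", "IR"])
```

The same function would be applied once per link set (in-links, out-links, and so on), giving one feature group per link type.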

Predictive Model for Classification
Getoor, Bhattacharya, Lu, & Sen (MRDM 2004)
A structured logistic regression
- Compute P(c | OA(X)) and P(c | LD(X)) separately using separate logistic regression models
- Here OA(X) are the object attributes and LD_f(X) are the link features
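One way to combine two separately computed class probabilities is to multiply them per category and renormalize. The formula on the original slide did not survive extraction, so treating the combination as a normalized product is an assumption, and the probability values below are made up for illustration.

```python
def combine(p_attr, p_link):
    """Multiply per-category probabilities from two models and renormalize."""
    unnormalized = {c: p_attr[c] * p_link[c] for c in p_attr}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical outputs of the two separate logistic regression models.
p_given_oa = {"AI": 0.5, "DB": 0.3, "IR": 0.2}  # from object attributes OA(X)
p_given_ld = {"AI": 0.2, "DB": 0.6, "IR": 0.2}  # from link features LD(X)

posterior = combine(p_given_oa, p_given_ld)
predicted = max(posterior, key=posterior.get)
```

In this example the attribute model alone would pick "AI", but the link evidence shifts the combined prediction to "DB".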

Prediction
Getoor, Bhattacharya, Lu, & Sen (MRDM 2004)
Step 1: Bootstrap using object attributes only
[Figure: papers P1 through P5 and the category set { }]

NetClus [1]
Gupta, Aggarwal, Han, Sun (ASONAM 2011)
- Algorithm to perform clustering over a heterogeneous network
- Performs iterative ranking and clustering of objects
- A probabilistic generative model is used to model the probability of generating different objects from each cluster
- A maximum likelihood technique is used to evaluate the posterior probability of the presence of an object in a cluster

NetClus [2]
Gupta, Aggarwal, Han, Sun (ASONAM 2011)

Using NetClus
Gupta, Aggarwal, Han, Sun (ASONAM 2011)

NetClus Results: Net Cluster Tree
Gupta, Aggarwal, Han, Sun (ASONAM 2011)
Figure labels: Level 1, Level 2, Level 3; K=3 nc

LIMTopic [1]: Document Networks
Duan, Li, Li, Zhang, Gu, & Wen (IEEE TKDE 2011)

LIMTopic [2]: PageRank-Like Ranking Algorithms
Duan, Li, Li, Zhang, Gu, & Wen (IEEE TKDE 2011)
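As a point of reference for "PageRank-like", here is a sketch of standard PageRank by power iteration on a hypothetical document link graph. This is plain PageRank, not the ranking method used in LIMTopic. The graph, the damping factor of 0.85, and the iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank; `links` maps each document to the documents it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every document receives the same teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling document: spread its rank evenly over all documents.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical document network: d3 is linked to by d1, d2, and d4.
DOC_LINKS = {"d1": ["d2", "d3"], "d2": ["d3"], "d3": ["d1"], "d4": ["d3"]}
ranks = pagerank(DOC_LINKS)
```

The scores sum to 1, so they can be read as a distribution over documents. A document with no incoming links, such as d4 here, keeps only the teleportation share.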

LIMTopic [3]: Link-Augmented STM
Duan, Li, Li, Zhang, Gu, & Wen (IEEE TKDE 2011)

LIMTopic [4]: Using Link Importance
Duan, Li, Li, Zhang, Gu, & Wen (IEEE TKDE 2011)
Figure label: Link importance

LIMTopic [5]: Evaluating Topic Models for Ranking
Duan, Li, Li, Zhang, Gu, & Wen (IEEE TKDE 2011)