Collective Collaborative Tagging System Jong Y. Choi, Joshua Rosen, Siddharth Maini, Marlon E. Pierce, and Geoffrey C. Fox Community Grids Laboratory Indiana.

Presentation transcript:

Collective Collaborative Tagging System
Jong Y. Choi, Joshua Rosen, Siddharth Maini, Marlon E. Pierce, and Geoffrey C. Fox
Community Grids Laboratory, Indiana University

People-Powered Knowledge
Social web content is increasing
– Blogs, wikis, ratings, reviews, social tags, …
– Helps harness the power of people's knowledge
Collaborative Tagging
– Social bookmarks annotated with metadata
– Connotea, Delicious, Flickr, MSI-CIEC
Pros and cons
– High-quality classification through human annotation
– But lack of control or authority

Motivations

Proposed System
[Figure: Collective Collaborative Tagging (CCT) System. Diagram labels: Distributed Collaborative Tagging Sites; Data Importer; Data Coordinator; Repository; User Service; Populate tags; Query with various options; Search Result; RDF, RSS, Atom, HTML]

Service Types and Algorithms
Type | Service | Description | Algorithm
I | Searching | Given input tags, returning the most relevant X (X = URLs, tags, or users) | LSI, FolkRank, Tag Graph
II | Recommendation | With/without input tags, returning undiscovered X | LSI, FolkRank, Tag Graph
III | Clustering | Community discovery: finding a group or a community with similar interests | K-Means, DA clustering, Pairwise DA
IV | Trend detection | Analyzing tagging activities as a time series and detecting abnormalities | Time Series Analysis

Two Models
Vector-space model or bag-of-words model
– q-dimensional vector d_i = (t_1, t_2, …, t_q)
– Easy and convenient
Graph model
– Building a tag graph
– Emphasis on relationship
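As a minimal sketch of the two models, assume a hypothetical toy set of bookmarks stored as (user, URL, tags) triples. The data and the names `tag_vectors` and `tag_graph` are illustrative assumptions, not part of the CCT system, and a tag co-occurrence graph is only one possible way to build a tag graph.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (user, url, tags) triples, not real CCT data.
bookmarks = [
    ("alice", "http://example.org/a", ["grid", "web2.0"]),
    ("bob", "http://example.org/a", ["grid", "tagging"]),
    ("bob", "http://example.org/b", ["tagging", "web2.0"]),
]

def tag_vectors(bookmarks):
    """Vector-space model: one q-dimensional tag-count vector per URL."""
    vocab = sorted({t for _, _, tags in bookmarks for t in tags})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = defaultdict(lambda: [0] * len(vocab))
    for _, url, tags in bookmarks:
        for t in tags:
            vectors[url][index[t]] += 1
    return vocab, dict(vectors)

def tag_graph(bookmarks):
    """Graph model (assumed form): tags as nodes, co-occurrence counts as edge weights."""
    edges = defaultdict(int)
    for _, _, tags in bookmarks:
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return dict(edges)

vocab, vectors = tag_vectors(bookmarks)
graph = tag_graph(bookmarks)
```

The vector form discards who tagged what with whom and keeps only counts, while the graph form keeps the relationships between tags explicit.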

Vector-space Vs. Graph
 | Vector-space Model | Graph Model
Representation | q-dimensional vector; q-by-n matrix | G(V, E); V = {URL, tags, users}
Dissimilarity | Distances, cosine, …; O(N^2) complexity | Paths, hops, connectivity, …; O(N^3) complexity
Algorithm | Latent Semantic Indexing; dimension reduction schemes; PCA | PageRank, FolkRank, …; pairwise clustering; MDS
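The two dissimilarity rows can be illustrated with a short sketch. It assumes cosine dissimilarity for the vector-space side and hop counts computed with the Floyd-Warshall algorithm for the graph side; the slides do not say which path algorithm is used, so Floyd-Warshall is only one way the O(N^3) cost could arise. All function names and inputs are hypothetical.

```python
import math

def cosine_dissimilarity(u, v):
    """1 - cosine similarity; close to 0 when the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def vector_dissimilarities(vectors):
    """All pairs of N vectors: O(N^2) comparisons."""
    n = len(vectors)
    return [[cosine_dissimilarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

def hop_distances(n, edges):
    """All-pairs hop counts on an undirected graph via Floyd-Warshall: O(N^3)."""
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j in edges:
        dist[i][j] = dist[j][i] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical inputs: three tag vectors, and a 4-node graph with one isolated node.
vec_d = vector_dissimilarities([[1, 0, 1], [1, 0, 1], [0, 1, 0]])
hops = hop_distances(4, [(0, 1), (1, 2)])
```

Unreachable node pairs keep an infinite hop distance, which a downstream clustering step would need to handle explicitly.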

Latent Semantic Indexing
Dimension reduction from high q to low d (q ≫ d)
– Removing noisy terms
– Extracting latent concepts
– Using SVD to get an optimized d-dimensional matrix
[Figure: precision-recall plot]
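A minimal sketch of the SVD step, assuming NumPy and a hypothetical 4-term by 3-document count matrix. The function names `lsi` and `fold_in_query` are illustrative. Documents and queries are both projected with the top d left singular vectors so that they can be compared in the same reduced space.

```python
import numpy as np

def lsi(term_doc, d):
    """Project documents from q term dimensions onto d latent concepts."""
    # term_doc is q-by-n: rows are terms (tags), columns are documents.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_d, s_d, Vt_d = U[:, :d], s[:d], Vt[:d, :]
    doc_coords = (np.diag(s_d) @ Vt_d).T  # n-by-d document coordinates
    return U_d, s_d, doc_coords

def fold_in_query(query, U_d):
    """Map a q-dimensional query vector into the same d-dimensional space."""
    return query @ U_d

# Hypothetical data: terms 0-1 occur in documents 0-1, terms 2-3 in document 2.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
U_d, s_d, docs = lsi(A, d=2)
q = fold_in_query(np.array([1.0, 0.0, 0.0, 0.0]), U_d)
```

In this toy case a query containing only term 0 lands in the same direction as documents 0 and 1, which is the "latent concept" effect the slide refers to.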

Pairwise Dissimilarity
Pairwise distance (or dissimilarity) matrix
– N-by-N matrix in which each element represents a degree of distance or dissimilarity
Used as an input
– Multi-Dimensional Scaling (MDS): to recover dimension information
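To show how coordinates can be recovered from a distance matrix alone, here is a sketch of classical (Torgerson) MDS using NumPy. The slides do not state which MDS variant the system uses, so this is a swapped-in textbook version, and the function name and sample points are assumptions.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Recover dim-dimensional coordinates from an N-by-N distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]    # indices of the dim largest
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    return vecs[:, top] * scale

# Hypothetical points on a line, seen only through their pairwise distances.
points = np.array([[0.0], [1.0], [3.0]])
D = np.abs(points - points.T)
X = classical_mds(D, dim=1)
```

The recovered coordinates are unique only up to translation and reflection, so the check that matters is whether the pairwise distances of `X` reproduce `D`.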

Pairwise Dissimilarity
Pairwise clustering
– Input from vector-based model vs. graph model
– How to avoid local minima/maxima?
[Figure: panels "Graph model" and "Vector-based model"]

Deterministic Annealing Clustering
Deterministically avoid local minima
– Tracing global solution by changing level of energy
– Analogy to physical annealing process (High → Low)
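The annealing idea can be sketched for one-dimensional data as follows. This is a simplified illustration under stated assumptions (squared-distance cost, geometric cooling, a fixed number of centers, a small deterministic nudge to let coincident centers split), not the system's implementation; all names and parameter values are hypothetical.

```python
import math

def da_cluster(xs, k, t_start=100.0, t_end=0.01, cooling=0.9, inner=20):
    """Deterministic annealing for 1-D data with k centers."""
    mean = sum(xs) / len(xs)
    centers = [mean] * k  # at high temperature every center sits at the mean
    t = t_start
    while t > t_end:
        # Tiny deterministic nudge so coincident centers can separate
        # once the temperature is low enough for a split to be stable.
        centers = [y + 1e-3 * (c - (k - 1) / 2) for c, y in enumerate(centers)]
        for _ in range(inner):
            sums = [0.0] * k
            weights = [0.0] * k
            for x in xs:
                d2 = [(x - y) ** 2 for y in centers]
                lo = min(d2)
                # Soft (Gibbs) assignment; subtracting lo keeps one term at 1.
                p = [math.exp(-(d - lo) / t) for d in d2]
                z = sum(p)
                for c in range(k):
                    sums[c] += p[c] / z * x
                    weights[c] += p[c] / z
            centers = [s / w if w > 0 else y
                       for s, w, y in zip(sums, weights, centers)]
        t *= cooling  # cool from high to low temperature
    return sorted(centers)

# Hypothetical data with two well-separated groups.
centers = da_cluster([0.0, 0.2, 0.4, 9.6, 9.8, 10.0], k=2)
```

At high temperature all points belong almost equally to every center, so the solution starts as a single cluster; as the temperature drops the assignments harden and the centers split, without depending on a random initialization.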

More Machine Learning Algorithms
Classification
– To respond more quickly to users' requests
– Training on users' input and answering questions based on the training results
– Artificial Neural Network, Support Vector Machine, …
Trend Detection
– Time-series analysis
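The slides do not specify a time-series method for trend detection. As one simple possibility, a rolling z-score over daily tag counts could flag abnormal activity. The function name, window size, threshold, and data below are all assumptions made for illustration.

```python
from statistics import mean, stdev

def detect_anomalies(counts, window=5, threshold=3.0, min_sigma=1.0):
    """Flag indices whose count deviates strongly from the preceding window."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the spread so a flat history does not make every change an outlier.
        sigma = max(stdev(history), min_sigma)
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily usage counts for one tag, with a burst on day 6.
daily_counts = [4, 5, 6, 5, 4, 5, 30, 5, 6]
anomalies = detect_anomalies(daily_counts)
```

A burst inflates the spread of the windows that follow it, so this simple scheme is less sensitive immediately after a detected spike.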

Ongoing Development
– Different data sources
– Various IR algorithms
– Flexible options
– Result comparison
