Pascal Visualization Challenge Blaž Fortuna, IJS Marko Grobelnik, IJS Steve Gunn, US.



Part I: Challenge details

ePrints
Database of around 1600 papers published by Pascal members.
Papers are described with:
- Authors (unique Pascal Id)
- Title
- Abstract (most papers)
- Publish date (some papers only have the year)

Challenge Goal
Two main goals:
- to test and compare different text visualization methods, ideas and algorithms on a common dataset,
- to contribute to the Pascal dissemination and promotion activities by using data about scientific publications from Pascal’s EPrints server.

Task
Visualize and present the Pascal ePrints data in a novel way which enables:
- discovering the main areas covered by the papers and people in Pascal,
- discovering how areas and people develop through time,
- helping researchers with recommendations on which papers to read,
- helping to find the right reviewers for new papers.

Data
- Raw XML file from the Pascal ePrints server
- Processed data for easier use:
  - Bag-of-words (TextGarden, Matlab)
  - Graph (Matlab, Pajek)
Data is processed for different possible scenarios.

Raw XML file
Cleaned data from the Pascal ePrints server.
Data is given as a list of papers; each paper is described by:
- Title
- Abstract
- Year of publication
- List of authors
Each author is described by a unique Pascal Id and an institution.
Example record (truncated on the slide): Synthesis of Maximum… / In this presentation… / Computati… / Learning… / Theory … / Sandor Szedmak / John Shawe-Taylor / Universit…
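The record layout described above could be read along the following lines. This is only a sketch: the slides do not show the XML schema, so every tag and attribute name here (`paper`, `title`, `abstract`, `year`, `author`, `id`, `institution`) is a hypothetical stand-in, and the sample record is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real tag and attribute names in the challenge
# file may differ; only the list of fields comes from the slides.
SAMPLE = """<papers>
  <paper>
    <title>Example title</title>
    <abstract>Example abstract text.</abstract>
    <year>2005</year>
    <author id="101" institution="Example University">First Author</author>
    <author id="102" institution="Example Institute">Second Author</author>
  </paper>
</papers>"""


def read_papers(xml_text):
    """Return one dict per paper with title, abstract, year and authors."""
    papers = []
    for node in ET.fromstring(xml_text).iter("paper"):
        authors = [
            {
                "id": a.get("id"),
                "name": (a.text or "").strip(),
                "institution": a.get("institution"),
            }
            for a in node.findall("author")
        ]
        papers.append({
            "title": node.findtext("title", default=""),
            # Falls back to an empty string when a paper has no abstract.
            "abstract": node.findtext("abstract", default=""),
            "year": int(node.findtext("year")),
            "authors": authors,
        })
    return papers


papers = read_papers(SAMPLE)
```

Keeping the author Id next to the name matters for the later scenarios, since the Id rather than the name is what is stated to be unique.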

Bag-of-words
Covered scenarios:
- Document == Paper
- Document == Author
- Document == Institution
Available formats:
- TextGarden: text file where one line equals one document
- Matlab: data available in the form of a sparse term-document matrix
TextGarden:
- Format: Document_name !Subject DocumentList
- Example: Support_Vector_Machine_to_synthesise_kernels !Machine_Vision !Theory_and_Algorithms Support Vector Machine to synthesise kernels -- Suppose we are given two sets of …
Matlab:
- The sparse matrix is saved in a text file; it can be read into Matlab with:
    X = spconvert(load('papers.dat'));
- Documents are columns in the matrix.
- Names of columns (document names) and rows (words) are provided.
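For readers working outside Matlab, both bag-of-words formats can be read with a few lines of Python. The sketch below assumes that a TextGarden line is the document name, then zero or more `!Subject` tokens, then the text, and that the Matlab file holds one `row column value` triplet per line (the layout `spconvert` reads). The function names are illustrative, and summing repeated entries is a choice made here.

```python
def parse_textgarden_line(line):
    """Split one TextGarden line into (name, subjects, text)."""
    tokens = line.split()
    name, rest = tokens[0], tokens[1:]
    subjects = []
    # Subject labels are the leading tokens that start with '!'.
    while rest and rest[0].startswith("!"):
        subjects.append(rest.pop(0)[1:])
    return name, subjects, " ".join(rest)


def read_triplets(lines):
    """Read 'row column value' triplets into a {(row, col): value} dict."""
    matrix = {}
    for line in lines:
        if not line.strip():
            continue
        row, col, value = line.split()
        # Indices may be written as floats (e.g. 1.0e+00), so go via float.
        key = (int(float(row)), int(float(col)))
        matrix[key] = matrix.get(key, 0.0) + float(value)
    return matrix


name, subjects, text = parse_textgarden_line(
    "Support_Vector_Machine_to_synthesise_kernels !Machine_Vision "
    "!Theory_and_Algorithms Support Vector Machine to synthesise kernels "
    "-- Suppose we are given two sets of"
)
term_doc = read_triplets(["1 2 3.0", "2 1 1.0"])
```

Indices stay 1-based, as in the Matlab file, so row and column numbers line up with the provided lists of words and document names.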

Graph
Covered scenarios:
- Vertex == Word, Edge == Co-Appearance
- Vertex == Author, Edge == Co-Authors
- Vertex == Institution, Edge == Collaboration
Available formats:
- Matlab: data available in the form of a sparse adjacency matrix
- Pajek: software for network analysis
Matlab:
- The sparse matrix is saved in a text file; it can be read into Matlab with:
    X = spconvert(load('words.dat'));
- Names of vertices (words, authors, institutions) are provided.
Pajek:
- Can be downloaded from: vlado.fmf.uni-lj.si/pub/networks/pajek
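To illustrate the "Vertex == Author, Edge == Co-Authors" scenario, the sketch below builds weighted co-author edges from per-paper author lists and serializes them in Pajek's plain-text `.net` layout (`*Vertices`, then `*Edges`). The author labels are placeholders, and weighting an edge by the number of shared papers is an assumption; the slides do not say how the provided graphs are weighted.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count, for each author pair, how many papers they wrote together."""
    edges = Counter()
    for authors in papers:
        # sorted(set(...)) gives each unordered pair one canonical form.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def to_pajek(edges):
    """Serialize weighted undirected edges in Pajek's .net text layout."""
    names = sorted({name for pair in edges for name in pair})
    index = {name: i for i, name in enumerate(names, start=1)}
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{index[name]} "{name}"' for name in names]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {w}" for (a, b), w in sorted(edges.items())]
    return "\n".join(lines)


# Placeholder author labels, one list per paper.
edges = coauthor_edges([["A", "B"], ["A", "B", "C"]])
net_text = to_pajek(edges)
```

The same two functions cover the institution scenario if each paper is given as a list of institutions instead of authors.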

Submissions
The results can be:
- images,
- movies,
- Web sites,
- VRML files,
- executables (Windows, Linux),
- etc.
For an interactive tool, also provide a video showing the use of the tool on the Pascal ePrints data.

Evaluation
- Usability of visualization: the goal is to assess the usability of a particular visualization in different practical contexts.
- Innovativeness: the goal is to estimate how innovative the ideas used for the visualization are.
- Aesthetics of the image: here we aim to identify the "nicest" images from the challenge.
- General voting by Pascal researchers over the web about "who likes what".
Since all the criteria are subjective, we will hire experts to judge the quality.
Each of the criteria will generate a separate ranking.

Part II: Examples

Visualization example 1/2: Document Atlas
Bag-of-words approach: Document == Author
- An author is described by the sum of all the abstracts from the papers they co-authored.
- We construct a separate profile for papers from 2004 and for papers from 2005.
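The "Document == Author" profiles can be sketched as summed word counts, one profile per author and year. The tokenizer, the field names (`authors`, `year`, `abstract`) and the toy papers below are assumptions for illustration; the slides do not describe the preprocessing used in Document Atlas.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic runs as words."""
    return re.findall(r"[a-z]+", text.lower())


def author_profiles(papers):
    """Map (author_id, year) to summed word counts over co-authored abstracts."""
    profiles = defaultdict(Counter)
    for paper in papers:
        words = Counter(tokenize(paper["abstract"]))
        for author in paper["authors"]:
            profiles[(author, paper["year"])] += words
    return profiles


# Toy input with placeholder author ids.
toy_papers = [
    {"authors": ["a1", "a2"], "year": 2004, "abstract": "Kernel methods for text"},
    {"authors": ["a1"], "year": 2004, "abstract": "Kernel learning"},
    {"authors": ["a1"], "year": 2005, "abstract": "Graph visualization"},
]
profiles = author_profiles(toy_papers)
```

Keying on the (author, year) pair yields the separate 2004 and 2005 profiles directly, without a second pass over the papers.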

Dimensionality reduction
Documents are mapped from bag-of-words space to two dimensions in two steps:
- Latent Semantic Indexing: bag-of-words space => 110 dim
- Multidimensional Scaling: 110 dim => 2 dim
The background reflects the density of documents.
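The two-step mapping can be sketched with NumPy as a truncated SVD (the usual way to compute Latent Semantic Indexing) followed by multidimensional scaling. The slides do not say which MDS variant Document Atlas uses; classical (Torgerson) MDS is swapped in here because it has a closed form. The data is random toy data, with 5 dimensions standing in for the 110 used on the slide.

```python
import numpy as np


def lsi(term_doc, k):
    """Project documents (columns) onto the top-k singular directions."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one k-dimensional row per document


def classical_mds(points, dims=2):
    """Classical (Torgerson) MDS from Euclidean distances between rows."""
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    n = sq.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq @ centering  # double-centred squared distances
    values, vectors = np.linalg.eigh(gram)
    order = np.argsort(values)[::-1][:dims]   # largest eigenvalues first
    return vectors[:, order] * np.sqrt(np.maximum(values[order], 0.0))


rng = np.random.default_rng(0)
term_doc = rng.random((50, 12))   # toy data: 50 terms, 12 documents
reduced = lsi(term_doc, k=5)      # the slides reduce to 110 dimensions
layout = classical_mds(reduced)   # one (x, y) map position per document
```

An iterative, stress-minimizing MDS would fit the same interface: document vectors in, one 2-D position per document out.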

Background words
- Each part of the map is assigned the keyword which is most representative of the documents in that area.
- We get a “map” of the topics covered within the documents.
- In the case of the Pascal ePrints data, areas on the map correspond to the areas covered within the Pascal Network.
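One way to pick a keyword for a map location is to sum the word weights of nearby documents, letting closer documents count more, and keep the top-scoring word. The Gaussian distance weighting and the `radius` parameter below are assumptions; the slides only state that the keyword is the most representative one for the documents in the area.

```python
import math
from collections import Counter


def keyword_at(point, positions, doc_words, radius=1.0):
    """Return the highest-weighted word among documents near `point`."""
    totals = Counter()
    for (x, y), words in zip(positions, doc_words):
        dist2 = (x - point[0]) ** 2 + (y - point[1]) ** 2
        weight = math.exp(-dist2 / (2 * radius ** 2))  # nearby documents count more
        for word, value in words.items():
            totals[word] += weight * value
    return totals.most_common(1)[0][0]


# Toy layout: two documents near the origin, one far away.
positions = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0)]
doc_words = [
    {"kernel": 2, "text": 1},
    {"kernel": 1, "graph": 1},
    {"graph": 3},
]
near_origin = keyword_at((0.0, 0.0), positions, doc_words)
far_corner = keyword_at((5.0, 5.0), positions, doc_words)
```

Evaluating `keyword_at` on a regular grid of points would give one label per map cell.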

Time dynamics
- For each author we have a profile for each of the years 2004 and 2005.
- By showing the difference we can see how authors’ research focus developed between 2004 and 2005.
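The difference between an author's two yearly profiles can be sketched as a subtraction of word counts, reporting which words gained and which lost weight. The profiles are toy data, and summarizing the change by top rising and falling words is an illustrative choice rather than the method behind the slide's visualization.

```python
from collections import Counter


def focus_shift(profile_2004, profile_2005, top=3):
    """Return (rising, falling) words between two yearly profiles."""
    diff = Counter(profile_2005)
    diff.subtract(profile_2004)  # keeps negative counts, unlike the - operator
    ranked = sorted(diff.items(), key=lambda item: item[1])
    falling = [word for word, change in ranked[:top] if change < 0]
    rising = [word for word, change in reversed(ranked[-top:]) if change > 0]
    return rising, falling


rising, falling = focus_shift(
    {"kernel": 5, "text": 2},
    {"kernel": 1, "graph": 4, "text": 2},
)
```

Words whose counts are unchanged, such as "text" in this toy example, appear in neither list.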

Co-Authorships

Live Demo

Visualization example 2/2: IST World
Web portal developed within the IST World EU project.
Uses search and visualization methods to:
- discover the main research areas and collaborations within the PASCAL organizations,
- produce recommendations on which papers to read (e.g. papers on image recognition, or the kernel trick),
- find the right reviewers for a new paper (e.g. a paper on "brain computer interface") and assess their competence.
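Reviewer finding can be approximated by ranking author profiles by their cosine similarity to the new paper's bag-of-words. This is a generic sketch and not a description of how IST World works, which the slides do not detail; the reviewer ids and profiles are made up.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(value * b[word] for word, value in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_reviewers(paper_words, author_profiles, top=2):
    """Return the ids of the `top` authors most similar to the paper."""
    scores = {
        author: cosine(paper_words, profile)
        for author, profile in author_profiles.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]


paper = Counter("brain computer interface signal".split())
reviewer_profiles = {
    "r1": Counter({"brain": 3, "signal": 2}),
    "r2": Counter({"kernel": 4, "text": 1}),
    "r3": Counter({"computer": 1, "vision": 5}),
}
suggested = rank_reviewers(paper, reviewer_profiles)
```

Passing paper vectors instead of author profiles as the second argument turns the same ranking into a simple paper recommender.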

Research areas
- Institutions are placed on the map of research areas from the Pascal Network.
- The example shows which areas are closely related to JSI.

Collaborations
- Collaboration of institutions
- Collaboration of authors working on “text mining”

Paper Recommendation

Competence Search

Live Demo

Thank you!