Pascal Visualization Challenge
Blaž Fortuna, IJS; Marko Grobelnik, IJS; Steve Gunn, US

Part I: Challenge details

ePrints
Database of around 1600 papers published by Pascal members. Papers are described with:
- Authors (unique Pascal Id)
- Title
- Abstract (most papers)
- Publication date (some papers only have the year)

Challenge Goal
Two main goals:
- to test and compare different text visualization methods, ideas and algorithms on a common dataset,
- to contribute to the Pascal dissemination and promotion activities by using data about scientific publications from Pascal's EPrints server.

Task
Visualize and present the Pascal ePrints data in a novel way which enables:
- discovering the main areas covered by the papers and people in Pascal,
- discovering how areas and people develop through time,
- helping researchers with recommendations on which papers to read,
- helping to find the right reviewers for new papers.

Data
Raw XML file from the Pascal ePrints server.
Processed data for easier use:
- Bag-of-words (TextGarden, Matlab)
- Graph (Matlab, Pajek)
Data is processed for different possible scenarios.

Raw XML file
Cleaned data from the Pascal ePrints server. Data is given as a list of papers; each paper is described by:
- Title
- Abstract
- Year of publication
- List of authors
Each author is described by a unique Pascal Id and institution.
Example record (truncated on the slide): Synthesis of Maximum… / In this presentation… / Computati… Learning… Theory … / Sandor Szedmak, John Shawe-Taylor / Universit…

Bag-of-words
Covered scenarios:
- Document == Paper
- Document == Author
- Document == Institution
Available formats:
- TextGarden: text file where one line equals one document
- Matlab: data available in the form of a sparse term-document matrix
TextGarden (www.textmining.net):
Format: Document_name !Subject DocumentList
Example: Support_Vector_Machine_to_synthesise_kernels !Machine_Vision !Theory_and_Algorithms Support Vector Machine to synthesise kernels -- Suppose we are given two sets of …
Matlab:
Sparse matrix saved in a text file; it can be read into Matlab with:
    X = spconvert(load('papers.dat'));
Documents are columns in the matrix. Names of columns (document names) and rows (words) are provided.
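The TextGarden line format above can be illustrated with a small Python sketch. It assumes that tokens are whitespace-separated, that the first token is the document name, and that the subject labels are exactly the tokens starting with "!" that follow it; the function name parse_textgarden_line is hypothetical and not part of TextGarden.

```python
def parse_textgarden_line(line):
    """Split one non-empty line into (name, subjects, text).

    Assumption: the first token is the document name, the following
    tokens that start with '!' are subject labels, and everything
    after them is the document text.
    """
    tokens = line.split()
    name = tokens[0]
    subjects = []
    i = 1
    while i < len(tokens) and tokens[i].startswith("!"):
        subjects.append(tokens[i][1:])  # drop the leading '!'
        i += 1
    text = " ".join(tokens[i:])
    return name, subjects, text


# Shortened version of the example line from the slide.
example = (
    "Support_Vector_Machine_to_synthesise_kernels "
    "!Machine_Vision !Theory_and_Algorithms "
    "Support Vector Machine to synthesise kernels"
)
name, subjects, text = parse_textgarden_line(example)
```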

Graph
Covered scenarios:
- Vertex == Word, Edge == Co-Appearance
- Vertex == Author, Edge == Co-Authors
- Vertex == Institution, Edge == Collaboration
Available formats:
- Matlab: data available in the form of a sparse adjacency matrix
- Pajek: software for network analysis
Matlab:
Sparse matrix saved in a text file; it can be read into Matlab with:
    X = spconvert(load('words.dat'));
Names of vertices (words, authors, institutions) are provided.
Pajek:
Can be downloaded from: vlado.fmf.uni-lj.si/pub/networks/pajek
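For readers without Matlab, the same kind of file can be read with plain Python. The sketch below assumes the file holds one "row column value" triplet per line with 1-based indices, which is the layout spconvert reads; the sample lines are made up for illustration and are not taken from the challenge data, and duplicate entries overwrite each other here.

```python
from collections import defaultdict


def load_triplets(lines):
    """Read 'row col value' triplets into a nested dict {row: {col: value}}.

    Indices are kept 1-based, as in the Matlab file.
    """
    matrix = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        row = int(float(parts[0]))
        col = int(float(parts[1]))
        matrix[row][col] = float(parts[2])
    return matrix


# Hypothetical symmetric adjacency data: vertex 2 is linked to 1 and 3.
sample = ["1 2 3.0", "2 1 3.0", "2 3 1.0", "3 2 1.0"]
adjacency = load_triplets(sample)
neighbours_of_2 = sorted(adjacency[2])
```

For a real file, the lines could be supplied by iterating over an open file handle instead of the sample list.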

Submissions
The results can be:
- images,
- movies,
- Web sites,
- VRML files,
- executables (Windows, Linux),
- etc.
For an interactive tool, also provide a video showing the use of the tool on the Pascal ePrints data.

Evaluation
- Usability of visualization: the goal is to assess the usability of a particular visualization in different practical contexts.
- Innovativeness: the goal is to estimate how innovative the ideas used for the visualization are.
- Aesthetics of the image: here we aim to identify the "nicest" images from the challenge.
- General Pascal researchers' voting over the web about "who likes what".
Since all the criteria are subjective, we will hire experts to judge the quality. Each criterion will generate a separate ranking.

Part II: Examples

Visualization example 1/2: Document Atlas
Bag-of-words approach: Document == Author. An author is described by the sum of all the abstracts from the papers they co-authored. We construct a separate profile for papers from 2004 and for papers from 2005.
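A minimal sketch of building such per-year author profiles is given below. It assumes a simple lowercase whitespace tokenizer and a hypothetical paper record with "authors", "year" and "abstract" fields; the slides do not specify the preprocessing actually used.

```python
from collections import Counter, defaultdict


def author_profiles(papers):
    """Return {(author, year): Counter of words} summed over abstracts."""
    profiles = defaultdict(Counter)
    for paper in papers:
        words = paper["abstract"].lower().split()
        for author in paper["authors"]:
            # Every co-author receives the full abstract of the paper.
            profiles[(author, paper["year"])].update(words)
    return profiles


# Made-up records, only to show the data flow.
papers = [
    {"authors": ["a1", "a2"], "year": 2004, "abstract": "kernel methods for text"},
    {"authors": ["a1"], "year": 2004, "abstract": "kernel learning"},
    {"authors": ["a1"], "year": 2005, "abstract": "text visualization"},
]
profiles = author_profiles(papers)
```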

Dimensionality reduction
Documents are mapped from bag-of-words space to two dimensions in two steps:
- Latent Semantic Indexing: 13,000 dim => 110 dim
- Multidimensional Scaling: 110 dim => 2 dim
The background reflects the density of documents.
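The two-step mapping can be sketched with NumPy as follows. This is an illustration under assumptions: LSI is done with a plain truncated SVD, and the MDS step uses classical (Torgerson) MDS on Euclidean distances, since the slides do not say which MDS variant Document Atlas uses. The toy matrix and the value k=3 are made up.

```python
import numpy as np


def lsi_embed(term_doc, k):
    """Project documents (columns) onto the top-k singular directions."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return (s[:k, None] * vt[:k]).T  # shape: (n_docs, k)


def classical_mds(points, dim=2):
    """Classical MDS computed from pairwise squared Euclidean distances."""
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (points @ points.T)
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:dim]  # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))


# Toy term-document matrix: 5 terms (rows) x 4 documents (columns).
X = np.array(
    [[2, 0, 1, 0],
     [1, 0, 2, 0],
     [0, 3, 0, 1],
     [0, 1, 0, 2],
     [1, 1, 1, 1]],
    dtype=float,
)
coords = classical_mds(lsi_embed(X, k=3), dim=2)  # one 2-D point per document
```

Documents 0 and 2 share most of their terms in the toy matrix, so they should land closer to each other than to documents 1 and 3.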

Background words
Each part of the map is assigned the keyword which is most representative of the documents in that area. We get a "map" of the topics covered within the documents. In the case of the Pascal ePrints data, areas on the map correspond to the areas covered within the Pascal Network.
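One simple way to pick a keyword for a map area is sketched below. It assumes that "most representative" means the word with the largest summed weight over the documents within a fixed radius of the area's centre; this scoring rule, the radius and the sample weights are illustrative assumptions, not the method confirmed by the slides.

```python
import math
from collections import Counter


def area_keyword(center, docs, radius=1.0):
    """Return the word with the largest summed weight among nearby documents.

    docs is a list of ((x, y), {word: weight}) pairs; returns None when
    no document lies within the radius.
    """
    totals = Counter()
    for (x, y), weights in docs:
        if math.hypot(x - center[0], y - center[1]) <= radius:
            totals.update(weights)  # adds the weights word by word
    return totals.most_common(1)[0][0] if totals else None


# Made-up 2-D positions and word weights.
docs = [
    ((0.0, 0.0), {"kernel": 0.9, "svm": 0.4}),
    ((0.2, 0.1), {"kernel": 0.5, "text": 0.3}),
    ((5.0, 5.0), {"vision": 0.8}),
]
keyword_near_origin = area_keyword((0.0, 0.0), docs)
keyword_far_corner = area_keyword((5.0, 5.0), docs)
```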

Time dynamics
For each author we have a profile for 2004 and a profile for 2005. By showing the difference we can see how the authors' research focus developed between 2004 and 2005.
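The difference between two yearly profiles can be sketched as below. The sketch assumes profiles are word-count dictionaries and compares normalised word shares, so that a change in the number of papers does not dominate; the function name focus_shift and the sample counts are hypothetical.

```python
from collections import Counter


def focus_shift(profile_old, profile_new, top=3):
    """Return the words whose normalised weight grew most between profiles."""
    def normalise(profile):
        total = sum(profile.values()) or 1
        return {word: count / total for word, count in profile.items()}

    old, new = normalise(profile_old), normalise(profile_new)
    change = {w: new.get(w, 0.0) - old.get(w, 0.0) for w in set(old) | set(new)}
    # Largest increase first; ties are broken alphabetically.
    return sorted(change, key=lambda w: (-change[w], w))[:top]


# Made-up profiles for one author.
p2004 = Counter({"kernel": 4, "svm": 3, "text": 1})
p2005 = Counter({"kernel": 1, "text": 4, "visualization": 3})
rising = focus_shift(p2004, p2005, top=2)
```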

Co-Authorships

Live Demo

Visualization example 2/2: IST World
Web portal developed within the IST World EU project. Uses search and visualization methods to:
- discover the main research areas and collaborations within the PASCAL organizations,
- produce recommendations on which papers to read (e.g. papers on image recognition, or the kernel trick),
- find the right reviewers for a new paper (e.g. a paper on "brain computer interface") and assess their competence.
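Reviewer finding of this kind can be approximated by comparing a new paper's text with author profiles. The sketch below ranks authors by cosine similarity between bag-of-words vectors; it is a generic baseline under that assumption, not a description of how IST World computes competence, and the author names and profiles are made up.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_reviewers(paper_text, author_profiles):
    """Return author ids sorted from most to least similar to the paper."""
    query = Counter(paper_text.lower().split())
    scores = {a: cosine(query, p) for a, p in author_profiles.items()}
    return sorted(scores, key=lambda a: (-scores[a], a))


# Hypothetical author profiles built from their abstracts.
profiles = {
    "author_a": Counter("brain computer interface signal classification".split()),
    "author_b": Counter("kernel methods text mining".split()),
    "author_c": Counter("brain imaging signal".split()),
}
ranking = rank_reviewers("brain computer interface", profiles)
```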

Research areas
Institutions are placed on the map of research areas from the Pascal Network. The example shows which areas are closely related to JSI.

Collaborations
- Collaboration of institutions
- Collaboration of authors working on "text mining"

Paper Recommendation

Competence Search

Live Demo

Thank you!
