Paper - PhD Workshop
  Dimensionality Reduction of Semantically Enriched Clickstreams
  Higher acceptance rate than in previous years (about 46%)
  Excellent feedback: 5 extensive reviews
  Very broad range of topics
  Postproceedings: an extended version of the conference paper was published in the ACM DL
Selected workshop papers
  Improving the Accuracy of Entity Identification through Refinement: the goal of entity identification is to correctly identify all instances of the same entity, so as to eliminate inconsistency between data sources during data integration.
  Full-text Indexing and Information Retrieval in P2P Systems: distributed IR.
  Reasoning about Taxonomies and Articulations: this work formalizes taxonomies and the relationships between them as formulas in logic. The formalization makes concrete notions such as consistency and inconsistency of taxonomies and articulations (inter-taxonomic relations), enables the derivation of new articulations from a given set of taxonomies and articulations, and provides a framework for testing assumptions about under-specified taxonomies.
Other Czech participants
  A Cost-based Join Selection for XML Twig Content-based Queries (Radim Baca, Michal Kratky)
Conference focus
  P2P
  XML
  Streaming
  Caching
  Query Processing
  Data Fusion
Industrial section
  Data Challenges at Yahoo! (Ricardo Baeza-Yates and Raghu Ramakrishnan)
  Automatic Content Targeting on Mobile Phones
KDD 2008 14th ACM SIGKDD International Conference, Las Vegas
Paper - MDM/KDD Workshop
  Combining Image Captions and Visual Analysis for Image Concept Classification (Kliegr, Svátek, Nemrava, Chandramouli, Izquierdo)
  As a point of interest, Pavel Praks published at the same workshop in the past: Multimedia Data Mining Workshop (Pavel Praks, '05): Iris Recognition Using the SVD-Free Latent Semantic Indexing
Interesting papers from the workshop
  Annotating images and image objects using a hierarchical Dirichlet process model: the model is applied to predict labels of objects in images containing multiple objects. During training, the model has access to an unsegmented image and its caption, but not to the labels of the individual objects in the image. The trained model is then used to predict the label for each region of interest in a segmented image.
  Mining the Web for Visual Concepts: relevance feedback on image and text data retrieved from the web.
Interesting papers from the conference
  Building Semantic Kernels for Text Classification using Wikipedia: the paper overcomes the shortcomings of the bag-of-words (BOW) approach by embedding background knowledge derived from Wikipedia into a semantic kernel, which is then used to enrich the representation of documents.
Entity Categorization Over Large Document Collections
  The paper significantly improves the accuracy of entity categorization by (i) considering an entity's context across multiple documents containing it, and (ii) exploiting existing large lists of related entities (e.g., lists of actors, directors, books).
ArnetMiner: Extraction and Mining of Academic Social Networks
  1) Extracting researcher profiles automatically from the Web; 2) integrating the publication data into the network from existing digital libraries; 3) modeling the entire academic network; and 4) providing search services for the academic network.
  So far, 448,470 researcher profiles have been extracted using a unified tagging approach. Publications are integrated from online Web databases, and a probabilistic framework is proposed to deal with the name ambiguity problem. A unified modeling approach simultaneously models topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search are provided on top of the modeling results.
Heterogeneous Data Fusion for Alzheimer's Disease Study
  The paper proposes to integrate heterogeneous data for AD prediction using a kernel method, and extends the kernel framework to select features (biomarkers) from heterogeneous data sources. Experimental results show that integrating multiple data sources leads to a considerable improvement in prediction accuracy.
Febrl: An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface
  Febrl (Freely Extensible Biomedical Record Linkage) contains many recently developed techniques for data cleaning, deduplication and record linkage, and encapsulates them in a graphical user interface (GUI).
  https://sourceforge.net/projects/febrl/
Using tagFlake for Condensing Navigable Tag Hierarchies from Tag Clouds
  Luigi Di Caro (University of Torino), K. Selçuk Candan (Arizona State University), Maria Luisa Sapino (University of Torino)
Pictor: An Interactive System for Importing Data from a Website
  A demonstration of an interactive wrapper induction system, called Pictor, which minimizes labeling cost yet extracts data from a website with high accuracy. The demonstration introduces two proposed technologies: record-level wrappers and a wrapper-assisted labeling strategy. These allow Pictor to exploit previously generated wrappers in order to predict similar labels in a partially labeled webpage or in a completely new webpage.
Trends
  Text mining
  Advertising on the web: The 2nd International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD 2008)
  Medical data mining: Workshop on Mining Medical Data and KDD Cup 2008
Further highlights
  The 2nd SNA-KDD Workshop on Social Network Mining and Analysis (SNA-KDD 2008)
  Workshop on Mining Medical Data and KDD Cup 2008
  The 2nd International Workshop on Mining Multiple Information Sources
ECML / PKDD 2008 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Antwerp
Paper at the WBBT Workshop
  Wikis, Blogs and Bookmarking Tools workshop, chair: Bettina Berendt
  Wikipedia As the Premiere Source for Targeted Hypernym Discovery (Tomas Kliegr, Vojtech Svatek, Krishna Chandramouli, Jan Nemrava and Ebroul Izquierdo)
  http://www.kde.cs.uni-kassel.de/ws/wbbtmine2008/pdf/all_wbbtmine2008.pdf
Selected invited talks
  The Role of Hierarchies in Exploratory Data Mining: in a broad range of data mining tasks, the fundamental challenge is to efficiently explore a very large space of alternatives. The difficulty is two-fold: first, the size of the space raises computational challenges, and second, it can introduce data sparsity issues even in the presence of very large datasets. The talk considers how the use of hierarchies (e.g., taxonomies, or the OLAP multidimensional model) can help mitigate the problem.
Learning Language from Its Perceptual Context (Raymond J. Mooney)
  The training data consists of textual human commentaries on Robocup simulation games. A set of possible alternative meanings for each comment is automatically constructed from game event traces. The previously developed systems for learning to parse and generate natural language (KRISP and WASP) were augmented to learn from this data and then commentate novel games. The system is evaluated on its ability to parse sentences into correct meanings and to generate accurate descriptions of game events.
Watch, Listen & Learn: Co-training on Captioned Images and Videos
  The approach leverages the text that often accompanies visual data to learn robust models of scenes and actions from partially labeled collections. It uses co-training.
Co-training
  A semi-supervised learning algorithm that requires two distinct views of the training data.
  It first learns a separate classifier for each view using any labeled examples.
  The most confident predictions of each classifier on the unlabeled data are then used to iteratively construct additional labeled training data.
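The co-training loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation from the talk: the nearest-centroid base learner, the margin-based confidence score, the names `CentroidClassifier` and `co_train`, the parameters `rounds` and `per_round`, and the toy data are all assumptions made for the example.

```python
import math


class CentroidClassifier:
    """Tiny nearest-centroid learner; a stand-in for any base classifier."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_with_confidence(self, x):
        # Confidence = gap between the distances to the two nearest centroids.
        dists = sorted((math.dist(x, c), label) for label, c in self.centroids.items())
        best_dist, best_label = dists[0]
        margin = dists[1][0] - best_dist if len(dists) > 1 else 0.0
        return best_label, margin


def co_train(view_a, view_b, labels, rounds=5, per_round=2):
    """Return a copy of `labels` with None entries filled in by co-training.

    view_a / view_b hold two feature vectors for the same examples;
    labels[i] is a class label, or None if example i is unlabeled.
    """
    labels = list(labels)
    for _ in range(rounds):
        labeled = [i for i, lab in enumerate(labels) if lab is not None]
        unlabeled = [i for i, lab in enumerate(labels) if lab is None]
        if not unlabeled:
            break
        new_labels = {}
        for view in (view_a, view_b):
            # One classifier per view, trained on the currently labeled examples.
            clf = CentroidClassifier().fit(
                [view[i] for i in labeled], [labels[i] for i in labeled]
            )
            scored = [(clf.predict_with_confidence(view[i]), i) for i in unlabeled]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            # Keep only this view's most confident predictions.
            for (label, _margin), i in scored[:per_round]:
                new_labels.setdefault(i, label)  # on conflict, view A's label wins
        for i, label in new_labels.items():
            labels[i] = label
    return labels


# Toy data: 8 examples, two views each, only one labeled example per class.
view_a = [[0.0], [0.2], [0.1], [0.3], [5.0], [5.2], [4.9], [5.1]]
view_b = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.0],
          [9.0, 9.0], [9.1, 8.8], [8.9, 9.2], [9.0, 9.1]]
seed_labels = ["neg", None, None, None, "pos", None, None, None]
print(co_train(view_a, view_b, seed_labels))
```

In the captioned-image setting, one view would be the visual features and the other the accompanying text; here both views are just small numeric vectors so that the example stays self-contained.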
SSMS 2008 3rd Summer School on Multimedia Semantics, Chania, Crete