Applying Text Classification in Conference Management: Some Lessons Learned
Andreas Pesenhofer, Helmut Berger, Michael Dittenbach, Andreas Rauber
Overview
- Conference Management Systems
- Classification & Clustering
- Case Studies
  - ECDL 2005
  - ECR
- Conclusions
Conference Management Systems
- Set of tools to support the conference workflow
- Basic support for paper submission & review collection
- Many tasks for further automation
  - Selection of the program committee
  - Topic assignment of submissions
  - Paper-to-reviewer assignment
  - Support in review generation
  - Poster arrangement
  - Post-conference access to papers
Classification & Clustering
- Topic assignment of submissions
  - Problem: authors are uncertain about the precise topic assignment (conference terminology)
  - Solution: support by automatic assignment
  - Method: ATC based on abstracts
- Poster arrangement & post-conference access to papers
  - Problem: topic-based arrangement
  - Solution: clustering
  - Method: self-organizing map (SOM) & Mnemonic SOM
ATC for topic assignment
1. Train model based on previous conferences
2. Abstract submission
3. Automatic assignment
4. Confirmation
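The workflow above (train on earlier conferences, then suggest a topic for a newly submitted abstract that the author confirms) can be sketched as follows. The slides do not name the classifier, so the nearest-centroid rule with cosine similarity used here is an assumption chosen for brevity, and the function names and example abstracts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train_centroids(labelled_abstracts):
    """Sum the term counts of all training abstracts per topic."""
    centroids = defaultdict(Counter)
    for topic, abstract in labelled_abstracts:
        centroids[topic].update(tokenize(abstract))
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_topic(centroids, abstract):
    """Return the topic whose centroid is most similar to the abstract."""
    vec = Counter(tokenize(abstract))
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(centroids), key=lambda topic: cosine(vec, centroids[topic]))


# Hypothetical abstracts standing in for previous conference submissions.
previous = [
    ("Digital Preservation", "long term preservation of web archives"),
    ("Information Retrieval", "ranking and search in document retrieval"),
]
model = train_centroids(previous)
suggestion = suggest_topic(model, "a search engine for ranking retrieval results")
```

The suggestion is only a proposal; in the workflow on the slide the author still confirms or changes it.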
Clustering for organization
- Arrange posters thematically
- Non-rectangular SOMs reflecting the conference site
- Mnemonic SOMs simplify post-conference paper access
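To illustrate the idea of a non-rectangular map, here is a minimal sketch of a SOM whose units are an arbitrary set of grid positions, for example the outline of a poster hall. It is not the implementation used by the authors: the learning-rate and radius schedules, the Gaussian neighbourhood, the plain Euclidean grid distance (which ignores the shape's boundary), and the L-shaped layout are all assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(
        sorted(weights),
        key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)),
    )


def train_som(data, units, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a SOM on the given (row, col) units; returns unit -> weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {u: [rng.random() for _ in range(dim)] for u in units}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):      # visit inputs in random order
            bmu = best_matching_unit(weights, x)
            for u, w in weights.items():
                grid_dist2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2.0 * radius * radius))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical L-shaped floor plan: only these grid cells hold posters.
units = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
# Toy 2-dimensional document vectors (real input would be term vectors).
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data, units)
placement = [best_matching_unit(som, x) for x in data]
```

Each document is placed on the unit returned by `best_matching_unit`, so every placement falls inside the chosen shape by construction.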
ECDL 2005 – ATC data
- English abstracts of previous ECDL conferences
- Topics of the conference call -> seven defined categories
- Pre-processing: removal of all numbers, punctuation marks and special characters; transformation to lower case
- tf-idf weighting
- 4,141 unique terms
- Information gain (IG): 3,460 top-ranked terms
- Average accuracy over all categories: 58.60%
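The pre-processing and weighting steps listed above can be sketched in a few lines. The slides do not state which tf-idf variant was used, so the raw term frequency times log(N/df) formula below is an assumption, as are the function names and the two example sentences.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lower-case, then replace numbers, punctuation and special characters by spaces."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return cleaned.split()


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical mini-corpus.
docs = ["Digital Libraries 2005: metadata!", "Metadata for web archiving."]
tokens = preprocess(docs[0])
weights = tfidf(docs)
```

With this variant a term that occurs in every document, such as "metadata" in the toy corpus, receives weight zero, which is the intended effect of the idf factor.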
ECDL – training data
class-ID | class description | sum
1 | Concepts of Digital Libraries, Concepts of Documents and Metadata | 34
2 | System Architectures, Open Archives, Collection Building, Integration and Interoperability | 40
3 | Information Retrieval, Information Organization, Search and Usage | 67
4 | User Studies, System Evaluation, Personalization, User Interfaces and User Centered Design | 50
5 | Digital Preservation, Web Archiving and Long Term Access | 12
6 | Digital Library Applications and Case Studies | 65
7 | Multimedia, Mixed Media, Audio, Video, 3D and non-traditional Objects | 43
sum over the selected abstracts | 311
ECDL 2005 – classification results
(table: class-ID, total, recall, precision, F1)
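For reference, the per-class measures named in the table header are the standard ones and can be computed as in the sketch below. The label lists are made-up illustration data, not the ECDL results, and "total" is taken here to mean the number of test documents that truly belong to the class, which is an assumption about the original table.

```python
def per_class_scores(true_labels, predicted_labels):
    """Compute total, precision, recall and F1 for every class."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for cls in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        scores[cls] = {
            "total": tp + fn,  # documents that truly belong to the class
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return scores


# Made-up class IDs for five test documents.
true_labels = [1, 1, 2, 2, 3]
predicted_labels = [1, 2, 2, 2, 3]
scores = per_class_scores(true_labels, predicted_labels)
```

In this toy example class 1 has perfect precision but recall 0.5, because one of its two documents was assigned to class 2.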
ECDL 2005 – SOM data
- Poster and paper organization
  - Full text of accepted posters of ECDL 2005
  - Term selection based on minimal word length and document frequencies
  - 30 posters terms
- Post-conference access
  - 71 papers and posters – 5,654 terms
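Term selection by minimal word length and document frequency, as mentioned above, might look like the following sketch. The slides give no thresholds, so `min_length`, `min_df` and `max_df_ratio` are hypothetical parameters, and the four short texts stand in for full poster texts.

```python
import re
from collections import Counter


def select_terms(documents, min_length=3, min_df=2, max_df_ratio=0.9):
    """Keep terms that are long enough and neither too rare nor too common."""
    tokenized = [set(re.findall(r"[a-z]+", d.lower())) for d in documents]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in tokens)
    max_df = max_df_ratio * len(documents)
    return sorted(
        term
        for term, count in df.items()
        if len(term) >= min_length and min_df <= count <= max_df
    )


# Hypothetical poster texts.
posters = [
    "digital library of audio",
    "audio library on the web",
    "video in the digital library",
    "preservation of web archives",
]
selected = select_terms(posters, min_length=4, min_df=2, max_df_ratio=0.7)
```

Here "library" is dropped for occurring in too many posters, "web" and "the" for being too short, and terms such as "video" for occurring in only one poster.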
ECDL 2005 – SOM
ECDL 2005 – SOM (2)
ECR – Data
- Abstracts of the ECR: European Congress of Radiology
- Training set: ECR 2003 & ,952 documents
- Test set: ECR documents
- Same steps as for the ECDL data
- Resulting in 14,887 unique terms
- IG: 5,720 top-ranked terms, average accuracy over all categories of 73.57%
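Ranking terms by information gain (IG) and keeping the top-ranked ones can be sketched as below. This uses the usual definition for a binary term-presence feature, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]; whether the authors used exactly this formulation is an assumption, and the four tiny documents are invented.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    counts = [c for c in counts if c]
    total = sum(counts)
    if not total:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_with = sum(with_term.values())
    conditional = (n_with / n) * entropy(with_term.values()) + (
        (n - n_with) / n
    ) * entropy(without_term.values())
    return entropy(Counter(labels).values()) - conditional


def top_terms(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(
        vocabulary, key=lambda t: (-information_gain(docs, labels, t), t)
    )
    return ranked[:k]


# Invented documents (as term sets) with class names taken from the ECR slide.
docs = [{"mri", "brain"}, {"mri", "knee"}, {"ct", "chest"}, {"ct", "lung"}]
labels = ["Neuro", "Musculoskeletal", "Chest", "Chest"]
best = top_terms(docs, labels, 2)
```

Terms that split the documents cleanly along class boundaries ("ct", "mri") rank above terms that single out one document only.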
ECR – training data
(table: class-ID, class description, sum; class descriptions listed in order, starting with class-ID 1)
- Abdominal and Gastrointestinal
- Breast
- Cardiac
- Chest
- Computer Applications
- Contrast Media
- Genitourinary
- Head and Neck
- Interventional Radiology
- Musculoskeletal
- Neuro
- Pediatric
- Physics in Radiology
- Radiographers
- Vascular
sum over the selected abstracts
ECR 2005 – classification results
(table: class-ID, total, recall, precision, F1)
Conclusions
- Quality is proportional to the amount of training documents
- Structure of the classes (overlapping?)
- The bulk of submissions can be dealt with automatically
- May be used for session assignment
- Arrange posters & papers thematically
  - Easy to memorize & find
Questions? E-Commerce Competence Center Donau-City-Strasse Vienna Austria Phone:+43/1/ Fax: +43/1/ Internet:
ECDL 2005