ESPON 2013 DATABASE
Malmö Seminar, 2-3 December 2009
Thematic structuring of the ESPON 2013 DB
Geoffrey Caruso and Nuno Madeira
Outline
- Towards an ESPON thesaurus?
- Text mining methods for organising knowledge
- Techniques to increase visual perception: first results
- Short-term solution
Towards an ESPON thesaurus?
- The draft technical report describes some of the main features of thesaurus construction
- It presents examples developed by international organisations (ILO, UNESCO, FAO, EUROSTAT, …)
- It stresses the importance of harmonising vocabulary
- It explores how text mining methods could further support the thematic structuring
Text mining methods for organising knowledge
- Textual data is usually considered a collection of unstructured information that must be prepared in a very specific way before any method can be applied
- Text mining methods transform text into standard numerical forms
- For this purpose, we collected approximately 200 reports, studies and policy notes addressing ESPON evidence and results
- The dependencies and ambiguity inherent in textual data meant that data preparation became our primary focus
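A minimal sketch of one common "standard numerical form", a document-term matrix, is given below. It assumes plain-text documents, a small hypothetical stop-word list and ASCII-only tokenisation; it is an illustration of the general technique, not the preparation pipeline actually applied to the ESPON corpus. The names `tokenize`, `document_term_matrix` and `STOP_WORDS` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real corpus would need a fuller, language-aware list.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "for", "on", "is"}


def tokenize(text):
    """Lower-case the text, split on non-letters (ASCII only) and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # Sorted union of all terms seen in any document gives a stable column order.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


if __name__ == "__main__":
    docs = ["Accessibility by road and rail", "Road accessibility in the regions"]
    vocab, m = document_term_matrix(docs)
    print(vocab)  # ['accessibility', 'by', 'rail', 'regions', 'road']
    print(m)      # [[1, 1, 1, 0, 1], [1, 0, 0, 1, 1]]
```

Each row is a document and each column a term, so standard numerical methods (distances, clustering, factor analysis) can then be applied to the rows or columns.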
Techniques to increase visual perception
- Explore visualisation tools, through maps of keywords based on co-occurrence data, to communicate outputs more effectively
- First results reveal highly complex structures, although some interpretation is possible
- However, these results raise questions about the completeness of our corpus for analysis, especially with regard to cluster stability
- For instance, how many reports and studies are sufficient to guarantee consistent results?
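A keyword map of this kind is typically drawn from a table of pairwise co-occurrence counts, used as edge weights between keywords. The sketch below, assuming single-word keywords and document-level co-occurrence (two keywords co-occur if both appear in the same document), shows how such counts could be obtained; the function name and sample data are hypothetical, and the layout and clustering steps that turn counts into a map are not shown.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, keywords):
    """Count, for each unordered keyword pair, how many documents contain both.

    Keywords are matched against whole lower-case tokens, so "rail" does not
    match inside "trail". Pairs are returned in alphabetical order (a, b).
    """
    keywords = sorted({k.lower() for k in keywords})
    pairs = Counter()
    for doc in documents:
        tokens = set(re.findall(r"[a-z]+", doc.lower()))
        present = [k for k in keywords if k in tokens]
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    docs = [
        "Rail and road accessibility",
        "Road transport and landscape fragmentation",
        "Landscape fragmentation and road density",
    ]
    counts = cooccurrence_counts(docs, ["road", "rail", "landscape", "fragmentation"])
    for pair, n in sorted(counts.items()):
        print(pair, n)
```

Because every count depends on which documents happen to be in the corpus, adding or removing a few reports can change the weights and hence the clusters, which is one way to make the stability question above concrete.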
Short-term solution
- A first hierarchical structure, derived not from text mining methods but by adapting the previous ESPON DB, based on the indicators delivered so far
- Investigate the degree of resemblance between some important database classifications (EUROSTAT, OECD, EEA, UNEP, WPI) and the ESPON 2006 DB
- Identify patterns that could contribute to the harmonisation of categories or themes
- Employ matrix visualisation techniques for cluster analysis
- Knowledge acquired from text mining methods will form the basis for improving both hierarchical and associative relationships

ESPON 2013 Database
  Population
    Natural population change
    Life expectancy at birth
  Transport
    Potential accessibility by air
    Potential accessibility by road
  Environment
    Landscape fragmentation
    Environmental quality
  Agriculture
    ?
    ?
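One simple way to quantify the "degree of resemblance" between classifications is a pairwise similarity matrix over their theme labels, which can then be fed into matrix visualisation or cluster analysis. The sketch below uses the Jaccard index on exact (case-insensitive) label matches; this is an assumed measure, not necessarily the one used in the project. The classification contents are hypothetical placeholders, not the actual EUROSTAT, OECD, EEA, UNEP or WPI categories, and exact matching ignores synonyms, which is precisely what a thesaurus would have to resolve.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets of theme labels."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def resemblance_matrix(classifications):
    """Return (names, matrix) of pairwise Jaccard indices between classifications.

    `classifications` maps a classification name to a list of theme labels.
    Labels are compared after stripping whitespace and lower-casing.
    """
    names = sorted(classifications)
    sets = {n: {t.strip().lower() for t in classifications[n]} for n in names}
    matrix = [[jaccard(sets[r], sets[c]) for c in names] for r in names]
    return names, matrix


if __name__ == "__main__":
    example = {
        "espon_draft": ["Population", "Transport", "Environment", "Agriculture"],
        "classification_a": ["Population", "Transport", "Energy"],     # hypothetical
        "classification_b": ["Environment", "Agriculture", "Energy"],  # hypothetical
    }
    names, m = resemblance_matrix(example)
    for name, row in zip(names, m):
        print(name, [round(x, 2) for x in row])
```

A symmetric matrix like this can be reordered (for example by hierarchical clustering of its rows) so that blocks of similar classifications become visible, which is the usual aim of matrix visualisation.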