1 Text Mining Services for Trialogical Learning
21. - 23. 2. 2007, VŠB - Technická univerzita Ostrava
Pavel Smrž (1), Ján Paralič (2), Peter Smatana (2), Karol Furdík (2)
1: Brno University of Technology, FIT, Božetěchova 2, 612 66 Brno; University of Economics, Prague, W. Churchill Sq. 4, 130 67 Praha, Czech Republic, smrz@fit.vutbr.cz
2: Technical University of Košice, Centre for Information Technologies, Letná 9, 040 01 Košice, Slovakia, {Jan.Paralic, Peter.Smatana, Karol.Furdik}@tuke.sk

2 Contents
KP-Lab project
Trialogical Learning and Activity Theory
Semantic Web Knowledge Middleware
Text Mining Services
  Pre-processing
  Learning Ontologies
  Classification
Future work

3 KP-Lab Project
Full title: Knowledge Practices Laboratory (www.kp-lab.org)
Integrated EU-funded FP6 IST project No. 27490
Starting date: February 1st, 2006
Duration: 5 years
22 partners from 14 countries
Main goal: creating a learning system aimed at facilitating innovative practices of sharing, creating and working with knowledge in education and workplaces.

4 Trialogical Learning
Challenge: to capture innovative practices of both learning and working with knowledge, so-called knowledge practices.
Trialogical Learning focuses on the social processes by which learners collectively enrich or transform their individual and shared cognition.
Activity theory:
  the object-orientedness of human activity;
  mediation through cultural-historically developed tools of intelligent activity;
  contradictions emerging between the elements of activity systems.

5 Knowledge Artefacts
Knowledge artefacts (KA) are a central notion of Trialogical Learning. They:
  mediate all activities and tasks among learners;
  capture and preserve the shared knowledge within a community.
Forms:
  physical resources / tools (documents, SW code, ...);
  concept maps, taxonomies, ontologies, domain models;
  plans, scientific theories, languages.
Goal of the KP-Lab project: to provide a platform (tools and methodology) for the creation and transformation of KAs in the trialogical manner.

6 Scientific Challenges
1. Facilitating knowledge-creating learning beyond knowledge acquisition and social participation
2. Expanding and elaborating the "trialogical" object of educational activity
3. Eliciting the development of trialogical agencies
4. Facilitating horizontal and vertical boundary crossing
5. Developing tools for deliberate transformation of knowledge practices
6. Specifying design principles of trialogical technologies
7. Developing methods regarding research on longitudinal transformation of knowledge practices
8. Creating an open, developing community of trialogical technologies

7 Semantic Web Knowledge Middleware
SWKM goal: to facilitate knowledge creation processes by supporting advanced interactions of collaborating learners with knowledge artefacts, i.e. discovery, access, evolution, recommendation, and mining.
Generic modules:
  Knowledge Repository - scalable persistent services for large volumes of knowledge artefacts' descriptions and ontologies;
  Knowledge Mediator - services for handling the main registry, discovery, and evolution for KP-Lab knowledge artefacts;
  Knowledge Matchmaker - services supporting interactions of KP-Lab users with knowledge artefacts employing their semantic descriptions.

8 SWKM Architecture
Features:
  adopts SOA principles;
  built upon the RDFSuite OS platform;
  data: RDF, accessed by RQL / RUL.

9 Text Mining in the KP-Lab
Text mining services provide intelligent access to and manipulation of knowledge artefacts; they assist users in creating or updating the semantic descriptions of KP-Lab knowledge artefacts.
TMS fundamental tasks:
  Ontology learning - extraction of conceptual maps (clustering), i.e. automatic extraction of significant terms from the textual descriptions of KAs and their conversion to a structure of concepts and relationships.
  Classification of knowledge artefacts - grouping a given set of artefacts into predefined or ad hoc categories.

10 Schema of Text Mining Services
[Figure: schema of the text mining services]

11 Pre-processing
Pre-processing phase: transforming data into the appropriate form. It consists of several language-dependent NLP steps that provide annotations of the plain-text resources.
Unified modules:
  tokenization,
  stemming (or lemmatization, e.g. in CZ/SK),
  elimination of stop words,
  POS (part-of-speech) tagging.
Individual modules (crucial for some methods of ontology learning):
  chunking,
  WSD (word-sense disambiguation),
  full syntactic analysis.
GATE (http://www.gate.ac.uk/) - a platform for NLP, provides:
  an architecture, or organisational structure, for NLP software;
  a framework, or class library, which implements the architecture;
  a development environment built on top of the framework.
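
The unified modules can be sketched in a few lines of Python. This is only an illustration, not GATE code: the stop-word list and the suffix list are tiny hypothetical stand-ins for real linguistic resources, and the tokenizer handles plain ASCII English only (CZ/SK text would need a proper lemmatizer).

```python
import re

# Tiny illustrative stop-word list (an assumption, not GATE's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for"}

# Naive suffix list standing in for a real stemmer or lemmatizer.
SUFFIXES = ("ing", "ies", "es", "ed", "s")


def tokenize(text):
    """Lower-case the text and split it into ASCII alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("Learners are creating and sharing knowledge artefacts"))
# ['learner', 'creat', 'shar', 'knowledge', 'artefact']
```

POS tagging, chunking and WSD are left out because they need trained models.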

12 Ontology Learning (1)
1. Conversion to a plain text format
   Structural information in the source file is used as meta-information in the next steps.
2. Processing by GATE
   Tokenization, sentence boundaries, POS tagging (Brill's tagger), named entity recognition, Charniak's syntactic analyser.
3. Significant terms (concepts) identification
   A background domain model, created from additional textual resources, is used.
4. Semantic relations identification
   A set of pre-defined (or automatically identified) patterns and co-occurrence statistics are used.
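
Step 4 can be illustrated with a minimal Python sketch. The single "such as" pattern and the sentence-level co-occurrence count are assumptions chosen for brevity; the slides do not say which patterns or statistics the KP-Lab services actually use, and the function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# One hand-written lexical pattern: "<hypernym> such as <hyponym>".
SUCH_AS = re.compile(r"(\w+) such as (\w+)")


def extract_is_a(sentences):
    """Return (hyponym, hypernym) pairs matched by the pattern."""
    pairs = []
    for sentence in sentences:
        for hypernym, hyponym in SUCH_AS.findall(sentence.lower()):
            pairs.append((hyponym, hypernym))
    return pairs


def cooccurrence(sentences, terms):
    """Count how often two known terms appear in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        present = sorted(t for t in terms if t in words)
        counts.update(combinations(present, 2))
    return counts


sentences = [
    "Formats such as OWL are exported.",
    "The ontology is exported to OWL.",
    "An ontology describes concepts.",
]
print(extract_is_a(sentences))
print(cooccurrence(sentences, {"ontology", "owl", "concepts"}))
```

Pattern matches give typed relations (here is-a), while co-occurrence counts only suggest that two concepts are related in some way.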

13 Ontology Learning (2)
5. Ontology merging
   The extracted structure is combined with the global domain ontology (stored in the KP-Lab knowledge repository). An explicit representation of uncertain knowledge is used in this step.
6. Visualisation
   Combination of the gained qualitative data and the relevance weights. The selection of the most suitable visualisation form depends on the needs of KP-Lab users; a simple graphical view is proposed.
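
One possible reading of step 5, sketched in Python. The slides do not specify the uncertainty mechanism, so everything here is an assumption: relations are stored as (subject, relation, object) triples with a confidence in [0, 1], and confidences for a triple found in both sources are combined with a noisy-OR rule.

```python
def merge_relations(domain, extracted):
    """Merge two {(subject, relation, object): confidence} maps.

    A triple present in both maps gets the noisy-OR of its confidences:
    1 - (1 - a) * (1 - b). This rule is an illustrative assumption.
    """
    merged = dict(domain)
    for triple, confidence in extracted.items():
        prior = merged.get(triple, 0.0)
        merged[triple] = 1.0 - (1.0 - prior) * (1.0 - confidence)
    return merged


domain = {("owl", "is_a", "format"): 0.5}
extracted = {
    ("owl", "is_a", "format"): 0.5,
    ("rdf", "is_a", "format"): 0.4,
}
print(merge_relations(domain, extracted))
```

With noisy-OR, independent evidence for the same relation raises its weight, and relations seen only in the new text keep their extracted confidence. Those weights could then serve as the relevance weights used in the visualisation step.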

14 Ontology Learning (3)
7. Export to other formats
   Standard OWL export routines are currently supported. Support for the emerging BayesOWL and FuzzyOWL formats is under development.
Creation of the training set - background model:
  2-billion-word GigaCorpus for English;
  600-million-word corpus for Czech;
  additional relevant documents provided by users.
Data simulation: using Wikiversity and Wikipedia texts.
Scenarios:
1. Collaborative acquiring of knowledge in a company
2. Description of a field of interest. Creation of an essay for a given topic(s) in an academic environment.

15 Classification
The task is to automatically organize a set of knowledge artefacts into predefined or ad hoc categories - existing or new concepts of an ontology.
Classification is supervised by a model, created from a training set of semantically annotated artefacts. The model contains a set of parameters (weights, rules, etc.) created in the process of training and used in the classification of unknown examples.
Algorithms to be used: simple term matching, kNN, SVM, Winnow, Perceptron, Naive Bayes (multinomial and binomial), boosting, decision rules, and decision trees (various combinations of growing and pruning methods).
Implementation platform: JBowl library
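
As an illustration of one of the listed algorithms, here is a minimal kNN text classifier in Python. It is a sketch, not the JBowl implementation (JBowl is a Java library): the bag-of-words representation, cosine similarity, and the toy training set are assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, k=3):
    """Return the majority category among the k most similar examples.

    training is a list of (text, category) pairs.
    """
    query = bag_of_words(text)
    ranked = sorted(
        training,
        key=lambda example: cosine(query, bag_of_words(example[0])),
        reverse=True,
    )
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]


training = [
    ("ontology concept relation", "semantics"),
    ("ontology owl rdf", "semantics"),
    ("lecture student essay", "education"),
    ("student course essay", "education"),
]
print(knn_classify("owl ontology export", training))  # semantics
```

Here the "model" is simply the stored training set; the other listed algorithms would instead learn weights or rules during training.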

16 JBowl Library
JBowl - an open-source library in Java, provides support for:
  intelligent information retrieval, summarization, and information extraction from textual documents;
  text mining, clustering, categorization, and classification tasks.
Main characteristics:
  extendable modular architecture;
  platform for pre-processing (incl. NLP methods) and indexing of large textual collections;
  functions for creation and evaluation of text mining models (for both supervised and unsupervised algorithms).
Web: http://sourceforge.net/projects/jbowl/

17 JBowl Library - Architecture
[Figure: architecture diagram. Labels, in the order they appear: models, data, analysis; Tokenization, Sentence chunking, NP chunking, POS tagging; Statistics, TF-IDF, Term selection; categorization, clustering, keyword extraction / summarization, information extraction; utils: BLAS, Matrixes, Collections; documents: Lucene index, Thesaurus, XML]
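
The TF-IDF weighting named in the diagram can be sketched as follows. This is plain Python for illustration, not the JBowl API; it assumes raw term frequency and idf = log(N / df), which is only one of several common variants.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    Returns one {term: weight} dict per document, using raw term
    frequency and idf = log(N / df).
    """
    n = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


docs = [
    ["ontology", "learning", "ontology"],
    ["text", "mining", "learning"],
]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A term that occurs in every document ("learning" here) gets weight zero, which is what makes the weighting usable for term selection.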

18 JBowl Library - Usage
JBowl provides:
  A text categorization method for active learning, which reduces the number of training examples needed. A heuristic selects examples according to the confidence of the classifier's prediction for each example. This heuristic does not require a validation set and can be used effectively to select a small set of labeled examples.
  Integration of several classification methods, evaluation.
  Tools for NLP (incl. Slovak linguistic resources and tools).
Scenario for use of the classification service:
  Annotation of new or updated artefacts - the system can suggest suitable concepts from one or more ontologies to be assigned as metadata or a conceptual description of the artefact.
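
A confidence-based selection heuristic of this kind can be sketched in Python. The slide does not state the exact rule, so this assumes the common least-confident variant: ask for labels on the examples whose top predicted probability is lowest. The function name and the toy probabilities are hypothetical and do not come from JBowl.

```python
def select_for_labeling(unlabeled, predict_proba, batch_size=2):
    """Pick the examples the classifier is least confident about.

    predict_proba(example) must return a {category: probability} dict.
    """
    def confidence(example):
        return max(predict_proba(example).values())

    return sorted(unlabeled, key=confidence)[:batch_size]


# Toy stand-in for a trained classifier's predictions.
predictions = {
    "artefact-a": {"semantics": 0.90, "education": 0.10},
    "artefact-b": {"semantics": 0.55, "education": 0.45},
    "artefact-c": {"semantics": 0.60, "education": 0.40},
}
print(select_for_labeling(list(predictions), predictions.get))
# ['artefact-b', 'artefact-c']
```

Only the current classifier's predictions are needed, which is consistent with the heuristic requiring no validation set.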

19 Future Work
Solving multilinguality: find a minimal set of NLP resources that are satisfactory for the (basic) functionality of the text-mining services.
Increasing efficiency: requirement of a synchronous SOA system, e.g. by the use of the Extensible Messaging and Presence Protocol (XMPP).
Classification: selection of the most appropriate algorithms for automatic annotation of artefacts according to the semantics codified in several ontologies (with limited availability of training data).
Ontology learning: concentrate on better ways of ontology merging (incl. the need to combine extracted relations with those from existing domain ontologies).
Implementation of the first prototype of the SWKM (M24), testing and evaluation.

20 Thank you! Questions?
Further information: http://www.kp-lab.org

