Ontology-based information retrieval of scientific information
Natalia V. Loukachevitch
Laboratory of Information Resources Analysis
Research Computing Center of Moscow State University (MGU NIVC)
Thematic Search of Scientific Information
- Knowledge-based (ontology-based) search
- Use of synonyms
- Automatic query expansion
- Automatic analysis of query results
- Help in interactive search
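Synonym-based query expansion, as listed on this slide, can be illustrated with a minimal sketch. The synonym table and the function name `expand_query` are hypothetical and chosen only for illustration; the slides do not describe the actual expansion procedure or give real thesaurus entries.

```python
from typing import Dict, List, Set

# Hypothetical toy synonym table (an assumption, not taken from the thesaurus).
SYNONYMS: Dict[str, Set[str]] = {
    "tax": {"taxation", "levy"},
    "law": {"legislation", "statute"},
}


def expand_query(query: str, synonyms: Dict[str, Set[str]] = SYNONYMS) -> List[str]:
    """Return each query term followed by its synonyms, without duplicates."""
    expanded: List[str] = []
    seen: Set[str] = set()
    for term in query.lower().split():
        # Keep the original term first, then its synonyms in alphabetical order.
        for candidate in [term, *sorted(synonyms.get(term, ()))]:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    print(expand_query("tax law"))
```

Terms without an entry in the table pass through unchanged, so the expansion never drops anything the user typed.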
Bilingual Sociopolitical Thesaurus
The thesaurus development is based on three methodologies:
- methods of constructing information-retrieval thesauri (information-retrieval context, analysis of terminology, terminology-based concepts, a small set of relation types)
- development of wordnets for various languages (word-based concepts, detailed sets of synonyms, description of ambiguous text expressions)
- ontology and formal ontology research (strict description of relations, necessity of many-step inference)
Size: 33,000 concepts, 80,000 Russian terms, 85,000 English terms
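A concept that carries terms in two languages and supports many-step inference over its relations could be modeled roughly as follows. This is a sketch under stated assumptions: the `Concept` class, the single `broader` relation, and the three sample entries are hypothetical and do not reproduce the thesaurus's real schema or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Concept:
    """One language-independent concept with Russian and English terms."""
    name: str
    terms_ru: List[str] = field(default_factory=list)
    terms_en: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)  # names of broader concepts


# Hypothetical sample entries (assumptions for illustration only).
THESAURUS: Dict[str, Concept] = {
    "LEGISLATION": Concept("LEGISLATION", ["законодательство"], ["legislation"]),
    "TAX LEGISLATION": Concept(
        "TAX LEGISLATION", ["налоговое законодательство"],
        ["tax legislation"], ["LEGISLATION"]),
    "VAT LEGISLATION": Concept(
        "VAT LEGISLATION", ["законодательство об НДС"],
        ["VAT legislation"], ["TAX LEGISLATION"]),
}


def find_concept(term: str, thesaurus: Dict[str, Concept]) -> Optional[str]:
    """Map a Russian or English term to its concept name (case-insensitive)."""
    wanted = term.lower()
    for concept in thesaurus.values():
        if wanted in (t.lower() for t in concept.terms_ru + concept.terms_en):
            return concept.name
    return None


def ancestors(name: str, thesaurus: Dict[str, Concept]) -> Set[str]:
    """Collect all broader concepts reachable in one or more steps."""
    found: Set[str] = set()
    stack = list(thesaurus[name].broader)
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        if current in thesaurus:
            stack.extend(thesaurus[current].broader)
    return found
```

Because Russian and English terms resolve to the same concept name, a query in either language reaches the same node, and `ancestors` shows why strictly described relations matter: each extra inference step is only as reliable as the links it follows.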
Socio-Political Domain vs. General Lexicon and Specific Lexicons
[Diagram labels: General Lexicon; Specific Lexicon; Intermediate Zone]
- Information Security
- Aviation Ontology
- Cultural Heritage
- Ontology on Natural Sciences and Technology: 30,000 concepts; 70,000 terms
Thematic Structure
- tax; taxation system; taxpayer; finances; economy; tax legislation; VAT legislation; law; draft law; Taxation Code; deputy minister; Ministry of Finance; finances; reform; tax reform
- population
- budget, estimate; finances; economy; document
- government; state power; Minister of Finance
- State Duma; state power; state
Thematic representation of a text
[Diagram: Thematic Node i, Thematic Node j, thematic node in the text]
University Information System RUSSIA
- Database of Fulltext Documents (1.5 million): Legal Acts, Newspaper Articles, Scientific Reports
- Database “Statistics of Russian Federation” (Socio-economic Statistics, Demographic Statistics, Agrarian Statistics, Urban Statistics)
- Database “Budget system of Russian Federation”
Visualisation of Data in Dynamic Tables and Maps
Unified Technology Platform (Constructor)
- Converters
- Processing
- Interfaces
- Services
Cross-Language Information Retrieval
Applications of the technology
- Concept-based information retrieval (monolingual, bilingual)
- Information-retrieval systems combining word-based and concept-based search
- Concept-based automatic text categorization
- Automatic question answering
- Automatic text summarization
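One common way to combine word-based and concept-based search is to interpolate the two relevance scores. The sketch below assumes a simple linear combination with a hypothetical `weight` parameter; the slides do not say how the actual systems merge the two kinds of evidence, so this is an illustration and not the authors' method.

```python
from typing import Dict, List, Tuple


def combined_score(word_score: float, concept_score: float, weight: float = 0.5) -> float:
    """Linearly interpolate a word-based and a concept-based relevance score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * word_score + (1.0 - weight) * concept_score


def rank(docs: Dict[str, Tuple[float, float]], weight: float = 0.5) -> List[str]:
    """Order document ids by combined score, highest first.

    `docs` maps a document id to (word_score, concept_score).
    """
    return sorted(docs, key=lambda d: combined_score(*docs[d], weight), reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for two documents.
    scores = {"a": (0.9, 0.1), "b": (0.2, 0.95)}
    print(rank(scores, weight=0.5))
    print(rank(scores, weight=1.0))
```

Setting `weight` to 1.0 reduces the ranking to pure word matching and 0.0 to pure concept matching, which makes it easy to compare the two modes on the same collection.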
Main Projects
- State Duma of RF ( …)
- Central Election Commission of RF ( …)
- Legal Company “Garant” (2002 – …)
- Ministry of Education ( )
- Accounting Chamber of RF (2003 – …)
- Central Bank of RF (2006 – …)
Grants:
- MacArthur Foundation (1994, 1995, …)
- Ford Foundation (2002, 2003)
- Russian Foundation for Basic Research (9)
- Russian Foundation for Humanities (5)
- Eurasia Foundation (2002, 2003)
Participation in International Forums
- Participation in the Text REtrieval Conference (TREC), organized by NIST and DARPA (TREC-6, TREC-8)
- Participation in the Summarization Conference SUMMAC, organized by NIST and DARPA (1st place)
- Cross-Language Evaluation Forum CLEF (DELOS program)
  – participation in the Steering Committee
  – provision of Russian collections for evaluation purposes
  – domain-specific information retrieval
- Organizers of the Russian Information Retrieval Evaluation Seminar ROMIP