Lexicography and computer science: a harmless drudgery? Judith Knapp Andrea Abel European Academy Bozen - Bolzano.

1 Lexicography and computer science: a harmless drudgery? Judith Knapp Andrea Abel European Academy Bozen - Bolzano

2 Content: Learners' Difficulties and Needs; Pedagogical Lexicography Today: A Short Overview; ELDIT: Linguistic-lexicographic Background & Live Demo; Data Model; Implementation; Content Authoring; ELDIT and WordManager; ELDIT and the TreeTagger; Literature; Conclusion

3 Learners' difficulties and needs. Problems with foreign language use (decoding, encoding): syntagmatic level; paradigmatic level; semantic level.

4 PROBLEMS WITH SYNONYMS AND SIMILAR WORDS. Italian words for "meeting": convegno, riunione, incontro, assemblea. Combinations: assemblea condominiale (condominium meeting), assemblea d'affari (business meeting)

5 DIFFICULTIES WITH WORD COMBINATIONS. Collocations: fixed combinations of words (arbitrary, unpredictable). Ex.: to brush one's teeth / lavarsi i denti / sich die Zähne putzen. Grammatical constructions: formed according to the rules of grammar, partly arbitrary. Ex.: to ask sb sth / chiedere qlco a qlcu / jemanden etwas fragen

6 Learners' difficulties and needs. Problems with foreign language use (decoding, encoding): syntagmatic level; paradigmatic level; semantic level. Problems with dictionary use, metalanguage: abbreviations; technical terms; other codes; descriptive language.

7 ABBREVIATIONS (Italian / German / meaning)
agg. / Adj. / adjective
art. / Art. / article
tr. / tr. / transitive verb
determ. / best. / definite article
pron. / Pron. / pronoun
femm. / w./Fem. / feminine
ant. / veralt. / archaic
volg. / vulg. / vulgar
region. / landsch. / regional
mus. / Mus. / music
sociol. / Soziol. / sociology

8 TECHNICAL TERMS (Italian / German): aggettivo / Adjektiv; articolo / Artikel; ausiliare / Hilfsverb; transitivo / transitiv; determinativo / bestimmt; pronome / Pronomen; femminile / weiblich; antico / veraltet; volgare / vulgär; dialetto / landschaftlich; musica / Musik; sociologia / Soziologie. Categories shown: grammar; language variation.

9 OTHER CODES. International Phonetic Alphabet (IPA) or other transcription systems: focus, shake, chiesa [chiè-sa]. Syntactic information (valency) provided in coded or abbreviated form. Ex.: (a) geben; [...] Vt j-m etw. g (Langenscheidt); (b) give 2 Vnn (Cobuild) Vn; (c) dare 17. N-V-N1 (N2/a N3) (Blumenthal/Rovere).

10 UNDERSTANDING THE DEFINITION... I have to look in the dictionary to find out what a virgin is. [...] The dictionary says: virgin, woman (usually young) who is and remains in a state of inviolate chastity. Now I have to look up inviolate and chastity, and all I find here is that inviolate means the opposite of violated, and chastity means chaste, and that means free from unlawful sexual intercourse. Now I have to look up intercourse [...] and I don't know what that means, and I am simply tired of being sent from one word to another in the heavy dictionary like a complete idiot, and all just because the people who wrote the dictionary don't want the likes of us to learn anything. I only want to know where I came from, but when you ask someone, they tell you to ask someone else, or they send you from word to word. (McCourt 1998: 412 – 413, translated here from the German edition quoted on the slide)

11 Learners' difficulties and needs. Problems with foreign language use (decoding, encoding): syntagmatic level; paradigmatic level; semantic level. Problems with dictionary use, metalanguage: abbreviations; technical terms; other codes; descriptive language. Problems with dictionary use, formal problems: search; presentation.

12 Problems with searching. Time consuming: pages; small characters; difficult metalanguage. Complex expressions: collocations (Zähne putzen); idiomatic expressions …

13 Problems with presentation: limited space; linear presentation order; organisation of the dictionary; organisation of the entries

14

15

16 Learners' difficulties and needs. Problems with foreign language use (decoding, encoding): syntagmatic level; paradigmatic level; semantic level. Problems with dictionary use, metalanguage: abbreviations; technical terms; other codes; descriptive language. Problems with dictionary use, formal problems: search; presentation. Solutions.

17 Pedagogical Dictionaries. Target group: language learners. Functions: encoding & decoding. General characteristics: (usually) monolingual; selective regarding macrostructure (limited number of entries); exhaustive regarding microstructure (detailed information for each entry)

18 ELDIT: Elektronisches Lern(er)wörterbuch Deutsch-Italienisch / Dizionario elettronico per apprendenti Italiano-Tedesco (electronic learner's dictionary, German-Italian)

19

20 Three main characteristics: 1. Typologically innovative: a monolingual dictionary (German or Italian) with definitions, collocations, idiomatic expressions, examples … in the target language, and a bilingual dictionary (German and Italian) with translation equivalents and explanations in L1; together, a cross-lingual dictionary German-Italian

21 2. Well-defined target group: beginners – intermediate students (Waystage level A1 up to Threshold level B1); basic vocabulary: ~ entry words for each language; addressed to the linguistic layman: limited use of metalanguage, abbreviations and symbols

22 3. Designed solely for computer use: not a transformation of a paper dictionary into an electronic dictionary; exploits the possibilities of the electronic medium (multimedia & hypertext); modular structure: contains detailed information that is usually found in different types of dictionaries

23 Learners' difficulties and needs: problems, with solutions in brackets. Problems with foreign language use (decoding, encoding): syntagmatic level; paradigmatic level; semantic level [1) definitions 2) examples 3)...]. Problems with dictionary use, metalanguage: abbreviations [1) avoiding 2) explaining]; technical terms [1) avoiding 2) explaining]; other codes [1) sound files 2) verb patterns]; descriptive language [1) simple 2) use of L1 3) multimedia]. Problems with dictionary use, formal problems: search [electronic search possibilities]; presentation [hypertext and hyperlinks].

24 SOLUTIONS... Descriptive language: 1. Simple 2. Multiple descriptions 3. Hypertext

25 1. Simple = a) Limited defining vocabulary b) Easy syntax d) Avoid circularity
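The two checkable criteria on this slide (a limited defining vocabulary and the avoidance of circular definitions) could be tested mechanically. The following is only a minimal sketch under stated assumptions: the vocabulary set, the function names and the toy definition graph are hypothetical and are not taken from ELDIT.

```python
import re

# Hypothetical toy defining vocabulary (not ELDIT's actual list).
DEFINING_VOCABULARY = {"ein", "haus", "ist", "gebäude", "in", "dem", "menschen", "wohnen"}


def words_outside_vocabulary(definition, vocabulary):
    """Return the lower-cased words of a definition that are not in the vocabulary."""
    tokens = re.findall(r"\w+", definition.lower())
    return [token for token in tokens if token not in vocabulary]


def is_circular(headword, definitions):
    """Return True if following the words used in definitions leads back to headword.

    `definitions` maps a headword to the list of content words in its definition.
    """
    stack = list(definitions.get(headword, []))
    seen = set()
    while stack:
        word = stack.pop()
        if word == headword:
            return True
        if word in seen:
            continue
        seen.add(word)
        stack.extend(definitions.get(word, []))
    return False
```

For instance, a definition graph in which "chastity" is defined by "chaste" and "chaste" by "chastity" would be reported as circular, which is the kind of loop a learner's dictionary tries to avoid.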

26 2. Multiple descriptions = a) Definitions b) Lexicographic examples c) Word fields d) L1 (semantic equivalents) [e) images]

27 Semantic level. Semantic information: 1. Definitions 2. Examples 3. Word fields 4. Equivalents. Word field labels (hypernyms, coordinates, kinds of...): das Gebäude; das Haus; das Haus, die Villa, das Schloss, die Wohnung...; das Hochhaus, das Bauernhaus... Entry: 1.a) Ein Haus ist ein Gebäude, in dem Menschen wohnen. (casa) Sie wohnt mit ihrer Familie in einem zweistöckigen Haus am Stadtrand. b) Ein Haus ist das Gebäude, in dem man ständig lebt und in das man regelmäßig zurückkehrt. Es ist der Ort, wo man daheim ist. Sie verlässt das Haus jeden Morgen um sieben Uhr, um zur Arbeit zu fahren. 2. Das Haus sind die Bewohner eines Hauses (1a). (casa) ...

28 3. Hypertext = a) Click on unknown words inside the definition b) Click on the semantic equivalents c) Click on any information you're interested in

29 Learners' difficulties and needs: problems, with solutions in brackets. Problems with foreign language use (decoding, encoding): syntagmatic level [1) collocations 2) examples 3)...]; paradigmatic level; semantic level [1) definitions 2) examples 3)...]. Problems with dictionary use, metalanguage: abbreviations [1) avoiding 2) explaining]; technical terms [1) avoiding 2) explaining]; other codes [1) sound files 2) verb patterns]; descriptive language [1) simple 2) use of L1 3) multimedia]. Problems with dictionary use, formal problems: search [electronic search possibilities]; presentation [hypertext and hyperlinks].

30 Syntagmatic level: 1. Collocations 2. Idiomatic Expressions 3. Verb Valency

31 Verb Valency. Definition: Valency refers to the capacity of a verb to take a specific number and type of arguments (Bianco). Theoretical origin: dependency grammar (Lucien Tesnière)

32 Verb Valency: a problem for learners and researchers. Verb constructions are largely arbitrary and unpredictable; number of obligatory and facultative elements; distinction between transitivity and intransitivity …

33 The description of verb valency in different dictionary types. 1. General monolingual dictionaries. fragen: [jemdn.] unvermittelt,... etw. fragen (Duden Deutsches Universalwörterbuch); chiedere: v.tr. (2 argom.) (Disc); chiedere: v.tr. (Devoto/Oli)

34 The description of verb valency in different dictionary types. 2. Special mono- and bilingual verb valency dictionaries. fragen: 01a v 1b C (Bianco); chiedere: N- V- N1 (N2/a N3) (Blumenthal/Rovere)

35 The description of verb valency in different dictionary types. 3. (Monolingual) learners' dictionaries. fragen: Vt/i (j-n) (etw.) f. (Langenscheidt); fragen: tr K jd fragt jdn [nach etw dat] (Pons Basiswörterbuch); chiedere: tr. (Dib)

36 Description of Verb Valency in ELDIT. I. Learner-friendly description: explicit way of describing verb valency. Codes shown on the slide: N-V-N1-(N2); v.tr. (2 argom.); Vt/i (etw.) (über j-n/etw.) r.

37 Description of Verb Valency in ELDIT II. Multimedia: Visualization of information to support comprehension (colors and animations instead of meta-language)

38 Description of Verb Valency in ELDIT III. Semiotic didactics: Functions of the different colors: -they indicate the parts of the sentence -they show which parts of the verbs belong together -correspondence between patterns and examples

39 Description of Verb Valency in ELDIT IV. Additional explanations for the learner: -Visible notes to describe semantic restrictions -Variations for realizing single parts of the sentence

40 Learners' difficulties and needs: problems, with solutions in brackets. Problems with foreign language use (decoding, encoding): syntagmatic level [1) collocations 2) examples 3)...]; paradigmatic level [lexical fields; three-dimensional graphics]; semantic level [1) definitions 2) examples 3)...]. Problems with dictionary use, metalanguage: abbreviations [1) avoiding 2) explaining]; technical terms [1) avoiding 2) explaining]; other codes [1) sound files 2) verb patterns]; descriptive language [1) simple 2) use of L1 3) multimedia]. Problems with dictionary use, formal problems: search [electronic search possibilities]; presentation [hypertext and hyperlinks].

41 PARADIGMATIC RELATIONS. Word field theory: A word field is a group of words that are closely adjacent in content and that, by virtue of their interdependence, mutually assign one another their functions. (Trier 1968/1973: 189, late definition; translated from German) Existing projects: WordNet (GermaNet, Italian WordNet); Alexia; Kirrkirr

42 Paradigmatic relations in ELDIT: ca. 150 words per language; interactive graphic representation; spatial arrangement and colors for the representation of paradigmatic lexical relations; explicit description of the semantic relations between the lexical units and the lemma (no metalanguage); definitions and examples for describing similarities/differences of meaning, register, authentic context

43 Lexical fields in ELDIT. Types of meaning relations: hierarchical relations (hyperonymy/hyponymy; holonymy/meronymy); non-hierarchical relations (similarity: synonyms, quasi-synonyms …; contrast: gradable and nongradable antonyms; converse terms)

44

45

46

47 Learners' difficulties and needs: problems, with solutions in brackets. Problems with foreign language use (decoding, encoding): syntagmatic level [1) collocations 2) examples 3)...]; paradigmatic level [three-dimensional graphics]; semantic level [1) definitions 2) examples 3)...]. Problems with dictionary use, metalanguage: abbreviations [1) avoiding 2) explaining]; technical terms [1) avoiding 2) explaining]; other codes [1) sound files 2) verb patterns]; descriptive language [1) simple 2) use of L1 3) multimedia]. Problems with dictionary use, formal problems: search [electronic search possibilities]; presentation [hypertext and hyperlinks].

48 Other modules: Inflection; Word family; N.B.

49 Data model: needs for an innovative presentation

50 A detailed data model

51

52 Implementation –Hierarchically structured data –Many changes were expected –Communication with linguists

53 Use of XML –XML and XML editor: hierarchic structure, communication with linguists –Java Servlet technology –DXML or JDOM –Dynamic generation of HTML
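The idea of storing entries as hierarchical XML and generating HTML dynamically can be illustrated with a small sketch. The slides name Java servlets and JDOM; the example below uses Python's standard library instead, purely for illustration. The element and attribute names (entry, sense, definition, equivalent) are hypothetical and do not reproduce the ELDIT data model; only the example definition and its Italian equivalent come from the slides.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical entry format; the text content is taken from the Haus example.
ENTRY_XML = """
<entry lemma="Haus">
  <sense n="1a">
    <definition>Ein Haus ist ein Gebäude, in dem Menschen wohnen.</definition>
    <equivalent lang="it">casa</equivalent>
  </sense>
</entry>
"""


def entry_to_html(xml_text):
    """Render one dictionary entry as a small HTML fragment."""
    entry = ET.fromstring(xml_text)
    parts = ["<h1>{}</h1>".format(escape(entry.get("lemma", "")))]
    for sense in entry.findall("sense"):
        number = escape(sense.get("n", ""))
        definition = escape(sense.findtext("definition", default=""))
        equivalent = escape(sense.findtext("equivalent", default=""))
        parts.append("<p><b>{}</b> {} <i>{}</i></p>".format(number, definition, equivalent))
    return "\n".join(parts)
```

Keeping the content in XML and producing the HTML on request means the presentation can change without touching the entries, which fits the slide's remark that many changes were expected.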

54 Content Authoring –Difficult –Time consuming –Error prone In ELDIT: –Innovative Presentation –Efficient Interface (Real World System) –Research of Linguists

55 Efficient Authoring Interface

56 Efficient Authoring Interface: semi-structured data; automatic full-structuring; automatic enriching

57 Semi-structured Data

58

59 Automatic full-structuring Meine Eltern haben das Haus vor 50 Jahren gebaut. die Be haus ung

60 Automatic Enriching by using computational linguistics tools: WordManager, TreeTagger, PhraseManager, WordNet, Parser, …

61 die Be_haus_ung la dimora

62 die Be haus ung la dimora
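The step from the semi-structured form "Be_haus_ung" to a fully structured one might look roughly like the sketch below. It assumes, without confirmation from the slides, that the underscores delimit the stem of a derived word; the function name and the returned field names are hypothetical.

```python
def structure_word(marked):
    """Split a form such as 'Be_haus_ung' into prefix, stem and suffix.

    Assumption: the author marks the stem with exactly two underscores.
    """
    parts = marked.split("_")
    if len(parts) != 3:
        raise ValueError("expected exactly two underscores: {!r}".format(marked))
    prefix, stem, suffix = parts
    return {
        "prefix": prefix,
        "stem": stem,
        "suffix": suffix,
        "surface": prefix + stem + suffix,
    }
```

With such a structure, a presentation layer could highlight the stem "haus" inside "Behausung" without the author writing any markup beyond the two underscores.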

63 ELDIT and WordManager: WordManager; WM Transducers; WordManager in ELDIT

64 WordManager –System for reusable morphological dictionaries –Information about a word: inflection (declension and conjugation), word formation (derivation and composition), orthography (old and new for German), … –German, Italian, English

65 WM Transducers
Lemmatizer: Häusern → haus (Cat N)
Inflection Analyzer: Häusern → haus (Cat N)(Gender N)(Num PL)(Case Dat)
Inflection Generator: Haus → haus (Cat N)(Gender N)(Num SG)(Case Nom), haus (Cat N)(Gender N)(Num SG)(Case Gen), häuser (Cat N)(Gender N)(Num PL)(Case Nom), häusern (Cat N)(Gender N)(Num PL)(Case Dat) …
Word Formation Analyzer: kennenlernen → kennen (Cat V)(Aux haben), lernen (Cat V)(Aux haben)
Word Formation Generator: bosco → abbracciabosco (Cat N)(Gen M), boscaglia (Cat N)(Gen F), boscaiolo (Cat N)(Gen M) …
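The relationship between the lemmatizer, the inflection analyzer and the inflection generator can be shown with a toy lookup table. This is not the WordManager API, which the slides do not describe; the table layout and the function names are hypothetical, and only three forms of Haus are included.

```python
# Toy table of (surface form, lemma, features); a stand-in for a real
# morphological dictionary, not WordManager's actual format.
FORMS = [
    ("Haus", "haus", {"Cat": "N", "Gender": "N", "Num": "SG", "Case": "Nom"}),
    ("Häuser", "haus", {"Cat": "N", "Gender": "N", "Num": "PL", "Case": "Nom"}),
    ("Häusern", "haus", {"Cat": "N", "Gender": "N", "Num": "PL", "Case": "Dat"}),
]


def analyze(form):
    """Inflection analyzer: surface form -> list of (lemma, features)."""
    return [(lemma, feats) for surface, lemma, feats in FORMS if surface == form]


def lemmatize(form):
    """Lemmatizer: surface form -> sorted list of candidate lemmas."""
    return sorted({lemma for lemma, _ in analyze(form)})


def generate(lemma):
    """Inflection generator: lemma -> list of (surface form, features)."""
    return [(surface, feats) for surface, entry_lemma, feats in FORMS if entry_lemma == lemma]
```

The sketch makes the point that the three transducers are different views of the same data: analysis reads the table from form to lemma, generation reads it from lemma to form.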

66 WM in ELDIT Search (Lemmatizer)

67 Links and Additional Examples (Lemmatizer)

68 Exercises (Analyzer)

69 Conjugation tables (Generator)

70 ELDIT and TreeTagger: ELDIT Text Corpus; Development; Tagging; Manual Corrections

71 ELDIT Texts

72 Development: MS Word (Goethe Institut of Milan), HTML, Simple XML

73 Tagging: POS tagging (TreeTagger); XML with links; iterative correction by frequency of unlinked words
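Correcting "by frequency of unlinked words" suggests ranking the lemmas that the tagger produced but that have no dictionary entry, so that the most frequent problems are handled first. A minimal sketch, assuming the tagger output is available as (word, lemma) pairs; the function name and the sample data are hypothetical.

```python
from collections import Counter


def unlinked_by_frequency(tagged_tokens, dictionary_lemmas):
    """Count lemmas without a dictionary entry, most frequent first.

    tagged_tokens: iterable of (word, lemma) pairs from a POS tagger.
    dictionary_lemmas: set of lemmas that have an entry to link to.
    """
    counts = Counter(
        lemma for _, lemma in tagged_tokens if lemma not in dictionary_lemmas
    )
    return counts.most_common()
```

Each round of manual correction would then start at the top of this list, and the text would be re-tagged or re-linked before the next round.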

74 Corrections
-Old German spelling rules (valid until 1998).
-The Italian verb sono (they are) was always tagged with sonare (= suonare, make music) instead of with essere (to be).
-The verb sia (he may be) was always recognized as a conjunction and tagged with sia (as well as) instead of with essere (to be).
-Many conjugated forms of avere were tagged with riavere (to get something back) instead of with avere (to have).
-Many conjugated forms of andare were tagged with riandare (to go back) instead of with andare.
-Abbreviated forms of Italian words (such as bel, vuol, pur, fin) were tagged as nouns and with the original form as lemma.
-Some Italian words which exist both as nouns and as past participles (such as the word successo (the success, it happened)) were tagged with the wrong word class.
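Some of these systematic errors lend themselves to rule-based post-correction of the tagger output. The sketch below covers only the sono and riavere/riandare cases, and the second rule is an assumed heuristic (a form that does not begin with "ri" cannot belong to riavere or riandare), not a method described on the slides. The sia case is left out because genuine conjunction uses cannot be told apart by the word form alone.

```python
def correct_lemma(word, lemma):
    """Apply hypothetical post-correction rules to one (word, lemma) pair."""
    lowered = word.lower()
    # 'sono' tagged as a form of 'sonare' is corrected to 'essere'.
    if lowered == "sono" and lemma == "sonare":
        return "essere"
    # Assumed heuristic: forms of riavere/riandare begin with 'ri';
    # anything else with that lemma belongs to avere/andare.
    if lemma in ("riavere", "riandare") and not lowered.startswith("ri"):
        return lemma[2:]
    return lemma
```

Rules like these would only be a first pass; the remaining cases on the slide (abbreviated forms, noun versus past participle) need context and were presumably handled by manual correction.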

75 Literature ators/JKnapp/index.htm Publications (some linguistic ones, too) PhD-Thesis (Abel Andrea – Uni Innsbruck; Judith Knapp – Uni Hannover)

76 Conclusion: syntagmatic, paradigmatic, pragmatic, polysemy, homography, homonymy, holonymy, hyponymy, hyperonymy, semiotic, ludativ, … Goal-based scenarios, blended learning … TEI, CES, NLP, lemmatizing, POS tagging … File server, web server, data model, HTTP request, client, protocol, port, … + uv -

