Lexicography and computer science: a harmless drudgery?


1 Lexicography and computer science: a harmless drudgery?
Judith Knapp, Andrea Abel
European Academy Bozen-Bolzano

2 Content
Learners' Difficulties and Needs
Pedagogical Lexicography Today – A Short Overview
ELDIT – Linguistic-lexicographic Background & Live Demo
Data Model
Implementation
Content Authoring
ELDIT and WordManager
ELDIT and the TreeTagger
Literature
Conclusion

3 Learners' difficulties and needs
Problems with foreign language use: decoding and encoding problems at the syntagmatic, paradigmatic and semantic levels

4 PROBLEMS WITH SYNONYMS AND SIMILAR WORDS
convegno, riunione, incontro, assemblea (meeting)
assemblea condominiale (condominium meeting)
assemblea d'affari (business meeting)

5 DIFFICULTIES WITH WORD COMBINATIONS
Collocations: fixed combinations of words (arbitrary, unpredictable).
Ex.: to brush one's teeth / lavarsi i denti / sich die Zähne putzen
Grammatical constructions: formed according to the rules of grammar, partly arbitrary.
Ex.: to ask sb sth / chiedere qlco a qlcu / jemanden etwas fragen

6 Learners' difficulties and needs
Problems with foreign language use (decoding, encoding):
- syntagmatic level
- semantic level
- paradigmatic level
Problems with dictionary use:
- metalanguage: abbreviations, technical terms, other "codes", descriptive language

7 ABBREVIATIONS (Italian – German – English)
agg. – Adj. – adjective
art. – Art. – article
tr. – tr. – transitive verb
determ. – best. – definite article
pron. – Pron. – pronoun
femm. – w./Fem. – feminine
ant. – veralt. – archaic
volg. – vulg. – vulgar
region. – landsch. – regional
mus. – Mus. – music
sociol. – Soziol. – sociology

8 TECHNICAL TERMS (Italian – German)
Grammar: aggettivo – Adjektiv, articolo – Artikel, ausiliare – Hilfsverb, transitivo – transitiv, determinativo – bestimmt, pronome – Pronomen, femminile – weiblich
Language variation: antico – veraltet, volgare – vulgär, dialetto – landschaftlich, musica – Musik, sociologia – Soziologie

9 OTHER "CODES"
International Phonetic Alphabet (IPA) or other transcription systems: focus, shake, chiesa [chiè-sa]
Syntactic information (valency) provided in coded or abbreviated form. Ex.:
(a) geben: [...] Vt j-m etw. g (Langenscheidt)
(b) give: Vnn, Vn (Cobuild)
(c) dare: N-V-N1 (N2/a N3) (Blumenthal/Rovere)

10 UNDERSTANDING THE DEFINITION...
"I have to look in the dictionary to find out what a virgin is. [...] The dictionary says: virgin, woman (usually young) who is in a state of untouched chastity and remains in it. Now I have to look up untouched and chastity, and all I find here is that untouched means the opposite of touched, and chastity means chaste, and that means free from unlawful sexual intercourse. Now I have to look up intercourse [...] and I don't know what that means, and I'm simply sick of being sent from one word to another in the heavy dictionary like a complete idiot, and all just because the people who wrote the dictionary don't want the likes of us to find anything out. I only want to know where I came from, but if you ask anyone, they tell you to ask someone else, or they send you from word to word." (McCourt 1998: 412–413, German translation, rendered here in English)

11 Learners' difficulties and needs
Problems with foreign language use (decoding, encoding):
- syntagmatic level
- semantic level
- paradigmatic level
Problems with dictionary use:
- formal problems: search, presentation
- metalanguage: abbreviations, technical terms, other "codes", descriptive language

12 Problems with searching
Time-consuming:
- pages
- small characters
- difficult metalanguage
Complex expressions:
- collocations ("Zähne putzen")
- idiomatic expressions

13 Problems with presentation
- limited space
- linear presentation order
- organisation of the dictionary
- organisation of the entries

14

15

16 Learners' difficulties and needs
Problems with foreign language use (decoding, encoding):
- syntagmatic level
- semantic level
- paradigmatic level
Problems with dictionary use:
- formal problems: search, presentation
- metalanguage: abbreviations, technical terms, other "codes", descriptive language
→ Solutions

17 Pedagogical Dictionaries
Target group: language learners
Functions: encoding and decoding
General characteristics: (usually) monolingual; selective regarding macrostructure (limited number of entries); exhaustive regarding microstructure (detailed information for each entry)

18 ELDIT: Elektronisches Lern(er)wörterbuch Deutsch-Italienisch / Dizionario elettronico per apprendenti Italiano-Tedesco
(electronic learners' dictionary German-Italian / Italian-German)

19

20 Three main characteristics:
1. Typologically innovative: a monolingual dictionary (German or Italian) with definitions, collocations, idiomatic expressions, examples … in the target language, and at the same time a bilingual dictionary (German and Italian) with translation equivalents and explanations in the L1; in short, a "cross-lingual" German-Italian dictionary

21 2. Well-defined target group:
beginners to intermediate students (CEFR levels A1 up to Threshold level B1); basic vocabulary: ~ entry words for each language; addressed to the linguistic layman: limited use of metalanguage, abbreviations and symbols

22 3. Designed solely for computer use:
not a transformation of a paper dictionary into an electronic dictionary; exploits the possibilities of the electronic medium (multimedia and hypertext); modular structure: contains detailed information that is usually found in different types of dictionaries

23 Learners' difficulties and needs
Problems with foreign language use (decoding, encoding):
- syntagmatic level
- semantic level → 1) definitions 2) examples 3) ...
- paradigmatic level
Problems with dictionary use:
- formal problems: search → electronic search possibilities; presentation → hypertext and hyperlinks
- metalanguage: abbreviations → 1) avoiding 2) explaining; technical terms; other "codes" → 1) sound files 2) verb patterns; descriptive language → 1) simple 2) use of L1 3) multimedia

24 SOLUTIONS ... Descriptive language:
1. Simple
2. Multiple descriptions
3. Hypertext

25 1. Simple = a) Limited defining vocabulary b) Easy syntax
d) Avoid circularity
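A limited defining vocabulary and the absence of circularity can both be checked mechanically. The following is a minimal sketch, assuming definitions are plain strings keyed by headword and that a word counts as "used" only if its lower-cased form is itself a headword (no lemmatization). The toy entries and the vocabulary are hypothetical and loosely modelled on the look-up chain in the McCourt quotation. A depth-first search follows the "is defined using" links until it returns to a word already on the current path.

```python
from typing import Dict, List, Set


def tokenize(definition: str) -> List[str]:
    """Lower-cases a definition and splits it into words, stripping punctuation."""
    words = []
    for raw in definition.lower().split():
        word = raw.strip(".,;:()\"'")
        if word:
            words.append(word)
    return words


def words_outside_vocabulary(definition: str, vocabulary: Set[str]) -> List[str]:
    """Returns the words of a definition that are not in the defining vocabulary."""
    return [w for w in tokenize(definition) if w not in vocabulary]


def find_cycle(headword: str, definitions: Dict[str, str]) -> List[str]:
    """Depth-first search along 'is defined using' links between headwords.

    Returns the first circular chain found (first and last element equal),
    or an empty list if every chain starting at `headword` terminates.
    """
    path: List[str] = []
    on_path: Set[str] = set()
    finished: Set[str] = set()

    def visit(word: str) -> List[str]:
        path.append(word)
        on_path.add(word)
        for used in tokenize(definitions[word]):
            if used not in definitions:
                continue  # not a headword, so there is nothing to look up
            if used in on_path:
                return path[path.index(used):] + [used]  # led back to a word on the path
            if used not in finished:
                cycle = visit(used)
                if cycle:
                    return cycle
        path.pop()
        on_path.discard(word)
        finished.add(word)
        return []

    return visit(headword)


# Hypothetical toy data.
DEFINITIONS = {
    "virgin": "a woman in a state of inviolate chastity",
    "inviolate": "the opposite of violated",
    "chastity": "the state of being chaste",
    "chaste": "having chastity",
}
VOCABULARY = {"a", "woman", "in", "state", "of", "the", "opposite", "being", "having"}

print(words_outside_vocabulary(DEFINITIONS["virgin"], VOCABULARY))  # ['inviolate', 'chastity']
print(find_cycle("virgin", DEFINITIONS))  # ['chastity', 'chaste', 'chastity']
```

The dead end via "inviolate" is explored first and abandoned; the search then reports the chastity/chaste loop, which is exactly the kind of chain a learner cannot escape from.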

26 2. Multiple descriptions =
a) Definitions b) Lexicographic examples c) Word fields d) L1 (semantic equivalents) [e) images]

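27 Semantic level: semantic information
1. Definitions 2. Examples 3. Word fields 4. Equivalents
Word field of "das Haus" (house): hypernyms: das Gebäude (building); coordinates: das Haus, die Villa, das Schloss, die Wohnung ... (house, villa, castle, flat); kinds of ...: das Hochhaus, das Bauernhaus ... (high-rise building, farmhouse)
1.a) Ein Haus ist ein Gebäude, in dem Menschen wohnen. (A house is a building in which people live.) Equivalent: casa
Example: Sie wohnt mit ihrer Familie in einem zweistöckigen Haus am Stadtrand. (She lives with her family in a two-storey house on the edge of town.)
b) Ein Haus ist das Gebäude, in dem man ständig lebt und in das man regelmäßig zurückkehrt. Es ist der Ort, wo man daheim ist. (A house is the building in which one lives permanently and to which one regularly returns. It is the place where one is at home.)
Example: Sie verlässt das Haus jeden Morgen um sieben Uhr, um zur Arbeit zu fahren. (She leaves the house every morning at seven o'clock to go to work.)
2. Das Haus sind die Bewohner eines Hauses (1a). (The house: the occupants of a house (1a).) Equivalent: casa ....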
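The four kinds of semantic information shown for "Haus" (definitions, examples, word field, equivalents) fit naturally into a small nested record per entry. This is a hypothetical sketch, not ELDIT's actual data model (which is stored in XML); the class and field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    number: str                 # sense number, e.g. "1a"
    definition: str             # definition in the target language
    examples: List[str] = field(default_factory=list)
    equivalents: List[str] = field(default_factory=list)  # L1 translation equivalents


@dataclass
class WordField:
    hypernyms: List[str] = field(default_factory=list)
    coordinates: List[str] = field(default_factory=list)
    hyponyms: List[str] = field(default_factory=list)     # "kinds of ..."


@dataclass
class Entry:
    lemma: str
    senses: List[Sense]
    word_field: WordField

    def equivalents(self) -> List[str]:
        """Distinct equivalents of all senses, in order of first appearance."""
        result: List[str] = []
        for sense in self.senses:
            for eq in sense.equivalents:
                if eq not in result:
                    result.append(eq)
        return result


HAUS = Entry(
    lemma="das Haus",
    senses=[
        Sense("1a", "Ein Haus ist ein Gebäude, in dem Menschen wohnen.",
              ["Sie wohnt mit ihrer Familie in einem zweistöckigen Haus am Stadtrand."],
              ["casa"]),
        Sense("1b", "Ein Haus ist das Gebäude, in dem man ständig lebt und in das man "
                    "regelmäßig zurückkehrt. Es ist der Ort, wo man daheim ist.",
              ["Sie verlässt das Haus jeden Morgen um sieben Uhr, um zur Arbeit zu fahren."]),
        Sense("2", "Das Haus sind die Bewohner eines Hauses (1a).", equivalents=["casa"]),
    ],
    word_field=WordField(
        hypernyms=["das Gebäude"],
        coordinates=["die Villa", "das Schloss", "die Wohnung"],
        hyponyms=["das Hochhaus", "das Bauernhaus"],
    ),
)

print(HAUS.equivalents())  # ['casa']
```

Keeping equivalents per sense, rather than per entry, lets the interface show the right L1 equivalent next to each definition.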

28 3. Hypertext =
a) Click on unknown words inside the definition
b) Click on the semantic equivalents
c) Click on any information you're interested in
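Clicking on unknown words inside a definition presupposes that every word form can be mapped to an entry. A minimal sketch, assuming a hypothetical hand-made form-to-lemma table and a hypothetical URL scheme `entry?lemma=...`; a real system would obtain the lemmas from a lemmatizer rather than from a fixed table.

```python
import html
import re
from typing import Dict
from urllib.parse import quote


def link_definition(definition: str, form_to_lemma: Dict[str, str]) -> str:
    """Returns the definition as HTML, turning every known word form into a link."""
    # With a capturing group, re.split keeps the words at the odd positions.
    parts = re.split(r"(\w+)", definition)
    out = []
    for i, part in enumerate(parts):
        lemma = form_to_lemma.get(part.lower()) if i % 2 == 1 else None
        if lemma is None:
            out.append(html.escape(part))
        else:
            out.append(f'<a href="entry?lemma={quote(lemma)}">{html.escape(part)}</a>')
    return "".join(out)


# Hypothetical form-to-lemma table; the current headword "Haus" is deliberately not linked.
FORM_TO_LEMMA = {"gebäude": "Gebäude", "menschen": "Mensch", "wohnen": "wohnen"}

print(link_definition("Ein Haus ist ein Gebäude, in dem Menschen wohnen.", FORM_TO_LEMMA))
```

Linking the inflected form ("Menschen") to its lemma ("Mensch") is what spares the learner the manual look-up chain described in the McCourt quotation.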

29 Learners' difficulties and needs
Problems with foreign language use (decoding, encoding):
- syntagmatic level → 1) collocations 2) examples 3) ...
- semantic level → 1) definitions 2) examples 3) ...
- paradigmatic level
Problems with dictionary use:
- formal problems: search → electronic search possibilities; presentation → hypertext and hyperlinks
- metalanguage: abbreviations → 1) avoiding 2) explaining; technical terms → 1) avoiding 2) explaining; other "codes" → 1) sound files 2) verb patterns; descriptive language → 1) simple 2) use of L1 3) multimedia

30 Syntagmatic level:
1. Collocations 2. Idiomatic expressions 3. Verb valency

31 Verb Valency
Definition: "Valency refers to the capacity of a verb to take a specific number and type of arguments" (Bianco).
Theoretical origin: dependency grammar (Lucien Tesnière)

32 Verb Valency: a problem for learners and researchers
- verb constructions are largely arbitrary and unpredictable
- the number of obligatory and optional (facultative) elements
- the distinction between transitivity and intransitivity

33 The description of verb valency in different dictionary types
1. General monolingual dictionaries:
fragen: [jemdn.] unvermittelt, ... etw. fragen (Duden Deutsches Universalwörterbuch)
chiedere: v.tr. (2 argom.) (Disc)
chiedere: v.tr. (Devoto/Oli)

34 The description of verb valency in different dictionary types
2. Special monolingual and bilingual verb valency dictionaries:
fragen: 01a v 1b C (Bianco)
chiedere: N- V- N1 (N2/a N3) (Blumenthal/Rovere)

35 The description of verb valency in different dictionary types
3. (Monolingual) learners' dictionaries:
fragen: Vt/i (j-n) (etw.) f. (Langenscheidt)
fragen: tr K jd fragt jdn [nach etw dat] (Pons Basiswörterbuch)
chiedere: tr. (Dib)

36 Description of Verb Valency in ELDIT
I. Learner-friendly description: verb valency is described explicitly instead of with codes such as N-V-N1-(N2), v.tr. (2 argom.) or Vt/i (etw.) (über j-n/etw.) r.
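Describing valency explicitly means spelling out each slot in plain words instead of codes. A minimal sketch with hypothetical class names: the slot choices follow the dictionary codes quoted earlier (Blumenthal/Rovere's optional "(N2/a N3)" for chiedere, Langenscheidt's optional "(j-n) (etw.)" for fragen), and parentheses mark optional slots.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slot:
    filler: str              # plain-language placeholder such as "qualcuno" or "jemanden"
    obligatory: bool = True


@dataclass
class ValencyPattern:
    verb_form: str                                      # verb as shown in the pattern
    before: List[Slot] = field(default_factory=list)    # slots before the verb (subject)
    after: List[Slot] = field(default_factory=list)     # slots after the verb

    def render(self) -> str:
        """Spells the pattern out in words; optional slots go in parentheses."""
        def show(slot: Slot) -> str:
            return slot.filler if slot.obligatory else f"({slot.filler})"
        words = [show(s) for s in self.before] + [self.verb_form] + [show(s) for s in self.after]
        return " ".join(words)


CHIEDERE = ValencyPattern("chiede", [Slot("qualcuno")],
                          [Slot("qualcosa"), Slot("a qualcuno", obligatory=False)])
FRAGEN = ValencyPattern("fragt", [Slot("jemand")],
                        [Slot("jemanden", obligatory=False), Slot("etwas", obligatory=False)])

print(CHIEDERE.render())  # qualcuno chiede qualcosa (a qualcuno)
print(FRAGEN.render())    # jemand fragt (jemanden) (etwas)
```

Because each slot is a separate object, an interface could also give every slot its own color, which is the idea behind the color coding described in the next slides.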

37 Description of Verb Valency in ELDIT
II. Multimedia: Visualization of information to support comprehension (colors and animations instead of meta-language)

38 Description of Verb Valency in ELDIT
III. Semiotic didactics: functions of the different colors:
- they indicate the parts of the sentence
- they show which parts of the verb belong together
- they show the correspondence between patterns and examples

39 Description of Verb Valency in ELDIT
IV. Additional explanations for the learner: visible notes describing semantic restrictions; variants for realizing individual parts of the sentence

40 Learners' difficulties and needs
Problems with foreign language use (decoding, encoding):
- syntagmatic level → 1) collocations 2) examples 3) ...
- semantic level → 1) definitions 2) examples 3) ...
- paradigmatic level → lexical fields (three-dimensional graphics)
Problems with dictionary use:
- formal problems: search → electronic search possibilities; presentation → hypertext and hyperlinks
- metalanguage: abbreviations → 1) avoiding 2) explaining; technical terms → 1) avoiding 2) explaining; other "codes" → 1) sound files 2) verb patterns; descriptive language → 1) simple 2) use of L1 3) multimedia

41 PARADIGMATIC RELATIONS
Word field theory: "A word field is a group of words that are closely related in content and that, by virtue of their interdependence, mutually assign each other their functions." (Trier 1968/1973: 189, late definition)
Existing projects:
- WordNet (GermaNet, Italian WordNet)
- Alexia
- Kirrkirr

42 Paradigmatic relations in ELDIT
- ca. 150 words per language
- interactive graphic representation
- spatial arrangement and colors to represent paradigmatic lexical relations
- explicit description of the semantic relations between the lexical units and the lemma (no metalanguage)
- definitions and examples describing similarities and differences of meaning, register and authentic context

43 Lexical fields in ELDIT
Types of meaning relations:
- hierarchical relations (hyperonymy/hyponymy; holonymy/meronymy)
- non-hierarchical relations (similarity: synonyms, quasi-synonyms …; contrast: gradable and non-gradable antonyms, converse terms)
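The relation types listed above can be stored as a typed graph in which every link is recorded together with its inverse: hyponym/hypernym and meronym/holonym are mutual inverses, while synonymy, antonymy and converseness are symmetric. A minimal sketch; the class name and the example links are illustrative assumptions, not ELDIT data.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

INVERSE = {
    "hypernym": "hyponym", "hyponym": "hypernym",   # hierarchical
    "holonym": "meronym", "meronym": "holonym",     # part-whole (hierarchical)
    "synonym": "synonym", "antonym": "antonym",     # symmetric
    "converse": "converse",                         # symmetric
}


class LexicalField:
    def __init__(self) -> None:
        self._edges: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, word: str, relation: str, other: str) -> None:
        """Records that `other` is a `relation` of `word`, plus the inverse link."""
        self._edges[word].add((relation, other))
        self._edges[other].add((INVERSE[relation], word))

    def related(self, word: str, relation: str) -> List[str]:
        """All words standing in `relation` to `word`, alphabetically sorted."""
        return sorted(o for r, o in self._edges.get(word, set()) if r == relation)


FIELD = LexicalField()
FIELD.add("Haus", "hypernym", "Gebäude")     # Gebäude is a hypernym of Haus
FIELD.add("Haus", "hyponym", "Hochhaus")     # Hochhaus is a kind of Haus
FIELD.add("Haus", "meronym", "Dach")         # Dach is a part of Haus
FIELD.add("groß", "antonym", "klein")        # gradable antonyms
FIELD.add("kaufen", "converse", "verkaufen") # converse terms

print(FIELD.related("Gebäude", "hyponym"))   # ['Haus']
print(FIELD.related("Dach", "holonym"))      # ['Haus']
```

Storing both directions at insertion time means a graphical view can start from any word in the field and still show all of its relations.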

44

45

46

47 Learners' difficulties and needs
Problems with foreign language use (decoding, encoding):
- syntagmatic level → 1) collocations 2) examples 3) ...
- semantic level → 1) definitions 2) examples 3) ...
- paradigmatic level → three-dimensional graphics
Problems with dictionary use:
- formal problems: search → electronic search possibilities; presentation → hypertext and hyperlinks
- metalanguage: abbreviations → 1) avoiding 2) explaining; technical terms → 1) avoiding 2) explaining; other "codes" → 1) sound files 2) verb patterns; descriptive language → 1) simple 2) use of L1 3) multimedia

48 Other modules: inflection, word family, N.B.

49 Data model: needs for an innovative presentation

50 A detailed data model

51

52 Implementation
Hierarchically structured data; many changes were expected; communication with linguists

53 Use of XML
Requirements: hierarchic structure; communication with linguists; dynamic generation of HTML
Technologies: XML and XML editor; Java servlet technology; DXML or JDOM
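The slide pairs XML storage with dynamic HTML generation through Java servlets and DXML or JDOM. Purely to illustrate the XML-to-HTML step, the sketch below uses Python's standard `xml.etree.ElementTree` instead of those Java tools. The `<example>`/`<w style="emphasized">` markup is taken from the ELDIT snippet on slide 59; rendering emphasis as `<b>` and the punctuation-spacing rule are assumptions.

```python
import html
import xml.etree.ElementTree as ET

PUNCTUATION = {".", ",", ";", ":", "!", "?"}

EXAMPLE_XML = """<example> <w>Meine</w> <w>Eltern</w> <w style="emphasized">haben</w>
<w style="emphasized">das</w> <w style="emphasized">Haus</w> <w>vor</w> <w>50</w>
<w>Jahren</w> <w style="emphasized">gebaut</w> <w>.</w> </example>"""


def example_to_html(xml_text: str) -> str:
    """Renders an <example> element as an HTML paragraph, emphasized words in bold."""
    root = ET.fromstring(xml_text)
    body = ""
    for w in root.iter("w"):
        word = w.text or ""
        piece = html.escape(word)
        if w.get("style") == "emphasized":
            piece = f"<b>{piece}</b>"
        if body and word not in PUNCTUATION:
            body += " "  # no space before punctuation tokens
        body += piece
    return f"<p>{body}</p>"


print(example_to_html(EXAMPLE_XML))
# <p>Meine Eltern <b>haben</b> <b>das</b> <b>Haus</b> vor 50 Jahren <b>gebaut</b>.</p>
```

Keeping presentation choices such as bold type out of the XML is what allows linguists to edit content while the layout is generated on request.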

54 Content Authoring
Content authoring is difficult, time-consuming and error-prone.
In ELDIT: innovative presentation; efficient interface (real-world system); research of linguists

55 “Efficient” Authoring Interface

56 Efficient Authoring Interface
Semi-structured data; automatic full-structuring; automatic enriching

57 Semi-structured Data

58

59 Automatic full-structuring
<example>
  <w>Meine</w>
  <w>Eltern</w>
  <w style="emphasized">haben</w>
  <w style="emphasized">das</w>
  <w style="emphasized">Haus</w>
  <w>vor</w>
  <w>50</w>
  <w>Jahren</w>
  <w style="emphasized">gebaut</w>
  <w>.</w>
</example>
<prebasuf>
  <article>die</article>
  <praefix>Be</praefix>
  <basis>haus</basis>
  <suffix>ung</suffix>
</prebasuf>
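The semi-structured input format is not shown in this transcript, so the following is only a minimal sketch of how an example sentence could be structured automatically, assuming (hypothetically) that authors mark emphasized words by enclosing them in asterisks, e.g. `*haben das Haus*`. It tokenizes the sentence, separates punctuation and emits one `<w>` per token, as in the `<example>` element above.

```python
import re
import xml.etree.ElementTree as ET

# An asterisk toggles emphasis; otherwise a token is a word or a single punctuation mark.
TOKEN = re.compile(r"\*|\w+|[^\w\s]")


def structure_example(text: str) -> ET.Element:
    """Converts '*'-marked plain text into <example><w>...</w></example>."""
    example = ET.Element("example")
    emphasized = False
    for token in TOKEN.findall(text):
        if token == "*":
            emphasized = not emphasized
            continue
        w = ET.SubElement(example, "w")
        if emphasized:
            w.set("style", "emphasized")
        w.text = token
    return example


element = structure_example("Meine Eltern *haben das Haus* vor 50 Jahren *gebaut*.")
print(ET.tostring(element, encoding="unicode"))
```

Generating the markup this way keeps the authoring step as light as typing a sentence, which is the point of the "efficient authoring interface".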

60 Automatic Enriching
By using computational linguistics tools: WordManager, TreeTagger, PhraseManager, WordNet, parser, …

61 <derivation> <prebasuf>die Be_haus_ung</prebasuf> <translation>la dimora</translation> </derivation>

62
<derivation id="de.n.haus.1.deriv2">
  <pattern id="de.n.haus.1.deriv2.patt0" base="Behausung" ctag="N" lexref="">
    <article base="der" ctag="art" lexref="de.g.articles.1.item1">die</article>
    <praefix explref="de.prae.h.be">Be</praefix>
    <basis>haus</basis>
    <suffix explref="de.suff.h.ung">ung</suffix>
  </pattern>
  <translation id="de.n.haus.1.deriv2.trans0">
    <w id="de.n.haus.1.deriv2.trans0.w0" type="content" base="il" ctag="art" lexref="it.g.articles.1.item2">la</w>
    <w id="de.n.haus.1.deriv2.trans0.w1" base="dimora" ctag="N" lexref="it.n.dimora.1">dimora</w>
  </translation>
</derivation>
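The step from the semi-structured derivation on slide 61 to the enriched one on slide 62 can be sketched as splitting the underscore-separated "prefix_basis_suffix" string and attaching ids and lexicon references. Assumptions, all hypothetical: the id and explref naming scheme is inferred from this single example only, the two lookup tables stand in for the real lexicon, and the `type="content"` attribute is not reproduced.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup tables standing in for the real lexicon.
DE_ARTICLES = {"die": ("der", "de.g.articles.1.item1")}  # form -> (base, lexref)
IT_WORDS = {                                              # form -> (base, ctag, lexref)
    "la": ("il", "art", "it.g.articles.1.item2"),
    "dimora": ("dimora", "N", "it.n.dimora.1"),
}


def enrich_derivation(entry_id: str, index: int, prebasuf: str, translation: str) -> ET.Element:
    """Turns e.g. 'die Be_haus_ung' + 'la dimora' into an enriched <derivation> element."""
    deriv_id = f"{entry_id}.deriv{index}"
    article, word = prebasuf.split(" ", 1)
    prefix, basis, suffix = word.split("_")  # assumes exactly prefix_basis_suffix
    derivation = ET.Element("derivation", id=deriv_id)

    pattern = ET.SubElement(derivation, "pattern", id=f"{deriv_id}.patt0",
                            base=(prefix + basis + suffix).capitalize(), ctag="N", lexref="")
    art_base, art_ref = DE_ARTICLES[article]
    ET.SubElement(pattern, "article", base=art_base, ctag="art", lexref=art_ref).text = article
    if prefix:
        ET.SubElement(pattern, "praefix", explref=f"de.prae.h.{prefix.lower()}").text = prefix
    ET.SubElement(pattern, "basis").text = basis
    if suffix:
        ET.SubElement(pattern, "suffix", explref=f"de.suff.h.{suffix.lower()}").text = suffix

    trans = ET.SubElement(derivation, "translation", id=f"{deriv_id}.trans0")
    for i, form in enumerate(translation.split()):
        base, ctag, ref = IT_WORDS[form]
        ET.SubElement(trans, "w", id=f"{deriv_id}.trans0.w{i}",
                      base=base, ctag=ctag, lexref=ref).text = form
    return derivation


derivation = enrich_derivation("de.n.haus.1", 2, "die Be_haus_ung", "la dimora")
print(ET.tostring(derivation, encoding="unicode"))
```

The author types only "die Be_haus_ung" and "la dimora"; every id and cross-reference is derived, which is where most of the error-prone manual work would otherwise lie.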

63 ELDIT and WordManager
WordManager; WM transducers; WordManager in ELDIT

64 WordManager (1992): a system for reusable morphological dictionaries
Information about a word: inflection (declension and conjugation); word formation (derivation and composition); orthography (old and new spelling for German)
Languages: German, Italian, English

65 WM Transducers (2000)
Lemmatizer: Häusern → haus (Cat N)
Inflection analyzer: Häusern → haus (Cat N)(Gender N)(Num PL)(Case Dat)
Inflection generator: Haus → haus (Cat N)(Gender N)(Num SG)(Case Nom), haus (Cat N)(Gender N)(Num SG)(Case Gen), häuser (Cat N)(Gender N)(Num PL)(Case Nom), häusern (Cat N)(Gender N)(Num PL)(Case Dat)
Word formation analyzer: kennenlernen → kennen (Cat V)(Aux haben) lernen (Cat V)(Aux haben)
Word formation generator: bosco → abbracciabosco (Cat N)(Gen M), boscaglia (Cat N)(Gen F), boscaiolo (Cat N)(Gen M)
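The lemmatizer, analyzer and generator trio can be illustrated with a full-form lookup table. This is not WordManager's API or its transducer technique, only a minimal sketch of the input/output behaviour using the feature notation from the slide. The table covers one noun by hand (hypothetical data) and lists the genitive singular as "hauses" (des Hauses).

```python
from typing import Dict, List, Tuple

Analysis = Tuple[str, str]  # (lemma, features)

# Hand-made full-form table for one noun (hypothetical data, not WordManager output).
FULL_FORMS: Dict[str, List[Analysis]] = {
    "haus": [("haus", "(Cat N)(Gender N)(Num SG)(Case Nom)"),
             ("haus", "(Cat N)(Gender N)(Num SG)(Case Dat)"),
             ("haus", "(Cat N)(Gender N)(Num SG)(Case Acc)")],
    "hauses": [("haus", "(Cat N)(Gender N)(Num SG)(Case Gen)")],
    "häuser": [("haus", "(Cat N)(Gender N)(Num PL)(Case Nom)"),
               ("haus", "(Cat N)(Gender N)(Num PL)(Case Gen)"),
               ("haus", "(Cat N)(Gender N)(Num PL)(Case Acc)")],
    "häusern": [("haus", "(Cat N)(Gender N)(Num PL)(Case Dat)")],
}


def analyze(form: str) -> List[Analysis]:
    """Inflection analyzer: all readings of a word form."""
    return FULL_FORMS.get(form.lower(), [])


def lemmatize(form: str) -> List[str]:
    """Lemmatizer: the distinct lemmas among the readings."""
    return sorted({lemma for lemma, _ in analyze(form)})


def generate(lemma: str) -> List[Tuple[str, str]]:
    """Inflection generator: every (form, features) pair belonging to a lemma."""
    return [(form, features)
            for form, readings in FULL_FORMS.items()
            for lem, features in readings if lem == lemma]


print(lemmatize("Häusern"))  # ['haus']
print(analyze("Häusern"))    # [('haus', '(Cat N)(Gender N)(Num PL)(Case Dat)')]
```

A transducer-based system computes the same mappings from rules rather than storing every form, which is what makes it reusable across a whole lexicon.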

66 WM in ELDIT: Search (Lemmatizer)

67 Links and Additional Examples (Lemmatizer)

68 Exercises (Analyzer)

69 Conjugation tables (Generator)

70 ELDIT and TreeTagger
ELDIT text corpus: development; tagging; manual corrections

71 ELDIT Texts

72 Development
MS Word (Goethe-Institut, Milan); HTML; simple XML

73 Tagging
POS tagging (→ TreeTagger); XML with links; iterative correction by frequency of unlinked words
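Iterative correction by frequency of unlinked words can be sketched as follows, assuming TreeTagger's tab-separated output (word form, tag, lemma, with `<unknown>` as the lemma of words it does not know) and a set of lemmas that have dictionary entries. The sample lines and tags are illustrative only. Tokens whose lemma has no entry are counted and returned most frequent first, so manual correction starts where it repairs the most links.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def unlinked_by_frequency(tagger_lines: Iterable[str], entries: Set[str]) -> List[Tuple[str, int]]:
    """Counts word tokens whose lemma has no dictionary entry, most frequent first.

    TreeTagger's '<unknown>' lemma is never an entry, so such tokens are counted too.
    """
    counts: Counter = Counter()
    for line in tagger_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        token, _tag, lemma = fields
        if not token.isalpha():
            continue  # ignore punctuation and numbers
        if lemma not in entries:
            counts[token] += 1
    return counts.most_common()


TAGGER_OUTPUT = [
    "Le\tDET:def\til", "case\tNOM\tcasa", "di\tPRE\tdi", "Bolzano\tNPR\t<unknown>",
    "sono\tVER:pres\tsonare", "belle\tADJ\tbello", ".\tSENT\t.",
    "Le\tDET:def\til", "case\tNOM\tcasa", "sono\tVER:pres\tsonare", "grandi\tADJ\tgrande",
    ".\tSENT\t.",
]
ENTRIES = {"il", "casa", "di", "bello", "grande", "essere"}

print(unlinked_by_frequency(TAGGER_OUTPUT, ENTRIES))  # [('sono', 2), ('Bolzano', 1)]
```

In this toy sample the most frequent unlinked form is "sono", whose wrong lemma "sonare" is one of the systematic errors listed on the next slide.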

74 Corrections
Old German spelling rules (valid until 1998).
The Italian verb form "sono" (I am / they are) was always tagged with "sonare" (= suonare, to play or to sound) instead of "essere" (to be).
The verb form "sia" (he/she may be) was always recognized as a conjunction and tagged with "sia" (as well as) instead of "essere" (to be).
Many conjugated forms of "avere" were tagged with "riavere" (to get something back) instead of "avere" (to have).
Many conjugated forms of "andare" were tagged with "riandare" (to go back) instead of "andare" (to go).
Abbreviated forms of Italian words (such as "bel", "vuol", "pur", "fin") were tagged as nouns and with the original form as lemma.
Some Italian words that exist both as nouns and as past participles (such as "successo": the success / it happened) were tagged with the wrong word class.
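Systematic errors like these lend themselves to a rule-based post-correction pass over the tagger output. A minimal sketch covering the "sono", "sia", "riavere" and "riandare" cases; the rule table and the "ri-" heuristic are assumptions, and the tag "VER" is a placeholder for whatever verb tag the tagset actually uses.

```python
from typing import Dict, Tuple

Token = Tuple[str, str, str]  # (word form, POS tag, lemma)

# (form, wrong lemma) -> corrected lemma, following the error list above.
LEMMA_FIXES: Dict[Tuple[str, str], str] = {
    ("sono", "sonare"): "essere",
    ("sia", "sia"): "essere",
}


def correct(token: Token) -> Token:
    """Applies the correction rules to one tagged token."""
    form, tag, lemma = token
    key = (form.lower(), lemma)
    if key in LEMMA_FIXES:
        lemma = LEMMA_FIXES[key]
        if form.lower() == "sia":
            # Blanket rule: genuine "sia ... sia" conjunctions would need context.
            tag = "VER"  # placeholder verb tag
    elif lemma in ("riavere", "riandare") and not form.lower().startswith("ri"):
        # Forms of riavere/riandare start with "ri-"; ho, vado, ... belong to avere/andare.
        lemma = lemma[2:]
    return form, tag, lemma


print(correct(("sono", "VER:pres", "sonare")))  # ('sono', 'VER:pres', 'essere')
print(correct(("ho", "VER:pres", "riavere")))   # ('ho', 'VER:pres', 'avere')
```

Rules keyed on the pair (form, wrong lemma) only fire on the observed error, so correctly tagged tokens pass through unchanged.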

75 Literature
http://www.eurac.edu/about/collaborators/JKnapp/index.htm
→ Publications (including some linguistic ones)
→ PhD theses (Andrea Abel – Uni Innsbruck; Judith Knapp – Uni Hannover)

76 Conclusion
syntagmatic, paradigmatic, pragmatic, polysemy, homography, homonymy, holonymy, hyponymy, hyperonymy, semiotic, ludic, …
file server, web server, data model, HTTP request, client, protocol, port, …
+∞ ∫√∂u∆v - ∞
goal-based scenarios, blended learning …
TEI, CES, NLP, lemmatizing, POS tagging …

