From Lexical Semantics to Knowledge Systems: How to infer cognitive systems from linguistic data Chu-Ren Huang Academia Sinica


1 From Lexical Semantics to Knowledge Systems: How to infer cognitive systems from linguistic data Chu-Ren Huang Academia Sinica http://cwn.ling.sinica.edu.tw/huang/huang.htm

2 2007.03.09 ISLCC Chu-Ren Huang Outline: A generative lexicalist approach to grammar; From distributional data to the basic contrasts in a semantic field (or conceptual motivation for corpus distribution); Lexical distribution as cognitive model; Radical as ontology; Language as a knowledge system

3 Introduction: A generative lexicalist approach to grammar. Back to Aristotle (through Pustejovsky). How do we know and what do we know: through what we experience. Qualia Structure: what we experience. Formal; Constitutive; Agentive; Telic

4 Linguistics: What do we know about language. Qualia Structure of Theory of Language. Formal: from Sign to Structure, Structuralism. Constitutive: from IA to IP, rule and transformation based theories. Agentive: UG approaches. Telic: Function and Use based Theories. We need a linguistic theory that accounts for the complete knowledge structure, not just its individual aspects.

5 Towards Language as Knowledge System. Atoms of knowledge: lexicalized concepts. 'Frames' of knowledge: lexical semantic relations. Instantiation of knowledge: corpus. → lexicon-driven, corpus-based → to infer knowledge structure underlying linguistic structure

6 Three Studies. The semantic field of emotion (elaborated from Chang et al. 2000); Lexicalized Model of Cognition (Huang and Hong 2005); Conventionalized Ontology in Writing (Chou and Huang 2005)

7 Semantic Field of Verbs of Emotion. Issues: Methodological. Interpretation of Distributional Data; Measuring and Interpreting lexical choices. Issues: Linguistic. Archetype Via Contrast; Why Change-of-State: Saliency and relevance to human cognition

8 Distributional Contrast of Verbs of Emotion: 高興 gao1xing4 (Type A) vs. 快樂 kuai4le4 (Type B)
Category: intrans. vs. trans. state verb
Function: more predicative vs. more nominalized
Collocation: CAUSE complement vs. no CAUSE
Collocation: perfect aspect vs. no -le
Collocation (modified nouns): eventive vs. no selection
Interpretation (imperative): command vs. wish

9 A Natural Dichotomy of Verbs of Emotion (corpus frequency in parentheses)
Happiness. Type A: gao1xing4 高興 (669), kai1xin1 開心 (152), tong4kuai4 痛快 (40). Type B: kuai4le4 快樂 (942), yu2kuai4 愉快 (271), xi3yue4 喜悅 (156), huan1le4 歡樂 (141), huan1xi3 歡喜 (107), kuai4huo2 快活 (48).
Depression. Type A: nan2guo4 難過 (232), tong4xin1 痛心 (48). Type B: tong4ku3 痛苦 (443), chen2zhong4 沈重 (83), ju3sang4 沮喪 (62).

10 A Natural Dichotomy of Verbs of Emotion (continued; corpus frequency in parentheses)
Sadness. Type A: shang1xin1 傷心 (134). Type B: bei1shang1 悲傷 (52).
Regret. Type A: hou4hui3 後悔 (102). Type B: yi2han4 遺憾 (198).
Anger. Type A: sheng1qi4 生氣 (307). Type B: fen4nu4 憤怒 (112), qi4fen4 氣憤 (49).
Fear. Type A: hai4pa4 害怕 (261). Type B: kong3ju4 恐懼 (149), wei4ju4 畏懼 (40).
Worry. Type A: dan1xin1 擔心 (609), dan1you1 擔憂 (64), you1xin1 憂心 (46). Type B: fan2nao3 煩惱 (199), ku3nao3 苦惱 (45).

11 Some Observations. Each of the seven kinds of emotion verbs shows the same dichotomy: change-of-state vs. homogeneous state. Each side of the dichotomy is dominated by a single verb in terms of frequency and prototypicality of meaning.

12 Semantic Field and Contrast Set (paraphrase of Grandy 1992). A semantic field consists of a unique covering term and a number of contrast sets. The unique covering term may or may not occur in a contrast set. All other members of the semantic field must be determined by entering into a contrast set relation with a known member of the semantic field.

13 Observation: Chinese Defines a Property by Contrast
qing1zhong4: light + heavy = weight
da4xiao3: big + small = size
gao1ai3: tall + short = height
shi4fei1 / dui4cuo4: right + wrong = affair
xiong1di4: elder + younger = brothers
zang1pi3: praise + attack = criticize
hu1xi1: exhale + inhale = breathe

14 Our Proposal. T is either a single term or a privileged contrast set, called a contrast pair. When T is a contrast pair, the semantic field can be defined by the shared semantic properties of the pair. The fundamental contrast relation defining a contrast pair may be shared by a superset of semantic fields.

15 Our Proposal (continued). T must enter contrast set relations with other members of the semantic field, although the contrast relation may be weakened to a marked/unmarked contrast. The set of fundamental contrast relations is shared by all semantic fields. [cf. Semantic relations]

16 Patterns of Distribution as Representational Clues. Numbers Don't Lie. The pattern itself is proof that generalizations based on a single lexical item are replicable. The uniformity and universality of the pattern across a broad but contiguous semantic field strongly favor a conceptual motivation.

17 Functional Distribution of Type A Verbs of Emotion
Type A       Pred.     Nom.     N.M.
gao1xing4    85.05%    0.30%    1.35%
nan2guo4     86.64%    2.16%    2.59%
shang1xin1   76.12%    2.99%    11.19%
hou4hui3     94.12%    0.00%    2.94%
sheng1qi4    87.82%    0.00%    4.06%
hai4pa4      93.10%    3.07%    2.68%
dan1xin1     96.72%    1.97%    1.31%
Average      88.51%    1.50%    3.73%

18 Functional Distribution of Type B Verbs of Emotion
Type B       Pred.     Nom.     N.M.
kuai4le4     37.79%    26.43%   24.84%
tong4ku3     25.73%    45.60%   20.54%
bei1shang1   40.38%    28.85%   19.23%
yi2han4      34.85%    33.84%   3.54%
fen4nu4      28.57%    37.50%   17.86%
kong3ju4     23.49%    68.46%   7.38%
fan2nao3     24.12%    69.85%   6.03%
Average      30.70%    44.36%   14.21%
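The deverbal-use percentages listed on the later "Deverbal Use Frequency of Type B Verbs" slide appear to be the sum of the Nom. and N.M. columns above. The following is a minimal sketch under that assumption; the names `TYPE_B` and `deverbal_share` are illustrative and do not come from the talk.

```python
# Type B functional distribution copied from the slide, in percent:
# verb -> (Pred., Nom., N.M.)
TYPE_B = {
    "kuai4le4": (37.79, 26.43, 24.84),
    "tong4ku3": (25.73, 45.60, 20.54),
    "bei1shang1": (40.38, 28.85, 19.23),
    "yi2han4": (34.85, 33.84, 3.54),
    "fen4nu4": (28.57, 37.50, 17.86),
    "kong3ju4": (23.49, 68.46, 7.38),
    "fan2nao3": (24.12, 69.85, 6.03),
}


def deverbal_share(nom: float, nm: float) -> float:
    """Assumption: deverbal use = Nom. use + N.M. use (both in percent)."""
    return round(nom + nm, 2)


# Deverbal share per verb, derived only from the two non-predicative columns.
deverbal = {
    verb: deverbal_share(nom, nm) for verb, (_pred, nom, nm) in TYPE_B.items()
}

# Print from the least to the most deverbal Type B verb.
for verb, share in sorted(deverbal.items(), key=lambda item: item[1]):
    print(f"{verb:12s} {share:6.2f}%")
```

Under this reading, every Type B verb in the table spends at least a third of its uses outside the predicate position.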

19 Preference of A verbs over B verbs in Predicative Uses
Verbs (A / B)          Pred. Freq. (A/B)    A/B Ratio
gaoxing / kuaile       569/356              1.59
nanguo / tongku        201/114              1.76
shangxin / beishang    102/21               4.86
houhui / yihan         96/69                1.39
shengqi / fennu        238/32               7.44
haipa / kongju         243/35               6.94
danxin / fannao        589/48               12.27
Average ratio                               5.62
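The A/B ratio in this table is the Type A predicative count divided by the Type B predicative count for each near-synonym pair. A small sketch of that arithmetic follows; the names `PRED_COUNTS` and `preference_ratio` are hypothetical helpers, not part of the talk.

```python
# Predicative-use counts copied from the slide:
# (Type A verb, Type B verb) -> (A count, B count)
PRED_COUNTS = {
    ("gaoxing", "kuaile"): (569, 356),
    ("nanguo", "tongku"): (201, 114),
    ("shangxin", "beishang"): (102, 21),
    ("houhui", "yihan"): (96, 69),
    ("shengqi", "fennu"): (238, 32),
    ("haipa", "kongju"): (243, 35),
    ("danxin", "fannao"): (589, 48),
}


def preference_ratio(preferred: int, other: int) -> float:
    """How many times more often the preferred verb occurs in this function."""
    if other == 0:
        raise ValueError("ratio is undefined when the other verb never occurs")
    return preferred / other


# One A/B ratio per near-synonym pair.
pred_ratios = {
    pair: preference_ratio(a_count, b_count)
    for pair, (a_count, b_count) in PRED_COUNTS.items()
}

for (verb_a, verb_b), ratio in pred_ratios.items():
    print(f"{verb_a} / {verb_b}: {ratio:.2f}")
```

A ratio above 1 means the Type A verb is the preferred choice in predicative position for that pair, which holds for all seven pairs.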

20 Preference of B verbs over A verbs in Nominal Uses
Verbs (A / B)          Nom. Freq. (A/B)     B/A Ratio
gaoxing / kuaile       11/483               43.91
nanguo / tongku        11/293               26.64
shangxin / beishang    19/25                1.32
houhui / yihan         3/74                 24.67
shengqi / fennu        11/62                5.64
haipa / kongju         15/113               7.53
danxin / fannao        20/151               7.55
Average ratio                               16.75
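The average of 16.75 in this table is consistent with an unweighted mean of the seven per-pair B/A ratios; pooling the counts before dividing would give a different figure. The sketch below contrasts the two, assuming the counts are as read from the slide; the variable names are illustrative only.

```python
from statistics import mean

# Nominal-use counts copied from the slide:
# (Type A verb, Type B verb) -> (A count, B count)
NOM_COUNTS = {
    ("gaoxing", "kuaile"): (11, 483),
    ("nanguo", "tongku"): (11, 293),
    ("shangxin", "beishang"): (19, 25),
    ("houhui", "yihan"): (3, 74),
    ("shengqi", "fennu"): (11, 62),
    ("haipa", "kongju"): (15, 113),
    ("danxin", "fannao"): (20, 151),
}

# Per-pair B/A ratios: how strongly the Type B verb is preferred in nominal use.
nom_ratios = {pair: b / a for pair, (a, b) in NOM_COUNTS.items()}

# Unweighted mean of the per-pair ratios: each pair counts equally.
mean_of_ratios = mean(nom_ratios.values())

# Pooled ratio: total B count over total A count, so frequent pairs dominate.
pooled_ratio = sum(b for _a, b in NOM_COUNTS.values()) / sum(
    a for a, _b in NOM_COUNTS.values()
)

print(f"mean of per-pair ratios: {mean_of_ratios:.2f}")
print(f"pooled ratio:            {pooled_ratio:.2f}")
```

The unweighted mean treats each near-synonym pair as one observation, which fits the slide's framing of the pairs as parallel instances of the same contrast.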

21 Summary of the Likelihood Ratio Data. A clear lexical preference between near-synonyms is established. Predicative preference and deverbal preference tend to compensate for each other to establish contrast. Overall, the deverbal preference seems to be the defining feature of the dichotomy. [Note that these are all verbs.]

22 Deverbal Use Frequency of Type A Verbs: tong4kuai4 痛快 0.00%, gao1xing4 高興 1.65%, hou4hui3 後悔 2.94%, dan1xin1 擔心 3.28%, sheng1qi4 生氣 3.58%, tong4xin1 痛心 4.17%, nan2guo4 難過 4.75%, hai4pa4 害怕 5.75%, you1xin1 憂心 6.52%, kai1xin1 開心 7.89%, dan1you1 擔憂 9.38%, shang1xin1 傷心 14.18%

23 Deverbal Use Frequency of Type B Verbs: qi4fen4 氣憤 24.49%, wei4ju4 畏懼 25.00%, yu2kuai4 愉快 29.89%, huan1xi3 歡喜 30.84%, kuai4huo2 快活 33.33%, ju3sang4 沮喪 33.87%, yi2han4 遺憾 37.38%, ku3nao3 苦惱 46.67%, bei1shang1 悲傷 48.08%, chen2zhong4 沈重 48.19%, kuai4le4 快樂 51.27%, fen4nu4 憤怒 55.36%, tong4ku3 痛苦 66.14%, kong3ju4 恐懼 75.84%, fan2nao3 煩惱 75.88%, xi3yue4 喜悅 92.20%, huan1le4 歡樂 92.91%

24 Deverbal Use Frequency as a Benchmark for Type A/B Verbs. More than 10 percentage points separate the lowest Type B verb (qi4fen4 氣憤 24.49%) from the highest Type A verb (shang1xin1 傷心 14.18%). The smallest gap within a competing pair is almost 34 percentage points (shang1xin1 傷心 14.18% vs. bei1shang1 悲傷 48.08%).
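The two figures on this slide can be recomputed from the deverbal-use percentages on the two preceding slides. The sketch below does so and then, purely as an illustration, places a cut-off midway between the two groups; the midpoint threshold and the `classify` helper are assumptions of this example, not something the talk proposes.

```python
# Deverbal-use percentages copied from the two preceding slides.
TYPE_A = {
    "tong4kuai4": 0.00, "gao1xing4": 1.65, "hou4hui3": 2.94, "dan1xin1": 3.28,
    "sheng1qi4": 3.58, "tong4xin1": 4.17, "nan2guo4": 4.75, "hai4pa4": 5.75,
    "you1xin1": 6.52, "kai1xin1": 7.89, "dan1you1": 9.38, "shang1xin1": 14.18,
}
TYPE_B = {
    "qi4fen4": 24.49, "wei4ju4": 25.00, "yu2kuai4": 29.89, "huan1xi3": 30.84,
    "kuai4huo2": 33.33, "ju3sang4": 33.87, "yi2han4": 37.38, "ku3nao3": 46.67,
    "bei1shang1": 48.08, "chen2zhong4": 48.19, "kuai4le4": 51.27,
    "fen4nu4": 55.36, "tong4ku3": 66.14, "kong3ju4": 75.84, "fan2nao3": 75.88,
    "xi3yue4": 92.20, "huan1le4": 92.91,
}
# The seven dominant near-synonym pairs (Type A verb, Type B verb).
PAIRS = [
    ("gao1xing4", "kuai4le4"), ("nan2guo4", "tong4ku3"),
    ("shang1xin1", "bei1shang1"), ("hou4hui3", "yi2han4"),
    ("sheng1qi4", "fen4nu4"), ("hai4pa4", "kong3ju4"),
    ("dan1xin1", "fan2nao3"),
]

highest_a = max(TYPE_A.values())
lowest_b = min(TYPE_B.values())
margin = lowest_b - highest_a  # separation between the two groups

# Gap in percentage points within each competing pair.
pair_gaps = {(a, b): TYPE_B[b] - TYPE_A[a] for a, b in PAIRS}
closest_pair = min(pair_gaps, key=pair_gaps.get)

# Hypothetical cut-off: the midpoint between the two groups.
threshold = (highest_a + lowest_b) / 2


def classify(deverbal_pct: float) -> str:
    """Label a verb Type A or Type B from its deverbal-use percentage."""
    return "B" if deverbal_pct > threshold else "A"


print(f"margin between groups: {margin:.2f} points")
print(f"closest pair: {closest_pair}, gap {pair_gaps[closest_pair]:.2f} points")
```

Any cut-off between the highest Type A value and the lowest Type B value would separate the 29 verbs equally well; the midpoint is only one convenient choice.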

25 The Noisy-Channel Model of Theory of Communication. Our Proposal: Language is an information-based communication system. An optimized communication system is one in which all redundant signs (for one piece of information) also minimally differentiate another piece of information.

26 Re-Interpretation of the Data. Members of the same semantic field in general, and a near-synonym pair in particular, are competing signs to express information pertaining to the field. A sign is chosen to represent a piece of information because it expresses that piece of information most effectively.

27 Re-Interpretation of the Data (continued). This preference for expressing certain information can be lexicalized to establish logical implicature. Once that lexical preference is established, linguists could use the preferential ratio to infer the lexical information being carried.

28 Lexical distribution as cognitive model: Senses. A further step based on property defined by contrast, with focus on how senses are represented. We study the sense of hearing and the basic property term sheng-yin 'sound/voice'. We (Huang and Hong 2005) look at the distribution of these two lexical elements in all derived words.

29 聲 Sheng vs. 音 Yin
聲樂 vs. 音樂: vocal music vs. music
發聲 vs. 發音: make a sound vs. articulate
高聲 vs. 高音: loudly vs. high pitch
*噪聲 vs. 噪音: noise
大聲 vs. *大音: loudly

30 NN Compound (N + *). 聲 sheng + source: 歌 掌 人 腳步 風 鐘 水 … 音 yin + quality: 嗓 鄉 喉 裝飾 尾 哨 …

31 The Semantic Contrast. 聲: production of sounds; often refers to the manner or source of how a sound was made. 音: perception of a sound; often refers to the sound quality or how a sound is perceived by an intelligent agent.

32 A Lexicalized Schema for Hearing in Chinese (from Huang and Hong 2005). Process of Hearing:
聲 sheng: starting point, source; active completion (production); instigator
音 yin: end point, result (goal); passive reception (reception); experiencer

33 A Lexicalized Schema for Sense in Chinese. [Diagram: Process of Sensation, with two lexical slots (word1, word0); labels: experiencer; goal/perception: experience of sense; perceptual reception (sensation)]

34 Lexical Semantic Analysis (7): diagram of the relations among vision, touch, and hearing. Contrast of cognitive features: instigator of action (marked) vs. experiencer of sensation (shared and unmarked).
Hearing: 聲 (production) vs. 音 (perception)
Vision: 看 (inchoative) vs. 見 (bounded result)
Touch: 觸 (activity) vs. 摸 (incremental theme)

35 Radical as ontology. The Chinese writing system has been conventionalized and shared for over three thousand years, and has been adopted by typologically very different languages. If the radical system is a system of conceptualization, then it is the most robust and most widely used ontology.

36 Example: the horse radical (from Chou 2005). 馬 is a semantic symbol of horse. Examples:
驩: 馬名 a kind of horse
驫: 眾馬 horses
騎: 騎馬 riding a horse
驍: 良馬 a good horse
驚: 馬驚 a scared horse

37 Research Tool and Issue. Formal Description: IEEE SUMO (Suggested Upper Merged Ontology) http://www.ontologyportal.org http://BOW.sinica.edu.tw Issue: Why are Chinese radicals usually considered an imperfect and misleading taxonomy?

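38 Knowledge System of the Radical 艸 / 艹 (Grass, for Plants). [Diagram: characters bearing the radical, grouped by their relation to the radical's meaning. Relations: Plants (IS-A), Parts (constitutive), Description (descriptive/formal), Usage (telic). Character groups as shown: 蕃藥蔬菜薪 苑藩藉茭; 萌莖芽茄 苗蓮葉; 蕉蘭芒蒙菌蔓 苦菊茱范荷茅 蕈蔚菲草; 茲蒼芳落 茸茂荒薄 芬蒸莊]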

39 Conclusion I: Corpus as Evidence. Core issue of a scientific explanation of language and cognition. Language as a living organism allows variations and adaptations (the evolutionary view). The coherence of language is the shared tendency of all users. Distributional data in corpora lead to the discovery of these shared tendencies. This should be more valuable than incidental examples.

40 Conclusion II: Language as a Knowledge System. The generative lexicalist approach to grammar: language as a knowledge system. All aspects of language are projected from a unified knowledge system. Lexical semantics based on distributional data offers the best window to the underlying knowledge system of language.

