
1 CSS 590 C: Introduction to NLP
Yuval Marton, Spring 2017
Much of the material was borrowed from and/or inspired by course slides by Richard Socher and Chris Manning (Stanford 2016/2017), Kevin Gimpel (TTIC 2016), Chris Callison-Burch (UPenn 2014), and Nizar Habash (Columbia 2013).

2 CSS 590 C: Introduction to NLP
Yuval Marton, Spring 2017
March 27: Introduction to NLP/NLU via MT

3 Why (Machine) Translation?
Languages in the world
- 6,000+ living languages
- 600 with a written tradition
- 100 languages are spoken by 95% of the world population
Translation market
- $26 billion global market (2010)
- Doubling every five years (Donald Barabé, invited talk, MT Summit 2003)
March 27, Yuval Marton CSS class 1: Intro to NLP/NLU

4 Where? Multilingualism, Language Families

5 Language Families
Wikipedia / "Primary Human Language Families Map" by PiMaster3 - Own work.

6 Translation Challenges – Multilingual Divergence

7 Shatt Al-Arab Fresh Fish
Not understanding language can lead to funny results…

8 Multilingual Challenges
- nai you duo shi means buttered toast
- naiyou means butter
- duoshi means toast
- duo means many
- shi can mean private (as in the army rank)
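The ambiguity on this slide can be sketched as a word-segmentation problem: the same unspaced string splits into dictionary words in more than one way. The snippet below is a minimal illustration, not a real segmenter. The lexicon contains only the four words glossed on the slide, and the function name `segmentations` is a hypothetical helper introduced here.

```python
from typing import Dict, List

# Toy lexicon: only the four words glossed on the slide (an assumption
# for illustration; a real lexicon would be far larger).
LEXICON: Dict[str, str] = {
    "naiyou": "butter",
    "duoshi": "toast",
    "duo": "many",
    "shi": "private (army rank)",
}


def segmentations(text: str, lexicon: Dict[str, str] = LEXICON) -> List[List[str]]:
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


if __name__ == "__main__":
    for seg in segmentations("naiyouduoshi"):
        print(" + ".join(f"{w} ({LEXICON[w]})" for w in seg))
```

With this toy lexicon the string has two readings, "butter + toast" and "butter + many + private", which is the kind of choice a translation system has to make.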

9 Lost in translation
What is this sign about? (Answer only if you don’t understand the top two languages.)
Machine translation
- Great when it works
- Can fail miserably
What’s wrong here?
- Lexical semantics (word-level meaning)
- Semantics (meaning of topic)
- Syntax
[Answer: swimming pool regulations]

10 Affluenza
How would you translate ‘affluenza’?
Affluenza, a portmanteau of affluence and influenza, is a term used by critics of consumerism. It is thought to have been first used in 1954 but it gained legs as a concept with a 1997 PBS documentary of the same name and the subsequent book, Affluenza: The All-Consuming Epidemic (2001). These works define affluenza as "a painful, contagious, socially transmitted condition of overload, debt, anxiety, and waste resulting from the dogged pursuit of more." The term "affluenza" has also been used to refer to an inability to understand the consequences of one's actions because of financial privilege… [Wikipedia]

11 Multilingual Challenges
Orthographic variations
- Ambiguous spelling: كتب الاولاد اشعارا (without diacritics) vs. كَتَبَ الأوْلادُ اشعَاراً (with diacritics)
- Ambiguous word boundaries
Lexical ambiguity
- Bank → بنك (financial) vs. ضفة (river)
- Eat → essen (human) vs. fressen (animal)
Notes: My work on Palestinian, Arabic MT and Arabic-Hebrew MT. Highlight similarities and differences. A lot of similarities/differences are not included.

12 Multilingual Challenges: Morphological Variations
Affixational (prefix/suffix) vs. templatic (root + pattern)
- write → written; كتب → مكتوب
- kill → killed; قتل → مقتول
- do → done; فعل → مفعول
Tokenization (aka segmentation + normalization); the pieces are conjunction, article, noun, and plural marker
- English: And the cars → and the cars
- Arabic: والسيارات → w Al SyArAt
- French: Et les voitures → et le voitures
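Both ideas on this slide can be sketched in a few lines. The first function interleaves a three-consonant root with the pattern seen in the slide's examples (م + C1 + C2 + و + C3); the second strips the conjunction و and the article ال from the front of a word, mirroring the split "w Al SyArAt". This is a toy sketch under strong assumptions: the function names are hypothetical, only this one pattern and these two prefixes are handled, and plain prefix stripping will wrongly split words whose stem happens to begin with those letters, which is why real tokenizers rely on morphological analysis.

```python
from typing import List


def passive_participle(root: str) -> str:
    """Apply the templatic pattern m-C1-C2-w-C3 to a triliteral root."""
    if len(root) != 3:
        raise ValueError("expected a root of exactly three letters")
    c1, c2, c3 = root
    return "م" + c1 + c2 + "و" + c3


def segment_clitics(word: str) -> List[str]:
    """Naively split off the conjunction and the definite article."""
    tokens: List[str] = []
    # Conjunction "wa" (and): a single-letter prefix.
    if word.startswith("و") and len(word) > 3:
        tokens.append("و")
        word = word[1:]
    # Definite article "Al" (the): a two-letter prefix.
    if word.startswith("ال") and len(word) > 3:
        tokens.append("ال")
        word = word[2:]
    tokens.append(word)  # whatever remains is treated as the stem
    return tokens


if __name__ == "__main__":
    for root in ["كتب", "قتل", "فعل"]:
        print(root, "->", passive_participle(root))
    print(segment_clitics("والسيارات"))
```

The length checks are only a crude guard against stripping letters from very short words; they are a design shortcut for this sketch, not a linguistic rule.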

13 Morphology
Arabic: يقرأ الطالب المجتهد كتابا عن الصين في الصف
Gloss: read the-student the-diligent a-book about china in the-classroom
English: the diligent student is reading a book about china in the classroom
Chinese: 这位勤奋的学生在教室读一本关于中国的书
Gloss: this quant diligent de student in classroom read one quant about china de book
- Arabic: very rich morphology (number, gender, case, person, aspect, voice, several clitics, etc.); Arabic tokenization
- English: simple morphology
- Chinese: no morphology; quantifiers & verbal aspects

15 Syntax
Arabic: يقرأ الطالب المجتهد كتابا عن الصين في الصف
Gloss: read the-student the-diligent a-book about china in the-classroom (V S O PP-mod-V)
English: the diligent student is reading a book about china in the classroom (S V O PP-mod-V)
Chinese: 这位勤奋的学生在教室读一本关于中国的书
Gloss: this quant diligent de student in classroom read one quant about china de book (S PP-mod-V V O)
Word order by construction (Arabic / English / Chinese):
- Subj-Verb: V Subj / Subj V / Subj … V
- Verb-PP: V … PP / V PP / PP V
- Adjectives: N Adj / Adj N / Adj de N
- Possessives: N Poss / N of Poss, Poss ’s N / Poss de N
- Relatives: N Rel / N Rel / Rel de N
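The constituent orders annotated on this slide (V S O PP for Arabic, S V O PP for English, S PP V O for Chinese) can be read as per-language templates. The sketch below linearizes the same four constituents under each template, using the slide's own glosses as the constituent strings. The names `WORD_ORDER`, `GLOSSES`, and `linearize` are hypothetical, and a flat four-slot template is a deliberate simplification of real syntax.

```python
from typing import Dict, List

# Constituent order per language, as annotated on the slide.
WORD_ORDER: Dict[str, List[str]] = {
    "Arabic": ["V", "S", "O", "PP"],
    "English": ["S", "V", "O", "PP"],
    "Chinese": ["S", "PP", "V", "O"],
}

# The slide's glosses, split by hand into constituents (an assumption
# about where the constituent boundaries fall).
GLOSSES: Dict[str, Dict[str, str]] = {
    "Arabic": {
        "S": "the-student the-diligent",
        "V": "read",
        "O": "a-book about china",
        "PP": "in the-classroom",
    },
    "English": {
        "S": "the diligent student",
        "V": "is reading",
        "O": "a book about china",
        "PP": "in the classroom",
    },
    "Chinese": {
        "S": "this quant diligent de student",
        "V": "read",
        "O": "one quant about china de book",
        "PP": "in classroom",
    },
}


def linearize(constituents: Dict[str, str], language: str) -> str:
    """Join the constituents in the order the given language uses."""
    return " ".join(constituents[role] for role in WORD_ORDER[language])


if __name__ == "__main__":
    for language, parts in GLOSSES.items():
        print(f"{language}: {linearize(parts, language)}")
```

Feeding one language's constituents through another language's template, for example `linearize(GLOSSES["English"], "Chinese")`, shows the reordering a translation system has to perform.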

17 Multilingual Divergences
Conflation
- Arabic: لست هنا (gloss: I-am-not here)
- English: I am not here
- French: Je ne suis pas ici (gloss: I not am not here)
[Figure: dependency trees headed by لست, am, and suis; dependents are هنا for Arabic; I, not, here for English; Je, ne, pas, ici for French]

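18 Multilingual Divergences: categorial, thematic and structural
- Arabic: انا بردان (gloss: I cold)
- English: I am cold
- Spanish: tengo frío (gloss: I-have cold)
- Hebrew: קר לי (gloss: cold for-me); thematic
[Figure: dependency trees with heads *, be, tener, *; dependents are انا بردان for Arabic; I, cold for English; Yo, frío for Spanish; קר ל אני for Hebrew]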

19 Multilingual Divergences: head swap and categorial
- English: I swam across the river quickly
- Arabic: اسرعت عبور النهر سباحة (gloss: I-sped crossing the-river swimming)
[Figure: dependency trees over the words swim, I, quickly, across, river for English and اسرع انا سباحة عبور نهر for Arabic]

20 Multilingual Divergences: head swap and categorial
- English: I swam across the river quickly
- Hebrew: חציתי את הנהר בשחיה במהירות (gloss: I-crossed obj river in-swim speedily)
[Figure: dependency trees over the words swim, I, quickly, across, river for English and חצה אני ב את נהר שחיה מהירות for Hebrew]

21 Multilingual Divergences: head swap and categorial
[Figure: the three dependency trees side by side with part-of-speech labels]
- Hebrew: חצה אני ב את נהר שחיה מהירות (labels: verb)
- Arabic: اسرع انا سباحة عبور نهر (labels: verb, noun, noun)
- English: swim I quickly across river (labels: verb, noun, noun, prep, adverb)

22 Multilingual Divergences: Orthography + Morphology + Syntax
- English: mom’s car
- Chinese: 妈妈的车 (mama de che)
- Arabic: سيارة ماما (sayyArat mama)
- French: la voiture de maman
[Figure: dependency tree, car → mom (possessed-by)]

23 L in 10
To get a glimpse of all that linguistic richness, we will dedicate ~10 minutes of each class to learning some facts about a new language.

24 Multilingual Solutions
(As of 2014?) Google offers translations between the following languages (over 3,000 pairs):
Afrikaans, Albanian, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Haitian Creole, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Maltese, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh, Yiddish
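As a quick sanity check on the "over 3,000 pairs" figure, the snippet below counts translation directions among the languages listed on this slide. It assumes a pair means an ordered (source, target) pair, so English to French and French to English count separately; that reading is an assumption, though it is the one that matches the slide's number.

```python
from itertools import permutations

# The languages exactly as listed on the slide.
LANGUAGES = [
    "Afrikaans", "Albanian", "Arabic", "Armenian", "Azerbaijani", "Basque",
    "Belarusian", "Bulgarian", "Catalan", "Chinese", "Croatian", "Czech",
    "Danish", "Dutch", "English", "Estonian", "Filipino", "Finnish",
    "French", "Galician", "Georgian", "German", "Greek", "Haitian Creole",
    "Hebrew", "Hindi", "Hungarian", "Icelandic", "Indonesian", "Irish",
    "Italian", "Japanese", "Korean", "Latvian", "Lithuanian", "Macedonian",
    "Malay", "Maltese", "Norwegian", "Polish", "Portuguese", "Romanian",
    "Russian", "Serbian", "Slovak", "Slovenian", "Spanish", "Swahili",
    "Swedish", "Thai", "Turkish", "Ukrainian", "Urdu", "Vietnamese",
    "Welsh", "Yiddish",
]

# Ordered (source, target) pairs: n * (n - 1).
directed_pairs = list(permutations(LANGUAGES, 2))

if __name__ == "__main__":
    n = len(LANGUAGES)
    print(f"{n} languages, {len(directed_pairs)} translation directions")
```

For n languages the count is n × (n − 1), so each added language adds 2n new directions, which is one reason covering every pair directly becomes costly.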

25 Free Machine Translation !
Can1 Ting1 “dining hall”

26 MT in SciFi
Silicon-based (Star Trek), neural / carbon-based (The Hitchhiker’s Guide to the Galaxy)

27 MT in Present ( )
- Military / Gov (then and now)
- Industry (aid for human translators, finally profitable!)
- Personal (Google Translate, Skype Translator, …)

