
1 CHARM Lecture 1 Outline of the Problem

2 The Maltese Alphabet
A a (a), B b (be), Ċ ċ (ċe), D d (de), E e (e), F f (ef), Ġ ġ (ġe), G g (ge), Għ għ (ajn), H h (akka)
Ħ ħ (ħe), I i (i), Ie ie (ie), J j (je), K k (ke), L l (elle), M m (emme), N n (enne), O o (o), P p (pe)
Q q (qe), R r (erre), S s (esse), T t (te), U u (u), V v (ve), W w (we), X x (exxe), Ż ż (że), Z z (zej)
The Problem 1
We will refer to ordinary characters that could yield Maltese characters as charms.

3 The Problem 2
From KullĦadd:
FIL-KRIZI li ghandna fit-turizmu fil-gzejjer taghna l-aghar li qed jintlaqtu huma l-lukandi tal tliet stilel. L-ahhar studju li sar mid-Deloitte ghall-Assocjazzjoni Maltija tal-Lukandi u Ristoranti jghidilna kif in-nuqqas tal turisti u z-zieda fl-ispejjez ghal dawn il-lukandi fissru li ghamlu telf tal 19.8% fir-rata tal qliegh taghhom u fosthom kien hemm min salva biss anki fl-aqwa tas-sajf permezz tal l-istudenti. L-istess studju juri li 70% tas-sidien tal dawn il-lukandi jibzghu li se jkomplu jbatu min-nuqqas tal turisti u se jkollhom hafna kmamar vojta fix-xhur li gejjin.

4 The Problem 3
Is there some way in which we can recover the special Maltese characters automatically? If so:
1. What is the underlying algorithmic model?
2. What knowledge must the programme bring to bear?
3. What resources are needed to build the knowledge base?

5 Noisy Channel Model for Sentence Translation (Brown et al. 1990)
[Diagram from Jurafsky & Martin, with labels "source sentence" and "target sentence"]

6 Algorithmic Model
The noisy channel model is domain independent. Brown et al. applied it to translation from a source language to a target language. We can apply it at the level of individual words.

7 Noisy Channel at Word Level
source: KullĦadd -> NOISY CHANNEL -> target: KullHadd
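The word-level channel can be illustrated with a small Python sketch. It assumes the noise consists only of replacing ċ, ġ, ħ and ż (and their capitals) with plain c, g, h and z; that mapping is inferred from the KullĦadd example and the sample text, and the names `DEGRADE` and `noisy_channel` are hypothetical.

```python
# Assumed channel noise: each special Maltese character loses its diacritic.
DEGRADE = str.maketrans({
    "ċ": "c", "ġ": "g", "ħ": "h", "ż": "z",
    "Ċ": "C", "Ġ": "G", "Ħ": "H", "Ż": "Z",
})


def noisy_channel(source_word: str) -> str:
    """Return the target word that the assumed channel produces."""
    return source_word.translate(DEGRADE)


print(noisy_channel("KullĦadd"))  # KullHadd
```

Recovering the source word means running this mapping in reverse, which is ambiguous because a plain h in the target may or may not have been ħ in the source.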

8 Main Algorithm: Four Steps
1. See target word t.
2. Generate the set S of all possible source words for that word.
3. Pick the most probable source word s in S.
4. Output s.

9 Step 1: See Target Word
Preprocessing
– noise
– case
– punctuation
– hyphen
Tokenisation
– words
– numbers
– other
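A minimal tokenisation sketch in Python follows. The slide does not say how the three token classes are delimited, so the regular expression, the decision to split words at hyphens, and the names `TOKEN_RE` and `tokenise` are all assumptions made for illustration.

```python
import re

# Assumed token classes: numbers, words (runs of letters), and anything else.
TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:[.,]\d+)*)   # digits with optional decimal part, e.g. 19.8
  | (?P<word>[^\W\d_]+)           # a run of letters, including ċ, ġ, ħ, ż
  | (?P<other>\S)                 # any other single non-space character
""", re.VERBOSE)


def tokenise(text):
    """Return a list of (token_class, token_text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


for token in tokenise("telf tal 19.8% fit-turizmu"):
    print(token)
```

Only tokens of class `word` would be passed on to Step 2; numbers and other tokens can be copied to the output unchanged.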

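10 Step 2: Generate S
If t contains charms, generate
$S = \{\, s \mid \forall i,\ 0 < i \le \mathrm{len}(t):\ s[i] = t[i] \ \lor\ s[i] = m(t[i]) \,\}$
where m(c) denotes the Maltese character that the charm c could yield.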
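The set S can be enumerated directly. The Python sketch below assumes the charms are c, g, h and z (plus capitals), each with exactly one Maltese counterpart; `CHARM_MAP` plays the role of m, and both it and `generate_sources` are hypothetical names.

```python
from itertools import product

# Assumed definition of m: each charm has one Maltese counterpart.
CHARM_MAP = {
    "c": "ċ", "g": "ġ", "h": "ħ", "z": "ż",
    "C": "Ċ", "G": "Ġ", "H": "Ħ", "Z": "Ż",
}


def generate_sources(target: str) -> set:
    """Return every word obtainable by keeping or replacing each charm."""
    # Each position offers one option (non-charm) or two options (charm).
    options = [
        (ch, CHARM_MAP[ch]) if ch in CHARM_MAP else (ch,)
        for ch in target
    ]
    return {"".join(chars) for chars in product(*options)}


print(sorted(generate_sources("KullHadd")))
```

A target word with k charms gives 2^k candidates, so "ghandna" (two charms) yields four candidates, one of which is "għandna".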

11 Step 3: Pick the Most Probable Source Word s in S
return argmax(P(s)) for s in S
This is covered in Lecture 2.
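How P(s) is estimated is left to Lecture 2, so the Python sketch below only illustrates the argmax. It assumes, purely for illustration, that P(s) is a relative frequency taken from a word-count table; the counts shown are invented, and `pick_most_probable` is a hypothetical name.

```python
def pick_most_probable(candidates, counts):
    """Return the candidate with the highest relative frequency."""
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty counts

    def probability(word):
        # Assumed model: unseen words get probability 0.
        return counts.get(word, 0) / total

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(candidates), key=probability)


# Invented counts, for illustration only.
toy_counts = {"ħafna": 5, "hafna": 1}
print(pick_most_probable({"hafna", "ħafna"}, toy_counts))  # ħafna
```

With this toy model, a candidate set in which no word has been seen falls back to the alphabetically first candidate, which is one reason a real system would need a better estimate of P(s).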

