Presentation is loading. Please wait.

Presentation is loading. Please wait.

TAMIL SOFTWARE DEVELOPMENT Dr.S.Suganthi Assistant Professor Department of Computer Science G.Venkataswamy Naidu College, Kovilpatti.

Similar presentations


Presentation on theme: "TAMIL SOFTWARE DEVELOPMENT Dr.S.Suganthi Assistant Professor Department of Computer Science G.Venkataswamy Naidu College, Kovilpatti."— Presentation transcript:

1 TAMIL SOFTWARE DEVELOPMENT Dr.S.Suganthi Assistant Professor Department of Computer Science G.Venkataswamy Naidu College, Kovilpatti

2 Agenda FONT INPUT LAYOUT ENCODING NATURAL LANGAUGE PROCESSING MACHINE TRANSLATION CHARACTER RECOGNITION

3 What is NLP  Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. ML in NLP3

4 Go beyond the keyword matching  Identify the structure and meaning of words, sentences, texts and conversations  Deep understanding of broad language  NLP is all around us ML in NLP4

5 Machine translation Facebook translation, image credit: Meedan.org ML in NLP5

6 Statistical machine translation Image credit: Julia Hockenmaier, Intro to NLP ML in NLP6

7 Dialog Systems ML in NLP7

8 Sentiment/Opinion Analysis ML in NLP8

9 Text Classification  Other applications? www.wired.com ML in NLP9

10 Question answering credit: ifunny.com 'Watson' computer wins at 'Jeopardy' ML in NLP10

11 Question answering  Go beyond search ML in NLP11

12 Natural language instruction https://youtu.be/KkOCeAtKHIc?t=1m28s ML in NLP12

13 Digital personal assistant More on natural language instruction  Semantic parsing – understand tasks  Entity linking – “my wife” = “Kellie” in the phone book credit: techspot.com ML in NLP13

14 Information Extraction  Unstructured text to database entries Yoav Artzi: Natural language processing ML in NLP14

15 Challenges – ambiguity  Word sense ambiguity ML in NLP15

16 Challenges – ambiguity  Word sense / meaning ambiguity Credit: http://stuffsirisaid.comhttp://stuffsirisaid.com ML in NLP16

17 Challenges – ambiguity  PP attachment ambiguity Credit: Mark Liberman, http://languagelog.ldc.upenn.edu/nll/?p=17711http://languagelog.ldc.upenn.edu/nll/?p=17711 ML in NLP17

18 Challenges -- ambiguity  Ambiguous headlines:  Include your children when baking cookies  Local High School Dropouts Cut in Half  Hospitals are Sued by 7 Foot Doctors  Iraqi Head Seeks Arms  Safety Experts Say School Bus Passengers Should Be Belted  Teacher Strikes Idle Kids ML in NLP18

19 Challenges – ambiguity  Pronoun reference ambiguity Credit: http://www.printwand.com/blog/8-catastrophic-examples-of-word-choice-mistakeshttp://www.printwand.com/blog/8-catastrophic-examples-of-word-choice-mistakes ML in NLP19

20 Challenges – language is not static ML in NLP20  Language grows and changes  e.g., cyber lingo LOLLaugh out loud G2GGot to go BFNBye for now B4NBye for now IdkI don’t know FWIWFor what it’s worth LUWAMHLove you with all my heart

21 Challenges--language is compositional Carefully Slide ML in NLP21

22 Challenges--language is compositional 小心 : Carefully Careful Take Care Caution 地滑 : Slide Landslip Wet Floor Smooth ML in NLP22

23 Challenges – scale ML in NLP23  Examples:  Bible (King James version): ~700K  Penn Tree bank ~1M from Wall street journal  Newswire collection: 500M+  Wikipedia: 2.9 billion word (English)  Web: several billions of words

24 This lecture ML in NLP24  Course Overview  What is NLP? Why it is important?  What will you learn from this course?  Course Information  What are the challenges?  Key NLP components  Key ML ideas in NLP

25 Part of speech tagging ML in NLP25

26 Syntactic (Constituency) parsing ML in NLP26

27 Syntactic structure => meaning Image credit: Julia Hockenmaier, Intro to NLP ML in NLP27

28 Dependency Parsing ML in NLP28

29 Semantic analysis  Word sense disambiguation  Semantic role labeling Credit: Ivan Titov ML in NLP29

30 Machine learning CS6501- Advanced Machine Learning 30

31 CS6501- Advanced Machine Learning 31

32 Perceptron, decision tree, support vector machine K-NN, Naïve Bayes, logistic regression…. CS6501- Advanced Machine Learning 32

33 Classification is generally well-understood CS6501- Advanced Machine Learning 33  Theoretically: generalization bound  # examples to train a good model  Algorithmically:  Efficient algorithm for large data set  E.g., take a few second to train a linear SVM on data with millions instances and features  Algorithms for non-linear model  E.g., Kernel methods Is this enough to solve all real-world problems?

34 Reading Comprehension CS6501- Advanced Machine Learning 34

35 Challenges  Modeling challenges  How to model a complex decision?  Representation challenges  How to extract features?  Algorithmic challenges  Large amount of data and complex decision structure 58 Bill Clinton, recently elected as the President of the USA, has been invited by the Russian President], [Vladimir Putin, to visit Russia. President Clinton said that he looks forward to strengthening ties between USA and Russia Algorithm 2 is shown to perform better Berg-Kirkpatrick, ACL 2010. It can also be expected to converge faster -- anyway, the E- step changes the auxiliary function by changing the expected counts, so there's no point in finding a local maximum of the auxiliary function in each iteration a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy. Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book Structured prediction models Deep learning models Inference / learning algorithms

36 Modeling Challenges  How to model a complex decision?  Why this is important? Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book CS6501- Advanced Machine Learning 36

37 Language is structural CS6501- Advanced Machine Learning 37

38 Hand written recognition  What is this letter? CS6501- Advanced Machine Learning 38

39 Hand written recognition  What is this letter? CS6501- Advanced Machine Learning 39

40 Visual recognition CS6501- Advanced Machine Learning 40

41 Human body recognition CS6501- Advanced Machine Learning 41

42 Bridge the gap  Simple classifiers are not designed for handle complex output  Need to make multiple decisions jointly  Example: POS tagging: Example from Vivek Srikumar can you can a can as a canner can can a can CS6501- Advanced Machine Learning 42

43 Make multiple decisions jointly CS6501- Advanced Machine Learning 43  Example: POS tagging:  Each part needs a label  Assign tag (V., N., A., …) to each word in the sentence  The decisions are mutually dependent  Cannot have verb followed by a verb  Results are evaluated jointly can you can a can as a canner can can a can

44 Structured prediction problems CS6501- Advanced Machine Learning 44  Problems that  have multiple interdependent output variables  and the output assignments are evaluated jointly  Need a joint assignment to all the output variables  We called it joint inference, global infernece or simply inference

45  Input:  Truth: ∈ y∗ ∈ () ∈ y∗ ∈ () Ican a ProMdVbDtNn Pro Md Nn Dt Vb Nn Md Vb  Predicted: ℎ() ∈ ()  Loss:, ∗ Goal: make joint prediction to minimize a joint loss based on find ℎ ∈ such that ℎx∈ () minimizing, ~, ℎ samples, ~ Kai-Wei Chang (University of Virginia) 45 A General learning setting

46  Input:  Truth: ∈ y∗ ∈ () ∈ y∗ ∈ () Ican a ProMdVbDtNn Pro Md Nn Dt Vb Nn Md Vb  Predicted: ℎ () ∈ ()  Loss:, ∗ # POS tags: 45 How many possible outputs for sentence with 10 words? 45 <= = 3.4×10 <D Kai-Wei Chang (University of Virginia) 46 Combinatorial output space Observation: Not all sequences are valid, and we don’t need to consider all of them

47 Representation of interdependent output variables  A compact way to represent output combinations  Abstract away unnecessary complexities  We know how to process them  Graph algorithms for linear chain, tree, etc. PronounVerbNounAndNounPronounVerbNounAndNoun Root Theyoperateshipsandbanks. CS6501- Advanced Machine Learning 47

48 Algorithms/models for structured prediction CS6501- Advanced Machine Learning 48  Many learning algorithms can be generalized to the structured case  Perceptron → Structured perceptron  SVM → Structured SVM  Logistic regression → Conditional random field (a.k.a. log-linear models)  Can be solved by a reduction stack  Structured prediction → multi-class → binary

49 Representation Challenges  How to obtain features? 1.Design features based on domain knowledge  E.g., by patterns in parse trees  By nicknames  Need human experts/knowledge When Chris was three years old, his father wrote a poem about him. Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. CS6501- Advanced Machine Learning 49

50 Representation Challenges  How to obtain features? 1.Design features based on domain knowledge 2.Design feature templates and then let machine find the right ones  E.g., use all words, pairs of words, … Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book CS6501- Advanced Machine Learning 50

51 Tamil Language Tools https://www.tamilvu.org/en/content/tamil-computing-tools https://www.tamilvu.org/coresite/html/cwsoftlist.htm https://oss.neechalkaran.com/tamilsoftwares/ https://tamilpesu.us/en/ https://oss.neechalkaran.com/tamilsoftwares/ http://karthikramas.blogdrives.com/archive/21.html https://tamilcharam.com/ https://www.kaggle.com/datasets/praveengovi/tamil-language- corpus-for-nlp?select=Tamil_News_Corpus

52 NLTK NLTK is a suite of open source Python modules, data sets and tutorials supporting research and development in natural language processing Download NLTK from nltk.org

53 Components of NLTK 1.Code: corpus readers, tokenizers, stemmers, taggers, chunkers, parsers, wordnet,... (50k lines of code) 2.Corpora: >30 annotated data sets widely used in natural language processing (>300Mb data) 3.Documentation: a 400-page book, articles, reviews, API documentation

54 1. Code corpus readers tokenizers stemmers taggers parsers wordnet semantic interpretation clusterers evaluation metrics …

55 2. Corpora Brown Corpus Carnegie Mellon Pronouncing Dictionary CoNLL 2000 Chunking Corpus Project Gutenberg Selections NIST 1999 Information Extraction: Entity Recognition Corpus US Presidential Inaugural Address Corpus Indian Language POS-Tagged Corpus Floresta Portuguese Treebank Prepositional Phrase Attachment Corpus SENSEVAL 2 Corpus Sinica Treebank Corpus Sample Universal Declaration of Human Rights Corpus Stopwords Corpus TIMIT Corpus Sample Treebank Corpus Sample …

56 3. Documentation a 400-page book about natural language processing in Python and NLTK teaches Python and NLP provides numerous examples and exercises installation instructions presentation slides for some of the book chapters API Documentation: describes every module, interface, class, and method

57

58

59

60


Download ppt "TAMIL SOFTWARE DEVELOPMENT Dr.S.Suganthi Assistant Professor Department of Computer Science G.Venkataswamy Naidu College, Kovilpatti."

Similar presentations


Ads by Google