Download presentation
Presentation is loading. Please wait.
Published byss ss Modified about 1 year ago
1
TAMIL SOFTWARE DEVELOPMENT Dr.S.Suganthi Assistant Professor Department of Computer Science G.Venkataswamy Naidu College, Kovilpatti
2
Agenda FONT INPUT LAYOUT ENCODING NATURAL LANGAUGE PROCESSING MACHINE TRANSLATION CHARACTER RECOGNITION
3
What is NLP Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. ML in NLP3
4
Go beyond the keyword matching Identify the structure and meaning of words, sentences, texts and conversations Deep understanding of broad language NLP is all around us ML in NLP4
5
Machine translation Facebook translation, image credit: Meedan.org ML in NLP5
6
Statistical machine translation Image credit: Julia Hockenmaier, Intro to NLP ML in NLP6
7
Dialog Systems ML in NLP7
8
Sentiment/Opinion Analysis ML in NLP8
9
Text Classification Other applications? www.wired.com ML in NLP9
10
Question answering credit: ifunny.com 'Watson' computer wins at 'Jeopardy' ML in NLP10
11
Question answering Go beyond search ML in NLP11
12
Natural language instruction https://youtu.be/KkOCeAtKHIc?t=1m28s ML in NLP12
13
Digital personal assistant More on natural language instruction Semantic parsing – understand tasks Entity linking – “my wife” = “Kellie” in the phone book credit: techspot.com ML in NLP13
14
Information Extraction Unstructured text to database entries Yoav Artzi: Natural language processing ML in NLP14
15
Challenges – ambiguity Word sense ambiguity ML in NLP15
16
Challenges – ambiguity Word sense / meaning ambiguity Credit: http://stuffsirisaid.comhttp://stuffsirisaid.com ML in NLP16
17
Challenges – ambiguity PP attachment ambiguity Credit: Mark Liberman, http://languagelog.ldc.upenn.edu/nll/?p=17711http://languagelog.ldc.upenn.edu/nll/?p=17711 ML in NLP17
18
Challenges -- ambiguity Ambiguous headlines: Include your children when baking cookies Local High School Dropouts Cut in Half Hospitals are Sued by 7 Foot Doctors Iraqi Head Seeks Arms Safety Experts Say School Bus Passengers Should Be Belted Teacher Strikes Idle Kids ML in NLP18
19
Challenges – ambiguity Pronoun reference ambiguity Credit: http://www.printwand.com/blog/8-catastrophic-examples-of-word-choice-mistakeshttp://www.printwand.com/blog/8-catastrophic-examples-of-word-choice-mistakes ML in NLP19
20
Challenges – language is not static ML in NLP20 Language grows and changes e.g., cyber lingo LOLLaugh out loud G2GGot to go BFNBye for now B4NBye for now IdkI don’t know FWIWFor what it’s worth LUWAMHLove you with all my heart
21
Challenges--language is compositional Carefully Slide ML in NLP21
22
Challenges--language is compositional 小心 : Carefully Careful Take Care Caution 地滑 : Slide Landslip Wet Floor Smooth ML in NLP22
23
Challenges – scale ML in NLP23 Examples: Bible (King James version): ~700K Penn Tree bank ~1M from Wall street journal Newswire collection: 500M+ Wikipedia: 2.9 billion word (English) Web: several billions of words
24
This lecture ML in NLP24 Course Overview What is NLP? Why it is important? What will you learn from this course? Course Information What are the challenges? Key NLP components Key ML ideas in NLP
25
Part of speech tagging ML in NLP25
26
Syntactic (Constituency) parsing ML in NLP26
27
Syntactic structure => meaning Image credit: Julia Hockenmaier, Intro to NLP ML in NLP27
28
Dependency Parsing ML in NLP28
29
Semantic analysis Word sense disambiguation Semantic role labeling Credit: Ivan Titov ML in NLP29
30
Machine learning CS6501- Advanced Machine Learning 30
31
CS6501- Advanced Machine Learning 31
32
Perceptron, decision tree, support vector machine K-NN, Naïve Bayes, logistic regression…. CS6501- Advanced Machine Learning 32
33
Classification is generally well-understood CS6501- Advanced Machine Learning 33 Theoretically: generalization bound # examples to train a good model Algorithmically: Efficient algorithm for large data set E.g., take a few second to train a linear SVM on data with millions instances and features Algorithms for non-linear model E.g., Kernel methods Is this enough to solve all real-world problems?
34
Reading Comprehension CS6501- Advanced Machine Learning 34
35
Challenges Modeling challenges How to model a complex decision? Representation challenges How to extract features? Algorithmic challenges Large amount of data and complex decision structure 58 Bill Clinton, recently elected as the President of the USA, has been invited by the Russian President], [Vladimir Putin, to visit Russia. President Clinton said that he looks forward to strengthening ties between USA and Russia Algorithm 2 is shown to perform better Berg-Kirkpatrick, ACL 2010. It can also be expected to converge faster -- anyway, the E- step changes the auxiliary function by changing the expected counts, so there's no point in finding a local maximum of the auxiliary function in each iteration a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy. Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book Structured prediction models Deep learning models Inference / learning algorithms
36
Modeling Challenges How to model a complex decision? Why this is important? Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book CS6501- Advanced Machine Learning 36
37
Language is structural CS6501- Advanced Machine Learning 37
38
Hand written recognition What is this letter? CS6501- Advanced Machine Learning 38
39
Hand written recognition What is this letter? CS6501- Advanced Machine Learning 39
40
Visual recognition CS6501- Advanced Machine Learning 40
41
Human body recognition CS6501- Advanced Machine Learning 41
42
Bridge the gap Simple classifiers are not designed for handle complex output Need to make multiple decisions jointly Example: POS tagging: Example from Vivek Srikumar can you can a can as a canner can can a can CS6501- Advanced Machine Learning 42
43
Make multiple decisions jointly CS6501- Advanced Machine Learning 43 Example: POS tagging: Each part needs a label Assign tag (V., N., A., …) to each word in the sentence The decisions are mutually dependent Cannot have verb followed by a verb Results are evaluated jointly can you can a can as a canner can can a can
44
Structured prediction problems CS6501- Advanced Machine Learning 44 Problems that have multiple interdependent output variables and the output assignments are evaluated jointly Need a joint assignment to all the output variables We called it joint inference, global infernece or simply inference
45
Input: Truth: ∈ y∗ ∈ () ∈ y∗ ∈ () Ican a ProMdVbDtNn Pro Md Nn Dt Vb Nn Md Vb Predicted: ℎ() ∈ () Loss:, ∗ Goal: make joint prediction to minimize a joint loss based on find ℎ ∈ such that ℎx∈ () minimizing, ~, ℎ samples, ~ Kai-Wei Chang (University of Virginia) 45 A General learning setting
46
Input: Truth: ∈ y∗ ∈ () ∈ y∗ ∈ () Ican a ProMdVbDtNn Pro Md Nn Dt Vb Nn Md Vb Predicted: ℎ () ∈ () Loss:, ∗ # POS tags: 45 How many possible outputs for sentence with 10 words? 45 <= = 3.4×10 <D Kai-Wei Chang (University of Virginia) 46 Combinatorial output space Observation: Not all sequences are valid, and we don’t need to consider all of them
47
Representation of interdependent output variables A compact way to represent output combinations Abstract away unnecessary complexities We know how to process them Graph algorithms for linear chain, tree, etc. PronounVerbNounAndNounPronounVerbNounAndNoun Root Theyoperateshipsandbanks. CS6501- Advanced Machine Learning 47
48
Algorithms/models for structured prediction CS6501- Advanced Machine Learning 48 Many learning algorithms can be generalized to the structured case Perceptron → Structured perceptron SVM → Structured SVM Logistic regression → Conditional random field (a.k.a. log-linear models) Can be solved by a reduction stack Structured prediction → multi-class → binary
49
Representation Challenges How to obtain features? 1.Design features based on domain knowledge E.g., by patterns in parse trees By nicknames Need human experts/knowledge When Chris was three years old, his father wrote a poem about him. Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. CS6501- Advanced Machine Learning 49
50
Representation Challenges How to obtain features? 1.Design features based on domain knowledge 2.Design feature templates and then let machine find the right ones E.g., use all words, pairs of words, … Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book CS6501- Advanced Machine Learning 50
51
Tamil Language Tools https://www.tamilvu.org/en/content/tamil-computing-tools https://www.tamilvu.org/coresite/html/cwsoftlist.htm https://oss.neechalkaran.com/tamilsoftwares/ https://tamilpesu.us/en/ https://oss.neechalkaran.com/tamilsoftwares/ http://karthikramas.blogdrives.com/archive/21.html https://tamilcharam.com/ https://www.kaggle.com/datasets/praveengovi/tamil-language- corpus-for-nlp?select=Tamil_News_Corpus
52
NLTK NLTK is a suite of open source Python modules, data sets and tutorials supporting research and development in natural language processing Download NLTK from nltk.org
53
Components of NLTK 1.Code: corpus readers, tokenizers, stemmers, taggers, chunkers, parsers, wordnet,... (50k lines of code) 2.Corpora: >30 annotated data sets widely used in natural language processing (>300Mb data) 3.Documentation: a 400-page book, articles, reviews, API documentation
54
1. Code corpus readers tokenizers stemmers taggers parsers wordnet semantic interpretation clusterers evaluation metrics …
55
2. Corpora Brown Corpus Carnegie Mellon Pronouncing Dictionary CoNLL 2000 Chunking Corpus Project Gutenberg Selections NIST 1999 Information Extraction: Entity Recognition Corpus US Presidential Inaugural Address Corpus Indian Language POS-Tagged Corpus Floresta Portuguese Treebank Prepositional Phrase Attachment Corpus SENSEVAL 2 Corpus Sinica Treebank Corpus Sample Universal Declaration of Human Rights Corpus Stopwords Corpus TIMIT Corpus Sample Treebank Corpus Sample …
56
3. Documentation a 400-page book about natural language processing in Python and NLTK teaches Python and NLP provides numerous examples and exercises installation instructions presentation slides for some of the book chapters API Documentation: describes every module, interface, class, and method
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.