Ngoc Minh Le - ePi Technology Bich Ngoc Do – ePi Technology


1 VNLP: An Open Source Framework for Vietnamese Natural Language Processing
Ngoc Minh Le, Bich Ngoc Do, Vi Duong Nguyen, Thi Dam Nguyen (ePi Technology)

2 Major tasks in Natural Language Processing
High-level applications: automatic summarization, machine translation, sentiment analysis, ...
Fundamental tasks: word segmentation, part-of-speech tagging, ...
Word segmentation, part-of-speech (POS) tagging, syntactic parsing, named-entity recognition (NER) and co-reference resolution are fundamental tasks in Natural Language Processing (NLP). Researchers cannot skip these tasks, even though bringing them to a deliverable state costs a great deal of time and money.

3 Named Entity Recognizer (NER)
Fundamental tasks: word segmentation, part-of-speech tagging, syntactic parsing, named-entity recognition (NER), coreference resolution.

4 Framework for Vietnamese NLP?
Stanford CoreNLP: a framework for English. A framework for Vietnamese Natural Language Processing?

5 JVnTextPro
JVnTextPro: tokenizer and POS tagging. Enough? Solution?

6 Word segmentation
vnTokenizer, with accuracy of up to 96%-98%.
Some improvements to speed up vnTokenizer: reading XML-encoded data via SAX; tokenizing a document with an LL parser; using an automaton with default transitions.
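
One way to read "automaton with default transitions" is a DFA in which each state stores only its exceptional transitions and falls back on a single default target for every other symbol, which keeps the transition table small for a large alphabet such as Vietnamese text. The following is a minimal Python sketch under that assumption; the class name `DefaultDFA`, the state names and the toy language are hypothetical and are not taken from vnTokenizer.

```python
class DefaultDFA:
    """DFA whose states store explicit transitions plus one optional default."""

    def __init__(self, start, accepting):
        self.start = start
        self.accepting = set(accepting)
        self.explicit = {}   # state -> {symbol: next_state}
        self.default = {}    # state -> next_state for any other symbol

    def add(self, state, symbol, target):
        self.explicit.setdefault(state, {})[symbol] = target

    def set_default(self, state, target):
        self.default[state] = target

    def step(self, state, symbol):
        # An explicit transition wins; otherwise use the default, if any.
        table = self.explicit.get(state, {})
        if symbol in table:
            return table[symbol]
        return self.default.get(state)  # None means "no transition"

    def accepts(self, text):
        state = self.start
        for ch in text:
            state = self.step(state, ch)
            if state is None:
                return False
        return state in self.accepting


def build_no_space_dfa():
    # Toy language (illustration only): non-empty strings without a space.
    dfa = DefaultDFA(start="q0", accepting=["q1"])
    dfa.set_default("q0", "q1")   # any character except the exceptions below
    dfa.set_default("q1", "q1")
    dfa.add("q0", " ", "dead")    # the only explicit transitions
    dfa.add("q1", " ", "dead")    # "dead" has no outgoing transitions
    return dfa
```

With this layout the table holds two explicit entries instead of one entry per character of the alphabet; `build_no_space_dfa().accepts("Việt")` is true, while a string containing a space is rejected.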

7 Part-of-speech tagging
JVnTagger: 91.3%; VnTagger: 95%; VnQTag: %

8 Syntactic parsing
Tree-adjoining grammar, head-driven phrase structure grammar, …: no deliverable software. MaltParser: open source, language-independent, acceptable accuracy (70%).

9 Named-Entity Recognition
Uses a rule-based method. The rule-based NER has two parts: a word-lookup component, called a gazetteer in GATE's terminology, and a pattern-matching component, called a transducer. Accuracy: 59%.
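
The two-part design can be sketched in a few lines of Python: a gazetteer that attaches a type to known words, followed by a transducer that applies hand-written patterns over those types. This is only an illustration of the idea, not the VNLP or GATE implementation; the gazetteer entries, the type names and the single "trigger word followed by capitalized tokens" rule are all assumptions, and the input is assumed to be word-segmented already.

```python
# Hypothetical gazetteer entries, for illustration only.
GAZETTEER = {
    "ông": "PersonTitle",
    "bà": "PersonTitle",
    "công ty": "OrgPrefix",
    "Hà Nội": "Location",
}


def gazetteer_lookup(tokens):
    """Attach a gazetteer type (or None) to every token."""
    return [(tok, GAZETTEER.get(tok) or GAZETTEER.get(tok.lower()))
            for tok in tokens]


def is_capitalized(token):
    return token[:1].isupper()


def transduce(annotated):
    """Apply hand-written patterns over the gazetteer annotations."""
    entities = []
    i = 0
    while i < len(annotated):
        tok, kind = annotated[i]
        if kind == "Location":
            entities.append((tok, "LOCATION"))
            i += 1
        elif kind in ("PersonTitle", "OrgPrefix"):
            # Rule: a trigger word followed by one or more capitalized tokens.
            j = i + 1
            while (j < len(annotated) and annotated[j][1] is None
                   and is_capitalized(annotated[j][0])):
                j += 1
            if j > i + 1:
                label = "PERSON" if kind == "PersonTitle" else "ORGANIZATION"
                name = " ".join(t for t, _ in annotated[i + 1:j])
                entities.append((name, label))
            i = j
        else:
            i += 1
    return entities
```

For the segmented input `["ông", "Nguyễn Văn An", "làm việc", "ở", "Hà Nội"]`, `transduce(gazetteer_lookup(tokens))` yields a PERSON entity for "Nguyễn Văn An" and a LOCATION entity for "Hà Nội".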

10 Coreference resolution
The heuristic rule-based approach consists of two components: an orthographic matcher (orthomatcher) with 17 rules, and a co-referencer, which performs pronominal co-referencing and integrates everything into co-reference lists.
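
To make the idea of orthographic matching concrete, here is a small Python sketch that groups entity mentions into co-reference lists using two string-level rules. The two rules shown (case-insensitive equality, and one mention being the trailing tokens of another) are hypothetical stand-ins chosen for illustration; they are not the 17 rules mentioned above, and pronominal co-referencing is left out.

```python
def same_ignoring_case(a, b):
    return a.casefold() == b.casefold()


def is_token_suffix(a, b):
    """True if the shorter mention equals the trailing tokens of the longer."""
    short, long_ = sorted((a.split(), b.split()), key=len)
    return len(short) < len(long_) and long_[-len(short):] == short


# Hypothetical rule set for illustration, not the 17 orthomatcher rules.
RULES = [same_ignoring_case, is_token_suffix]


def build_chains(mentions):
    """Group mentions into co-reference lists.

    Each mention joins the first existing chain in which some rule
    matches it against some member; otherwise it starts a new chain.
    """
    chains = []
    for mention in mentions:
        for chain in chains:
            if any(rule(mention, other) for rule in RULES for other in chain):
                chain.append(mention)
                break
        else:
            chains.append([mention])
    return chains
```

Given the mentions `["Nguyễn Văn An", "An", "Hà Nội", "hà nội"]`, the sketch returns two lists: one containing "Nguyễn Văn An" and "An", the other containing "Hà Nội" and "hà nội".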

11 Open Source Framework for Vietnamese NLP
VNLP pipeline components: Document Reset PR, sentence splitter, VnTokenizer, VnTagger, syntactic parsing (MaltParser), named-entity recognition (Vn-Ner), co-reference.
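
A pipeline of this kind can be modelled as a list of stages that each read a shared document and add annotations to it. The Python sketch below shows only that architecture; the function names, the dictionary layout and the placeholder rules are assumptions made for illustration and do not reproduce the actual VNLP components or their APIs.

```python
import re


def reset(doc):
    """Clear annotations left by a previous run (the role of a reset step)."""
    doc["annotations"] = {}
    return doc


def split_sentences(doc):
    # Placeholder rule: split after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    doc["annotations"]["sentences"] = [p for p in parts if p]
    return doc


def tokenize(doc):
    # Placeholder: whitespace tokens. A real Vietnamese word segmenter
    # would group syllables into multi-syllable words instead.
    sentences = doc["annotations"]["sentences"]
    doc["annotations"]["tokens"] = [s.split() for s in sentences]
    return doc


def run_pipeline(text, stages):
    """Pass one document through every stage, in order."""
    doc = {"text": text, "annotations": {}}
    for stage in stages:
        doc = stage(doc)
    return doc


# Later stages (tagger, parser, NER, co-reference) would be appended here.
PIPELINE = [reset, split_sentences, tokenize]
```

Because every stage has the same signature, a component can be swapped or added by editing the `PIPELINE` list, for example `run_pipeline("Tôi ở Hà Nội. Bạn ở đâu?", PIPELINE)`.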

12 Application of VNLP
Automatic synthesis and classification of web pages.
Online Reputation Management (noti5.vn), an application of sentiment analysis: collect all mentions of a brand; determine positive and negative opinions.

13 PART 5 – CONCLUSION AND FUTURE WORK

15 Thank you for your attention!
Q & A

