Computational Language Andrew Hippisley. Computational Language Computational language and AI Language engineering: applied computational language Case.


1 Computational Language Andrew Hippisley

2 Computational Language Computational language and AI Language engineering: applied computational language Case study: spell checkers

3 Computational language & AI Artificial Intelligence: “the simulation on computer of distinctly human mental functions.” Wilks (1993)

4 Computational language & AI Language integral to intelligent systems Artificial Intelligence Turing Test ELIZA

5

6 Computational language & AI Why language engineering? Language integral to intelligent systems Artificial Intelligence Turing Test ELIZA Expert systems: natural language interface, natural language database

7 Computational language & AI Methods shared across systems Finite State Transition Networks (FSTN) Logic Formal rules Probability Data: you know it!

8 Applied computational language History of the field Machine Translation: 1960, 1966, post 1966 Database access Text interpretation Information retrieval Text categorisation

9 Language engineering Information overload Need a way of automatically processing text documents Information extraction

10 Language engineering Information extraction GIDA: system for automatically monitoring financial market sentiment

11 GIDA

12 Language engineering Information overload Need a way of automatically processing text documents Information extraction Summarisation

13 Automatic summarisation (courtesy of Paulo FERNANDES de OLIVEIRA, PhD) Get information source; Extract some content from it; Present the most important part to the user

14 Lexical Cohesion Sentence 23: J&J's stock added 83 cents to $65.49. Sentence 26: Flagging stock markets kept merger activity and new stock offerings on the wane, the firm said. Sentence 42: Lucent, the most active stock on the New York Stock Exchange, skidded 47 cents to $4.31, after falling to a low at $4.30. Sentence 15: "For the stock market this move was so deeply discounted that I don't think it will have a major impact". Links Example Text title: U.S. stocks hold some gains. Collected from Reuters’ Website on 20 March 2002.

15 Lexical Cohesion 17. In other news, Hewlett-Packard said preliminary estimates showed shareholders had approved its purchase of Compaq Computer -- a result unconfirmed by voting officials. 19. In a related vote, Compaq shareholders are expected on Wednesday to back the deal, catapulting HP into contention against International Business Machines for the title of No. 1 computer company. Bonds Example Text title: U.S. stocks hold some gains. Collected from Reuters’ Website on 20 March 2002.

16 Language engineering Information overload Need a way of automatically processing text documents Information extraction Summarisation Translation Retrieve only relevant documents Voice processing

17 Language engineering Two main approaches Symbolic Stochastic

18 Case study spell checkers

19 Spelling dictionaries Aim: given a sequence of symbols, 1. identify misspelled strings 2. generate a list of possible ‘candidate’ correct strings 3. select the most probable candidate from the list

20 Spelling dictionaries Implementation: Probabilistic framework Bayesian rule noisy channel model

21 Spelling dictionaries Types of spelling error actual word errors non-word errors

22 Spelling dictionaries Types of spelling error actual word errors /piece/ instead of /peace/ /there/ instead of /their/ non-word errors

23 Spelling dictionaries Types of spelling error actual word errors /piece/ instead of /peace/ /there/ instead of /their/ non-word errors /graffe/ instead of /giraffe/

24 Spelling dictionaries Types of spelling error actual word errors /piece/ instead of /peace/ /there/ instead of /their/ non-word errors /graffe/ instead of /giraffe/ of all errors in typewritten texts, 80% are non-word errors

25 Spelling dictionaries non-word errors Cognitive errors /seperate/ instead of /separate/ phonetically equivalent sequence of symbols has been substituted due to lack of knowledge about spelling conventions

26 Spelling dictionaries non-word errors Cognitive errors Typographic (‘typo’) errors influenced by keyboard e.g. substitution of /w/ for /e/ due to its adjacency on the keyboard /thw/ instead of /the/

27 Spelling dictionaries non-word errors noisy channel model The actual word has been passed through a noisy communication channel This has distorted the word, thereby changing it in some way The misspelled word is the distorted version of the actual word Aim: recover the actual word by hypothesising about the possible ways in which it could have been distorted
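
The recovery aim on this slide is usually written as a maximisation over candidates. The block below is a sketch of that standard formulation, not something taken from the slides themselves; the symbols t (the observed typo), c (a candidate correct word) and ĉ (the chosen correction) are notation assumed here for illustration.

```latex
\hat{c}
  = \arg\max_{c} P(c \mid t)
  = \arg\max_{c} \frac{P(t \mid c)\, P(c)}{P(t)}
  = \arg\max_{c} P(t \mid c)\, P(c)
```

The denominator P(t) can be dropped because the typo is the same for every candidate, so it does not change which candidate scores highest.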

28 Spelling dictionaries non-word errors noisy channel model What are the possible distortions? insertion deletion substitution transposition all of these viewed as transformations that take place in the noisy channel
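
The four distortions can be made concrete with a small sketch that names which single transformation turns a correct word into an observed typo. The function name `classify_distortion` and its return labels are hypothetical, chosen for illustration only; the slides do not prescribe an implementation.

```python
def classify_distortion(correct, typo):
    """Name the single transformation that turns `correct` into `typo`.

    Returns "insertion", "deletion", "substitution", "transposition",
    or None when no single transformation explains the typo.
    """
    if correct == typo:
        return None
    if len(typo) == len(correct) + 1:
        # The channel added one symbol: removing it from the typo restores the word.
        if any(typo[:i] + typo[i + 1:] == correct for i in range(len(typo))):
            return "insertion"
    elif len(typo) == len(correct) - 1:
        # The channel dropped one symbol: removing it from the word gives the typo.
        if any(correct[:i] + correct[i + 1:] == typo for i in range(len(correct))):
            return "deletion"
    elif len(typo) == len(correct):
        diffs = [i for i in range(len(correct)) if correct[i] != typo[i]]
        if len(diffs) == 1:
            return "substitution"
        # Two adjacent positions whose symbols have swapped places.
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and correct[diffs[0]] == typo[diffs[1]]
                and correct[diffs[1]] == typo[diffs[0]]):
            return "transposition"
    return None


print(classify_distortion("giraffe", "graffe"))  # the channel deleted a symbol
print(classify_distortion("the", "thw"))         # the channel substituted a symbol
```

Note the direction: the label describes what the channel did to the actual word, so /graffe/ for /giraffe/ counts as a deletion even though correcting it means inserting a symbol.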

29 Spelling dictionaries Implementing spelling identification and correction algorithm

30 Spelling dictionaries Implementing spelling identification and correction algorithm STAGE 1: compare each string in document with a list of legal strings; if no corresponding string in list mark as misspelled STAGE 2: generate list of candidates Apply any single transformation to the typo string Filter the list by checking against a dictionary STAGE 3: assign probability values to each candidate in the list STAGE 4: select best candidate
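
Stages 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions: `LEGAL_WORDS` is a hypothetical toy word list standing in for the list of legal strings, the alphabet is restricted to lowercase English letters, and the function names are invented for this example.

```python
import string

# Hypothetical toy word list; a real checker would load a full dictionary.
LEGAL_WORDS = {"the", "thaw", "tho", "then", "giraffe", "separate", "peace", "piece"}
ALPHABET = string.ascii_lowercase  # assumption: lowercase English letters only


def single_edits(s):
    """All strings reachable from `s` by one insertion, deletion,
    substitution or transposition."""
    splits = [(s[:i], s[i:]) for i in range(len(s) + 1)]
    deletions = {a + b[1:] for a, b in splits if b}
    transpositions = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutions = {a + ch + b[1:] for a, b in splits if b for ch in ALPHABET}
    insertions = {a + ch + b for a, b in splits for ch in ALPHABET}
    return (deletions | transpositions | substitutions | insertions) - {s}


def find_misspelled(tokens, legal=LEGAL_WORDS):
    """STAGE 1: flag every token with no corresponding string in the list."""
    return [t for t in tokens if t.lower() not in legal]


def candidates(typo, legal=LEGAL_WORDS):
    """STAGE 2: apply any single transformation, then filter by the dictionary."""
    return sorted(single_edits(typo.lower()) & legal)


print(find_misspelled(["the", "graffe", "thw"]))
print(candidates("thw"))
```

With this toy list, /thw/ yields several candidates, which is why stages 3 and 4 are needed to choose among them.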

31 Spelling dictionaries STAGE 3 Prior probability: given all the words in English, is this candidate more likely to be what the typist meant than that candidate? P(c) = count(c)/N, where count(c) is the number of times candidate c occurs in a corpus and N is the total number of words in that corpus. Likelihood: given the possible errors, or transformations, how likely is it that a particular error has operated on candidate c to produce the typo t? P(t|c), calculated using a corpus of errors, or transformations. Bayesian rule: take the product of the prior probability and the likelihood, P(c) × P(t|c)
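
The scoring in stage 3, and the selection that follows it, can be sketched as below. All numbers are invented for illustration: `COUNTS`, `TOTAL_WORDS` and `LIKELIHOODS` are hypothetical stand-ins for a word corpus and a corpus of errors, and the function names are assumptions rather than part of the slides.

```python
# Hypothetical corpus statistics (made-up values, for illustration only).
COUNTS = {"the": 60000, "thaw": 20, "tho": 5}
TOTAL_WORDS = 1_000_000

# Hypothetical likelihoods keyed by (typo, candidate); in practice these
# would be estimated from a corpus of observed errors.
LIKELIHOODS = {
    ("thw", "the"): 0.001,
    ("thw", "thaw"): 0.0005,
    ("thw", "tho"): 0.0002,
}


def prior(candidate, counts=COUNTS, total=TOTAL_WORDS):
    """Prior: relative frequency of the candidate in the corpus."""
    return counts.get(candidate, 0) / total


def score(typo, candidate, likelihoods=LIKELIHOODS):
    """Product of the prior and the likelihood of the typo given the candidate."""
    return prior(candidate) * likelihoods.get((typo, candidate), 0.0)


def best_candidate(typo, candidate_list):
    """Select the candidate with the highest score."""
    return max(candidate_list, key=lambda c: score(typo, c))


print(best_candidate("thw", ["thaw", "the", "tho"]))
```

Unseen words and unseen errors score zero in this sketch; a fuller treatment would smooth both estimates so that rare candidates are not ruled out entirely.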

32 Spelling dictionaries non-word errors Implementing spelling identification and correction algorithm STAGE 1: identify misspelled words STAGE 2: generate list of candidates STAGE 3a: rank candidates for probability STAGE 3b: select best candidate Implement: noisy channel model Bayesian Rule

33 Next week Finite state machines and regular expressions

