Finite State Automata and Tries Sambhav Jain IIIT Hyderabad.


1 Finite State Automata and Tries
Sambhav Jain, IIIT Hyderabad

2 Think!
How do we store a dictionary in a computer? How do we search for an entry in that dictionary?
– Say every word is exactly 10 characters long and each character can be any letter from 'a' to 'z', e.g. aaaaaaaaaa, abcdefghij, etc.
– As a regular expression, the language is [a-z]{10}.

3 A Simple Way
aaaaaaaaaa
aaaaaaaaab
aaaaaaaaac
....
zzzzzzzzzz
A linear sorted list of entries.

4 A Simple Way
aaaaaaaaaa
aaaaaaaaab
aaaaaaaaac
....
zzzzzzzzzz
Entries to be stored = 26^10 = 1.41167096 × 10^14
At 10 characters per entry and 1 byte per character, that is about 1.41 × 10^15 bytes (~1.4 PB); even at a single byte per entry it would be ~141 TB.

5 Smart Way!
[Figure: layers of letter nodes a b c d ... w x y z, one layer per character position.]

6 Smart Way!
[Figure: layers of letter nodes a b c d ... w x y z, one layer per character position.]
Total storage = 26 x 10 = 260 bytes
Traverse 10 nodes to look up a word.
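
The two layouts can be compared with a few lines of arithmetic. This is a minimal sketch, assuming 1 byte per character as on the slides; the names `sorted_list_bytes`, `layered_bytes` and `accepts` are illustrative and do not come from the presentation. Note that 26^10 counts entries, so the flat list holds ten times that many characters.

```python
import string

ALPHABET = string.ascii_lowercase  # 'a'..'z', 26 letters
WORD_LENGTH = 10

# Simple way: one entry per possible word, 10 one-byte characters each.
num_words = len(ALPHABET) ** WORD_LENGTH
sorted_list_bytes = num_words * WORD_LENGTH

# Smart way: one layer of 26 letters per character position.
layers = [set(ALPHABET) for _ in range(WORD_LENGTH)]
layered_bytes = sum(len(layer) for layer in layers)


def accepts(word: str) -> bool:
    """Visit one layer per position: 10 lookups for a 10-character word."""
    return len(word) == WORD_LENGTH and all(
        ch in layer for ch, layer in zip(word, layers)
    )


print(num_words)          # 141167095653376
print(sorted_list_bytes)  # 1411670956533760
print(layered_bytes)      # 260
print(accepts("abcdefghij"), accepts("abc"))  # True False
```

The layered layout is this small only because every combination of letters is a valid word; a real vocabulary needs branching, which is what the trie on the following slides provides.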

7 Does it work for Natural Language?
Oxford Advanced English Learner, 20th Edition
– A quarter of a million distinct English words, excluding inflections and words from technical and regional vocabulary not covered by the OED.
After inflections?
– eat, eats, eaten, eating, ...
What about multiple inflections?
– beauty, beautiful, beautifully, ...

8 Example (Store & Search)
[Figure: trie diagram. Node labels: e, e, a, s, t, n, ing.]

9 Example
[Figure: the same trie with one more node. Node labels: e, e, a, s, t, n, ing, b.]

10 Example
[Figure: the same trie, extended. Node labels: e, e, a, s, t, n, ing, b, f, s, a.]

11 Example
[Figure: the same trie, extended. Node labels: e, e, a, s, t, n, ing, b, f, s, a, t, e, i, r, w.]
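
The store and search operations illustrated in these figures can be sketched as follows. This is a minimal illustration, not the presenter's code: the class and method names are assumptions, and each node holds a single character, whereas the figure appears to keep "ing" in one node.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> child node
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def store(self, word):
        """Insert a word, creating nodes only where the path is missing."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):
        """Follow one edge per character; succeed only on a word end."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


trie = Trie()
for w in ["eat", "eats", "eaten", "eating"]:
    trie.store(w)

print(trie.search("eaten"))  # True
print(trie.search("ea"))     # False: a prefix, not a stored word
print(trie.search("beat"))   # False: never stored
```

All four forms share the path e-a-t, so the common prefix is stored once and search cost depends on word length rather than on dictionary size.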

12 Inflectional morphology
Deals with the word forms of a root when there is no change in lexical category. Each word form carries different values of features such as gender, number, person, etc.

13 Paradigm
For a given root, there are many word forms with different features.
Ex. Forms of the Hindi root laDakA (boy):

         | Direct   Oblique
---------|------------------
Singular | laDakA   laDake
Plural   | laDake   laDakoM

14 Paradigm
- 'laDakoM' is plural with oblique case, given by the feature structure {num=pl, case=obl}
- 'laDake' stands for two feature structures:
  + singular oblique (Ex. laDake ne kahA...), where oblique means 'laDake' is followed by a postposition marker
  + plural direct (Ex. laDake Aye)

15 Paradigm
o Paradigms
- What operation is done on the root to obtain the word forms
- Model using pairs: (delete string, add string), where O stands for the empty string

   | direct   oblique
---|------------------
sg | (O,O)    (A,e)
pl | (A,e)    (A,oM)

o List roots with the paradigms they follow:
- ghoDA follows paradigm laDakA
- charkhA follows paradigm laDakA
- laDakA follows paradigm laDakA
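
The (delete string, add string) model can be sketched in a few lines. This is an illustration under stated assumptions: "(O,O)" is read as "delete nothing, add nothing", and the names `LADAKA_PARADIGM`, `ROOT_PARADIGM` and `inflect` are hypothetical.

```python
# Paradigm table for 'laDakA' as (delete string, add string) pairs.
# Assumption: the slide's (O,O) means the empty string in both slots.
LADAKA_PARADIGM = {
    ("sg", "direct"): ("", ""),
    ("sg", "oblique"): ("A", "e"),
    ("pl", "direct"): ("A", "e"),
    ("pl", "oblique"): ("A", "oM"),
}

# Roots listed with the paradigm they follow.
ROOT_PARADIGM = {
    "laDakA": LADAKA_PARADIGM,
    "ghoDA": LADAKA_PARADIGM,
    "charkhA": LADAKA_PARADIGM,
}


def inflect(root, number, case):
    """Delete the paradigm's suffix from the root, then add the new one."""
    delete, add = ROOT_PARADIGM[root][(number, case)]
    if not root.endswith(delete):
        raise ValueError(f"{root!r} does not end with {delete!r}")
    stem = root[: len(root) - len(delete)]
    return stem + add


print(inflect("laDakA", "sg", "direct"))   # laDakA
print(inflect("laDakA", "sg", "oblique"))  # laDake
print(inflect("ghoDA", "pl", "oblique"))   # ghoDoM
```

Storing one table per paradigm and only a paradigm reference per root avoids listing every word form explicitly.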

16 [Figure: character trie over full word forms, with two branches from the root starting at 'l' and 'k'; the endings (A, e, oM, ...) are spelled out separately under each stem.]

17 Abstracting out suffixes
[Figure: the trie with branches starting at 'k' and 'l', where the spelled-out endings are replaced by a reference (#1) to a shared suffix set.]
#1: Corresponds to the paradigm for 'laDakA'

18 Suffix trie (forward)

        #1
        |
 ---------------
 |      |      |
 e      o      A
        |
        M

19 Can we further optimize our search?
- Use knowledge of paradigms
- Use a suffix tree

20
- Store the suffix tree in main memory.
- Store the rest of the entries, categorized by paradigm, on the hard disk.
- Do a backward search on the suffix tree (from the end of the word).
- Identify the paradigm.
- Search only in that paradigm's set.
E.g. if '-ing' occurs, you will not start by searching words like home, cat, god, ...
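
The suffix-first lookup can be sketched as below. This is a rough illustration: the data in `PARADIGM_SUFFIXES` and `STEMS_BY_PARADIGM` is hypothetical, and an in-memory dict stands in for the per-paradigm sets that the slide places on disk.

```python
# Hypothetical data: suffixes of each paradigm (kept in memory) and the
# stems filed under each paradigm (a dict standing in for disk storage).
PARADIGM_SUFFIXES = {"laDakA": ["A", "e", "oM"]}
STEMS_BY_PARADIGM = {"laDakA": {"laDak", "ghoD", "charkh"}}

END = ""  # marker key; never collides with a one-character edge label


def build_suffix_trie(paradigm_suffixes):
    """Build a trie over reversed suffixes so it can be walked backward."""
    root = {}
    for paradigm, suffixes in paradigm_suffixes.items():
        for suffix in suffixes:
            node = root
            for ch in reversed(suffix):
                node = node.setdefault(ch, {})
            node.setdefault(END, []).append((paradigm, suffix))
    return root


def lookup(word, suffix_trie, stems_by_paradigm):
    """Return (stem, suffix, paradigm) for every analysis of the word."""
    analyses = []
    node = suffix_trie
    for ch in reversed(word):          # backward search from the word end
        node = node.get(ch)
        if node is None:
            break
        for paradigm, suffix in node.get(END, []):
            stem = word[: len(word) - len(suffix)]
            if stem in stems_by_paradigm[paradigm]:  # search one set only
                analyses.append((stem, suffix, paradigm))
    return analyses


suffix_trie = build_suffix_trie(PARADIGM_SUFFIXES)
print(lookup("ghoDoM", suffix_trie, STEMS_BY_PARADIGM))
# [('ghoD', 'oM', 'laDakA')]
print(lookup("ghar", suffix_trie, STEMS_BY_PARADIGM))  # []
```

Only the stem set of a paradigm whose suffix actually matched is consulted, which is the saving the slide describes.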

21 Finite State Automata
- A trie is a data structure; an FSA is the computational approach.
- Slight difference in representation: characters are put on edges rather than nodes.

22
              +
           l / \ k
            +   +
          a |   | a
            +   +
          D |   | p
            +   +
          a |   | a
            +   +
          k |   | D
            +   +
           0 \ / 0
              +______
           e / \ o   \ A
          (+)   +    (+)
                | M
               (+)

23 FSA
o A deterministic finite-state machine is formally defined by:
- Q: A finite set of states (Ex.: {q0,q1,q2})
- SIGMA: A finite input alphabet (Ex.: {a,b,c})
- Start state: A state in Q from which the machine starts (Ex.: q0)
- F: A set of accepting states (Ex.: {q2})
- DELTA(q,i): A transition function or transition matrix, where q MEMBER Q, i MEMBER SIGMA, and DELTA(q,i) MEMBER Q. Thus DELTA: Q x SIGMA --> Q

24 RECOGNITION Problem
Until now we have handled only the RECOGNITION problem:
- If the FSA reaches a final state at the end of the input string, the word EXISTS.
- Otherwise it does NOT.
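
The formal definition and the recognition procedure can be sketched together. This is a minimal illustration: the example machine (accepting "ab" followed by any number of "c") is an assumption chosen to reuse the slide's example sets, and a missing transition is treated as rejection instead of adding an explicit dead state.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: set      # Q
    alphabet: set    # SIGMA
    start: str       # start state, a member of Q
    accepting: set   # F
    delta: dict      # DELTA: (state, symbol) -> state

    def recognize(self, text):
        """Return True if the machine ends in an accepting state."""
        state = self.start
        for symbol in text:
            key = (state, symbol)
            if key not in self.delta:
                return False  # no transition defined: reject
            state = self.delta[key]
        return state in self.accepting


# Hypothetical machine over the example sets from the definition.
dfa = DFA(
    states={"q0", "q1", "q2"},
    alphabet={"a", "b", "c"},
    start="q0",
    accepting={"q2"},
    delta={("q0", "a"): "q1", ("q1", "b"): "q2", ("q2", "c"): "q2"},
)

print(dfa.recognize("abcc"))  # True
print(dfa.recognize("a"))     # False: stops in q1, not a final state
print(dfa.recognize("ba"))    # False: no transition from q0 on 'b'
```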

25 But we seek analyzed output
We want the machine to tell us:
– Root
– Gender
– Number
– Person
– Case
– Etc.

26 Finite State Transducer
An FST is like the finite state automaton defined earlier, except that each arc is labelled by a pair of symbols i:o, where
- i: symbol in the input string
- o: symbol output by the FST when the arc is taken
+ Ex. Arc in a finite state transducer corresponding to 'e' in 'laDake':

         e : ((+pl, +direct), (+sg, -direct))
  q1 +----------------->--------------------+ q2

  The pair of symbols i : o on this arc:
  - i is: 'e'
  - o is: '((+pl, +direct), (+sg, -direct))'
+ Ex. Morph analyzer: match the input with i; if successful, go ahead and produce o in the output.

27 Formally: Finite state transducer
- Q: Finite set of states q0, ..., qN
- SIGMA_IN: Finite set of input symbols
- SIGMA_OUT: Finite set of output symbols
- q0: Start state (q0 IN Q)
- F: Set of final (accepting) states (F SUBSET Q)
- DELTA(q, i:o): For every state q, gives the set of states that can be reached from q with input i in SIGMA_IN and output o in SIGMA_OUT.
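
A transducer of this kind can be sketched for the forms of 'laDakA'. This is an illustration under assumptions: the state names, the output strings such as "A +sg +oblique", and the function `transduce` are hypothetical, and the feature values follow the paradigm table for laDakA (laDake is singular oblique or plural direct).

```python
# Arcs: (state, input symbol i) -> list of (output o, next state).
# Stem letters are copied to the output; the suffix arcs emit the root's
# final letter plus feature tags. An ambiguous suffix has two arcs.
ARCS = {
    ("q0", "l"): [("l", "q1")],
    ("q1", "a"): [("a", "q2")],
    ("q2", "D"): [("D", "q3")],
    ("q3", "a"): [("a", "q4")],
    ("q4", "k"): [("k", "q5")],
    ("q5", "A"): [("A +sg +direct", "qf")],
    ("q5", "e"): [("A +sg +oblique", "qf"), ("A +pl +direct", "qf")],
    ("q5", "o"): [("", "q6")],
    ("q6", "M"): [("A +pl +oblique", "qf")],
}
START, FINALS = "q0", {"qf"}


def transduce(word, arcs=ARCS, start=START, finals=FINALS):
    """Match the input against i on each arc and collect the outputs o."""
    paths = [(start, "")]  # (current state, output produced so far)
    for symbol in word:
        next_paths = []
        for state, out in paths:
            for o, target in arcs.get((state, symbol), []):
                next_paths.append((target, out + o))
        paths = next_paths
    return [out for state, out in paths if state in finals]


print(transduce("laDake"))
# ['laDakA +sg +oblique', 'laDakA +pl +direct']
print(transduce("laDakoM"))  # ['laDakA +pl +oblique']
print(transduce("laDa"))     # []: input ends before a final state
```

Because DELTA returns a set of states, the sketch tracks every live path, which is how the ambiguous form 'laDake' yields two analyses.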

28 Example on board

29 Tools for FSA
– Lex
– OpenFST (www.openfst.org/)
– AT&T FSM Toolkit (http://www2.research.att.com/~fsmtools/fsm/)

