Presentation on theme: "Finite State Transducers"— Presentation transcript:

1 Finite State Transducers
Mark Stamp

2 Finite State Automata
FSA = states and transitions
Represented as labeled directed graphs; an FSA has one label per edge
States are circles; double circles mark end (accepting) states
The beginning state is denoted by an arrowhead or, sometimes, a bold circle
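To make this concrete, here is a minimal Python sketch of an FSA as a labeled directed graph with a start state and accepting states; the particular states, labels, and accepted strings are made up for illustration.

# Minimal FSA sketch: states are circles (integers here), each edge has one label.
# delta maps (state, label) -> next state; "accepting" states are the double circles.
delta = {
    (1, 'a'): 2,   # edge 1 --a--> 2
    (2, 'b'): 3,   # edge 2 --b--> 3
    (3, 'b'): 3,   # self-loop on state 3
}
start, accepting = 1, {3}

def accepts(symbols):
    """Return True if the FSA ends in an accepting state on this input."""
    state = start
    for s in symbols:
        if (state, s) not in delta:
            return False          # no transition defined: reject
        state = delta[(state, s)]
    return state in accepting

print(accepts("abb"))   # True
print(accepts("ba"))    # False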

3 FSA Example
Nodes are states
Transitions are (labeled) arrows
For example…
[diagram: an FSA with states 1, 2, 3 and edges labeled a, c, y, z]

4 Finite State Transducer
FST = input & output labels on each edge
That is, 2 labels per edge
Can be more labels (e.g., edge weights)
Recall, an FSA has one label per edge
FST represented as a directed graph, with the same symbols used as for FSA
FSTs may be useful in malware analysis…

5 Finite State Transducer
FST has input and output “tapes”
Transducer, i.e., can map input to output
Often viewed as a “translating” machine, but somewhat more general
An FST is a finite automaton with output; the usual finite automaton only has input
Used in natural language processing (NLP) and in many other applications

6 FST Graphically
Edges/transitions are (labeled) arrows
Of the form i : o, that is, input:output
Nodes labeled numerically
For example…
[diagram: an FST with states 1, 2, 3 and edges labeled a:b, c:d, y:q, z:x]

7 FST Modes
FST usually viewed as a translating machine
But an FST can operate in several modes:
Generation
Recognition
Translation (left-to-right or right-to-left)
Examples of these modes considered next…

8 FST Modes
[diagram: a one-state FST (state 1) with self-loop labeled a:b]
Consider this simple example:
Generation mode: write an equal number of a and b to the first and second tape, respectively
Recognition mode: “accept” when the 1st tape has the same number of a as the 2nd tape has b
Translation mode: next slide

9 FST Modes
[diagram: the same one-state FST (state 1) with self-loop labeled a:b]
Consider this simple example:
Translation mode
Left-to-right: for every a read from the 1st tape, write b to the 2nd tape
Right-to-left: for every b read from the 2nd tape, write a to the 1st tape
Translation is the mode we usually want to consider
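A minimal Python sketch of this one-state FST run in left-to-right translation mode; the dictionary representation is an assumption made for illustration.

# The one-state FST above in left-to-right translation mode:
# every 'a' read from the input (first) tape writes a 'b' to the output (second) tape.
edges = {(1, 'a'): (1, 'b')}    # (state, input symbol) -> (next state, output symbol)
start = 1

def translate(tape):
    state, out = start, []
    for sym in tape:
        if (state, sym) not in edges:
            raise ValueError(f"no transition for {sym!r} in state {state}")
        state, o = edges[(state, sym)]
        out.append(o)
    return ''.join(out)

print(translate("aaa"))   # bbb

Right-to-left translation would simply swap the roles of the two tapes.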

10 WFST
WFST == Weighted FST
Include a “weight” on each edge
That is, edges of the form i : o / w
Often, probabilities serve as weights…
[diagram: a WFST with states 1, 2, 3 and edges labeled a:b/1, c:d/0.6, y:q/1, z:x/0.4]

11 FST Example
Homework…

12 Operations on FSTs
Many well-defined operations on FSTs
Union, intersection, composition, etc.
These also apply to WFSTs
Composition is especially interesting
In the malware context, we might want to…
Compose detectors for the same family
Compose detectors for different families
Why might this be useful?

13 FST Composition
Compose 2 FSTs (or WFSTs)
Suppose the 1st WFST has nodes 1,2,…,n
Suppose the 2nd WFST has nodes 1,2,…,m
Possible nodes in the composition are labeled (i,j), for i = 1,2,…,n and j = 1,2,…,m
Generally, not all of these will appear
Edge from (i1,j1) to (i2,j2) only when the composed labels “match” (next slide…)

14 FST Composition
Suppose we have the following labels
In the 1st WFST, the edge from i1 to i2 is x:y/p
In the 2nd WFST, the edge from j1 to j2 is w:z/q
Consider nodes (i1,j1) and (i2,j2) in the composed WFST
There is an edge between these nodes provided y == w
I.e., the output from the 1st matches the input for the 2nd
And the resulting edge label is x:z/pq
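A minimal Python sketch of this matching rule, assuming edges are stored as (source, destination, input, output, weight) tuples; the example machines are made up.

# Composition sketch: keep an edge (i1,j1) -> (i2,j2) labeled x:z/(p*q) whenever the
# 1st machine has an edge x:y/p from i1 to i2 and the 2nd has an edge y:z/q from j1 to j2.
def compose(edges1, edges2):
    composed = []
    for (i1, i2, x, y, p) in edges1:
        for (j1, j2, w, z, q) in edges2:
            if y == w:                 # output of the 1st matches input of the 2nd
                composed.append(((i1, j1), (i2, j2), x, z, p * q))
    return composed

# Tiny made-up machines, just to show the matching rule.
m1 = [(1, 2, 'a', 'b', 0.5), (2, 2, 'b', 'b', 0.3)]
m2 = [(1, 1, 'b', 'a', 0.2)]
for edge in compose(m1, m2):
    print(edge)    # ((1, 1), (2, 1), 'a', 'a', 0.1) and ((2, 1), (2, 1), 'b', 'a', 0.06)

A full implementation would also handle epsilon labels and trim states, like the (4,3) node on slide 17, that cannot reach a final state.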

15 WFST Composition
Consider composition of WFSTs…
[diagram: first WFST, with states 1-4 and weighted edges a:b/0.5, a:a/0.6, b:b/0.3, a:b/0.1, b:b/0.4, a:b/0.2]
And…
[diagram: second WFST, with states 1-4 and weighted edges b:a/0.5, a:b/0.3, b:b/0.1, a:b/0.4, b:a/0.2]

16 WFST Composition Example
[diagram: the two WFSTs from the previous slide and their composition, with composed states (1,1), (1,2), (2,2), (3,2), (4,2), (4,3), (4,4) and weighted edges a:b/.01, a:a/.04, a:a/.02, a:b/.24, b:a/.08, b:a/.06, a:b/.18, a:a/.1]
More details and algorithms can be found here:

17 WFST Composition
In the previous example, the composition is…
[diagram: the composed WFST from the previous slide, with states (1,1), (1,2), (2,2), (3,2), (4,2), (4,3), (4,4)]
But the (4,3) node is useless: a path must always end in a final state

18 FST Approximation of HMM
Why would we want to approximate an HMM by an FST?
Faster scoring using the FST
Easier to correct misclassifications in an FST
Possible to compose FSTs
Most important, it’s really cool and fun…
Downside? The FST may be less accurate than the HMM

19 FST Approximation of HMM
How to approximate an HMM by an FST?
We consider 2 methods, known as
n-type approximation
s-type approximation
These usually focus on “problem 2”
That is, uncovering the hidden states
This is the usual concern in NLP, such as “part of speech” tagging
Note: the “n-type” and “s-type” terminology comes from the paper Finite state transducers approximating hidden Markov models, by A. Kempe; it does not seem to be in standard use.

20 n-type Approximation
Let V be the distinct observations in the HMM
Let λ = (A,B,π) be a trained HMM
Recall, A is N x N, B is N x M, π is 1 x N
Let (input : output / weight) = (Vi : Sj / p)
Where i ∈ {1,2,…,M} and j ∈ {1,2,…,N}
And Sj are hidden states (rows of B)
And the weight is the max probability (from λ)
Examples later…

21 More n-type Approximations
Range of n-type approximations:
n0-type: only use the B matrix
n1-type: see previous slide
n2-type: for a 2nd order HMM
n3-type: for a 3rd order HMM, and so on
What is a 2nd order HMM?
Transitions depend on 2 consecutive states
In 1st order, they depend only on the previous state
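As a concrete illustration of the n0-type idea ("only use the B matrix"), the sketch below picks, for each observation symbol, the hidden state with the largest emission probability and uses that probability as the edge weight; the B values are made up for illustration. The n1-type additionally folds in the transition probabilities from A.

# n0-type sketch: for each observation symbol, keep only the hidden state with the
# largest emission probability in B, and use that probability as the label's weight.
B = {'F': {'H': 0.6, 'T': 0.4},    # illustrative emission probabilities, not from the slides
     'U': {'H': 0.1, 'T': 0.9}}

labels = {}
for obs in ('H', 'T'):
    state = max(B, key=lambda s: B[s][obs])   # most probable state for this observation
    labels[obs] = f"{obs}:{state}/{B[state][obs]}"

print(labels)   # {'H': 'H:F/0.6', 'T': 'T:U/0.9'}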

22 s-type Approximation
“Sentence type” approximation
Use sequences and/or natural breaks
In n-type, max probability over one transition, using the A and B matrices
In s-type, all sequences up to some length
Ideally, break at boundaries of some sort
In NLP, a sentence is such a boundary
For malware, it is not so clear where to break
So for malware, maybe just use a fixed length

23 HMM to FST
Exact representation also possible
That is, the resulting FST is the “same” as the HMM
Given model λ = (A,B,π)
Nodes for each (input : output) = (Vi : Sj)
Edge from each node to all other nodes…
…including a loop to the same node
Edges labeled with the target node
Weights computed from probabilities in λ

24 HMM to FST
Note that some probabilities may be 0
Remove edges with 0 probability
A lot of probabilities may be small
So, maybe approximate by removing edges with “small” probabilities? (a sketch follows below)
Could be an interesting experiment…
A reasonable way to approximate an HMM that does not seem to have been studied
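A minimal sketch of the pruning experiment suggested above, using the same (source, destination, input, output, weight) edge representation as the composition sketch; the edge values and threshold are made up.

# Pruning sketch: drop zero-probability edges, and optionally any edge whose
# weight falls below a chosen threshold (the threshold is a free parameter).
def prune(edges, threshold=0.0):
    return [e for e in edges if e[4] > threshold]

edges = [(1, 2, 'H', 'F', 0.45), (1, 3, 'H', 'U', 0.002), (2, 2, 'T', 'F', 0.0)]
print(prune(edges))         # drops only the zero-probability edge
print(prune(edges, 0.01))   # also drops the "small" 0.002 edge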

25 HMM Example
Suppose we have 2 coins
1 coin is fair and 1 is unfair
Roll a die to decide which coin to flip
We see the resulting sequence of H and T
We do not know which coin was flipped…
…and we do not see the roll of the die
Observations? Hidden states?

26 HMM Example
Suppose the probabilities are as given
Then what is λ = (A,B,π) ?
[diagram: hidden states fair and unfair with transition probabilities 0.9, 0.1, 0.8, 0.2; each state emits observations H and T, with probabilities 0.5, 0.5 (fair) and 0.7, 0.3 (unfair)]

27 HMM Example
The HMM is given by λ = (A,B,π), where the A, B, and π values are shown on the slide
This π implies we start in the F (fair) state
Also, state 1 is F and state 2 is U (unfair)
Suppose we observe HHTHT
Then the probability of, say, FUFFU is
πF bF(H) aFU bU(H) aUF bF(T) aFF bF(H) aFU bU(T) = 1.0(0.5)(0.1)(0.7)(0.8)(0.5)(0.9)(0.5)(0.1)(0.3) = 0.000189

28 HMM Example
We have A, B, and π as given (matrix values shown on the slide)
And observe HHTHT
Probabilities are in the table
[table: score and probability for each of the 16 state sequences FFFFF, FFFFU, FFFUF, …, FUUUU (values shown on the slide)]
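A brute-force enumeration like the following would reproduce the table's probabilities; the A, B, and π values below are an assumption, read off the worked product on the previous slide, with state F as fair and U as unfair.

from itertools import product

# Assumed parameters (as implied by the worked product on the previous slide).
A  = {('F','F'): 0.9, ('F','U'): 0.1, ('U','F'): 0.8, ('U','U'): 0.2}
B  = {('F','H'): 0.5, ('F','T'): 0.5, ('U','H'): 0.7, ('U','T'): 0.3}
pi = {'F': 1.0, 'U': 0.0}
obs = "HHTHT"

def path_prob(states):
    """Joint probability of a hidden-state sequence and the observation sequence."""
    p = pi[states[0]] * B[(states[0], obs[0])]
    for t in range(1, len(obs)):
        p *= A[(states[t-1], states[t])] * B[(states[t], obs[t])]
    return p

# Enumerate the 16 sequences that start in F (the rest have probability 0 since pi[U] = 0).
for tail in product("FU", repeat=len(obs) - 1):
    seq = ('F',) + tail
    print(''.join(seq), path_prob(seq))   # e.g. FUFFU 0.000189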

29 HMM Example
[table: the same score/probability table for the 16 state sequences FFFFF through FUUUU]
So, the most likely state sequence is FFFFF
This solves problem 2
Problem 1, scoring? Next slide
Problem 3? Not relevant here

30 HMM Example
[table: the same score/probability table for the 16 state sequences]
How to score the sequence HHTHT ?
Sum over all state sequences
Sum the “score” column in the table: P(HHTHT) =
The forward algorithm is far more efficient
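For comparison, here is a minimal sketch of the forward algorithm under the same assumed parameters as the enumeration above; it scores HHTHT in time linear in the number of observations instead of enumerating all state sequences.

# Forward algorithm sketch: alpha[s] = P(observations so far, current state = s),
# updated one observation at a time; the final score is the sum over states.
A  = {('F','F'): 0.9, ('F','U'): 0.1, ('U','F'): 0.8, ('U','U'): 0.2}
B  = {('F','H'): 0.5, ('F','T'): 0.5, ('U','H'): 0.7, ('U','T'): 0.3}
pi = {'F': 1.0, 'U': 0.0}
states, obs = ('F', 'U'), "HHTHT"

alpha = {s: pi[s] * B[(s, obs[0])] for s in states}
for o in obs[1:]:
    alpha = {s: sum(alpha[r] * A[(r, s)] for r in states) * B[(s, o)]
             for s in states}

print(sum(alpha.values()))   # P(HHTHT), equal to the sum of the table's score column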

31 n-type Approximation
Consider the 2-coin HMM with A, B, and π as given (matrix values shown on the slide)
For each observation, only include the most probable hidden state
So, the only possible FST labels in this case are…
H:F/w1, H:U/w2, T:F/w3, T:U/w4
Where the weights wi are probabilities

32 n-type Approximation
Consider an example with B and π as given (matrix values shown on the slide)
For each observation, take the most probable state
The weight is the probability
[diagram: resulting FST with states 1, 2, 3 and edges labeled H:F/0.5, T:F/0.5, H:F/0.45, and T:F/0.45]

33 n-type Approximation
Suppose instead… (B and π values shown on the slide)
Most probable state for each observation?
The weight is the probability
[diagram: resulting FST with states 1, 2, 3, 4 and edges labeled H:U/0.35, H:U/0.42, H:F/0.30, T:F/0.20, T:F/0.25, and T:F/0.30]

34 HMM as FST
Consider the 2-coin HMM with A, B, and π as given (matrix values shown on the slide)
Then the FST nodes correspond to…
Initial state
Heads from fair coin (H:F)
Tails from fair coin (T:F)
Heads from unfair coin (H:U)
Tails from unfair coin (T:U)

35 HMM as FST
Suppose the HMM is specified by A, B, and π as given (matrix values shown on the slide)
Then the FST is…
[diagram: FST with states 1-5, fully connected, with edges labeled H:F, T:F, H:U, and T:U]

36 HMM as FST
This FST is boring and not very useful
Weights make it a little more interesting
Computing the weights is homework…
[diagram: the same FST with states 1-5 and edges labeled H:F, T:F, H:U, and T:U]

37 Why Consider FSTs?
FST used as a “translating machine”
Well-defined operations on FSTs
Composition is an interesting example
Can convert an HMM to an FST
Either exact or an approximation
Approximations may be much simplified, but might not be as accurate
Advantages of FST over HMM?

38 Why Consider FSTs?
Scoring/translating is faster with an FST
Able to compose multiple FSTs
Where the FSTs may be derived from HMMs
One idea…
Train multiple HMMs on malware (same family and/or different families)
Convert each HMM to an FST
Compose the resulting FSTs

39 Bottom Line
Can we get the best of both worlds?
Fast scoring and composition with FSTs
Simplify/approximate HMMs via FSTs
Tweak the FST to improve scoring
Efficient training using HMMs
Other possibilities?
Directly compute an FST without an HMM
Or use the FST as a first pass (e.g., disassembly?)

40 References
A. Kempe, Finite state transducers approximating hidden Markov models
J. R. Novak, Weighted finite state transducers: Important algorithms
K. Striegnitz, Finite state transducers

