Syntactic Pattern Recognition

1 Syntactic Pattern Recognition
Statistical PR:
  Find a feature vector x
  Train a system using a set of labeled patterns
  Classify unknown patterns
  Ignores relational information contained in the structure
Most structural methods use hierarchical decomposition
Note the similarity between a sentence structure and a pattern description
[Figure: picture A decomposed into triangle B and rectangle C; B is built from edges a, b, c and C from edges d, e, f, g]

2 Language
Alphabet is a finite set of symbols, $V = \{x_1, x_2, \dots, x_n\}$
Sentence over V is a finite string of ordered symbols (left to right) from V
Example: $V = \{a, b, c\}$; valid sentences are "abb", "abba", "aaa", and the null string
Length of a sentence s, $|s|$, is the number of symbols in s
$s_1 \circ s_2$ is the concatenation of the two sentences $s_1$ and $s_2$
$V \circ V \circ \dots \circ V = V^n$ is the set of all sentences with n symbols over V
$V^+ = V \cup V^2 \cup V^3 \cup \dots$ is the set of all non-empty sentences over V
$V^*$ is the closure of V: $V^+$ together with the null string
Language is an arbitrary subset L of $V^*$
Example: $V = \{0, 1\}$; then
  $L_1 = \{001, 110, 111, 0, \text{null}\}$ is a finite language
  $L_2 = \{s \mid s = 1^n 0^2 1^m,\ n \ge 1,\ 1 \le m \le 10\}$ is an infinite language
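The definitions on this slide can be tried out directly. The following is a minimal Python sketch, not part of the original slides; the function names `sentences_of_length` and `in_L2` are made up for illustration. It enumerates $V^n$ and tests membership in the example language $L_2$.

```python
from itertools import product
import re

def sentences_of_length(alphabet, n):
    """Return V^n: every sentence with exactly n symbols over the alphabet."""
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=n)}

def in_L2(s):
    """Membership test for L2 = { 1^n 0^2 1^m : n >= 1, 1 <= m <= 10 }."""
    # 1+ is one or more 1s, 00 is the fixed 0^2, 1{1,10} is between 1 and 10 ones.
    return re.fullmatch(r"1+001{1,10}", s) is not None

V = {"0", "1"}
print(sorted(sentences_of_length(V, 2)))   # the four 2-symbol sentences over V
print(in_L2("11001"), in_L2("001"))        # second string has n = 0, so it is rejected
```

Since a language is just a set of strings, a finite language such as $L_1$ can be stored as a Python set, while an infinite one such as $L_2$ has to be described by a membership test.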

3 Languages
$L_1 \circ L_2 = \{s \mid s = s_1 s_2,\ s_1 \in L_1,\ s_2 \in L_2\}$ is the concatenation of $L_1$ and $L_2$
$L_1^{it} = \{s \mid s = s_1 s_2 \dots s_n,\ n \ge 0,\ s_i \in L_1\}$ is the iterate of $L_1$
$L_1 \circ L_2$ and $L_1^{it}$ are both languages
Example: $V = \{a, b\}$, $L_1 = \{aa, ab, bb\}$, $L_2 = \{a, b\}$
  $L_1 \circ L_2 = \{a^3, aba, b^2a, a^2b, ab^2, b^3\}$
  $L_1^{it}$ is infinite; for n = 0, 1, 2 it contains the null string, the 3 strings of $L_1$, and the 9 two-string concatenations such as $a^4$ and $abbb$
s is called a substring of t if t = usv for some strings u, v belonging to $V^*$
Every string is a substring of itself, since u and v can both be null
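Concatenation and the iterate are easy to check on the slide's example. This is a small Python sketch with illustrative names (`concat`, `iterate_up_to`) that are not from the slides; because the iterate is infinite, the sketch only builds it up to a chosen number of factors.

```python
def concat(L1, L2):
    """Concatenation of two languages: every s1 followed by every s2."""
    return {s1 + s2 for s1 in L1 for s2 in L2}

def iterate_up_to(L, max_n):
    """Strings of the iterate of L that use at most max_n factors from L."""
    result = {""}        # n = 0 contributes the null string
    current = {""}
    for _ in range(max_n):
        current = concat(current, L)   # strings made of exactly one more factor
        result |= current
    return result

L1 = {"aa", "ab", "bb"}
L2 = {"a", "b"}
print(sorted(concat(L1, L2)))
print(sorted(iterate_up_to(L1, 2), key=lambda s: (len(s), s)))
```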

4 Grammars
Grammar $G = \{V_T, V_N, P, S\}$ has 4 entities
$V_T$ is a set of terminal symbols, called primitives or constants
$V_N$ is a set of non-terminal symbols, called variables
$V_T$ and $V_N$ are subsets of V
P is the set of production rules $A \to B$, where A has at least one variable and B is a mix of variables and constants
S is the starting symbol or the root; S belongs to $V_N$
L(G) is a formal language (a set of strings) generated by the grammar G
  Each string is composed of only primitives
  Each string can be derived from S using the production rules P
Example: $V_T = \{a, b\}$, $V_N = \{S\}$, $P = \{S \to aSb,\ S \to ab\}$ gives $L(G) = \{a^n b^n,\ n \ge 1\}$
Grammar is used to: (i) generate the strings (sentences) of L(G), (ii) check if a sentence belongs to L(G), (iii) analyze the structure of a sentence
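Use (i), generating the strings of L(G), can be illustrated with the slide's example grammar. The sketch below is an illustration only; the function name `generate` and the dictionary encoding of P are assumptions. It expands the leftmost variable of each sentential form and assumes single-character symbols and no rule with an empty right-hand side, so that forms never shrink and a length bound is a safe cut-off.

```python
def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    variables = set(productions)
    found, seen, stack = set(), {start}, [start]
    while stack:
        form = stack.pop()
        # Position of the leftmost variable, or None if the form is all primitives.
        idx = next((i for i, ch in enumerate(form) if ch in variables), None)
        if idx is None:
            found.add(form)
            continue
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                stack.append(new_form)
    return found

P = {"S": ["aSb", "ab"]}          # S -> aSb, S -> ab
print(sorted(generate(P, "S", 6), key=len))
```

Checking membership, use (ii), can then be done for short strings by testing whether the string appears in `generate(P, "S", len(x))`, although a parser is the more usual tool.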

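5 Grammar Types
Unrestricted Grammar (UR)
Context Sensitive Grammar (CS)
Context Free Grammar (CF)
Finite State Grammar (FS)
Example: $V_T = \{a, b, c\}$; $V_N = \{S, A, B\}$
[Table: example productions for each type, with columns UR, CS, CF, FS; the entries did not survive in the transcript]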

6 Finite State Grammars, and Graphical Representations
Nodes are the nonterminals in $V_N$ and an additional terminal node T not in V
Productions of type $A_i \to aA_j$ are represented by an edge labeled a directed from $A_i$ to $A_j$
Productions of type $A_i \to a$ are represented by an edge labeled a directed from $A_i$ to T
[Figure: graph with nodes S, A, B, T and edges labeled a]
For a FS grammar G, an arbitrary string $x = x_1 x_2 \dots x_n$, $x_i \in V_T$, is in L(G) iff there exists at least one path $(x_1, x_2, \dots, x_n)$ from S to T
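The path criterion can be sketched in a few lines of Python. The edge list below encodes a hypothetical FS grammar chosen for illustration (S -> aA, A -> bA, A -> b), not the grammar drawn on the slide, and the function name `accepts` is likewise an assumption. The set `current` tracks every node reachable by the symbols read so far, so the test succeeds if at least one path ends at T.

```python
def accepts(edges, x, start="S", final="T"):
    """True iff some path labeled x_1, x_2, ..., x_n leads from start to final."""
    current = {start}
    for symbol in x:
        # Follow every edge that leaves a current node and carries this symbol.
        current = {dst for (src, label, dst) in edges
                   if src in current and label == symbol}
        if not current:
            return False
    return final in current

# Hypothetical grammar: S -> aA, A -> bA, A -> b
EDGES = [("S", "a", "A"), ("A", "b", "A"), ("A", "b", "T")]
print(accepts(EDGES, "abb"), accepts(EDGES, "a"))
```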

7 Syntactic Pattern Recognition
2-class problem
$C_1$ and $C_2$ are composed of features from a set $V_T$
Let G be a grammar such that L(G) consists only of sentences (patterns) from $C_1$
Example: $V_T = \{a, b\}$, $V_N = \{S, A\}$, $P = \{S \to aSb,\ S \to b\}$
  $L(G) = \{b;\ a^n b^{n+1},\ n \ge 1\}$
Classification Rule
  x belongs to $C_1$ iff x belongs to L(G)
  x belongs to $C_2$ otherwise
Classification algorithm has to correctly answer whether or not a given string is grammatically correct.
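For the example grammar on this slide, the classification rule reduces to a simple membership test. The Python sketch below is an illustration with made-up function names; it undoes the rule S -> aSb by stripping one leading a and one trailing b at a time, and accepts when what remains is the string b produced by S -> b.

```python
def in_language(x):
    """True iff x is b or a^n b^(n+1), i.e. derivable with S -> aSb, S -> b."""
    while x.startswith("a") and x.endswith("b"):
        x = x[1:-1]          # undo one application of S -> aSb
    return x == "b"          # what is left must come from S -> b

def classify(x):
    """Classification rule: C1 iff x is in L(G), C2 otherwise."""
    return "C1" if in_language(x) else "C2"

for pattern in ["b", "abb", "aabbb", "ab", "abab"]:
    print(pattern, classify(pattern))
```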

8 Pattern Grammars
2-class problem: rectangles and other quadrilaterals
Select primitives:
  a: 0° edge
  b: 90° edge
  c: 180° edge
  d: 270° edge
Set of rectangles: if a0, b0, c0, d0 represent unit length lines [expression missing from the transcript]
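The expression for the set of rectangles is not preserved in the transcript. As an assumption for illustration only, suppose each primitive a, b, c, d is a unit-length edge and the boundary is traced from one corner in the order 0°, 90°, 180°, 270°. A rectangle is then a chain $a^n b^m c^n d^m$ with $n, m \ge 1$, because opposite sides must have equal length. The Python sketch below tests that condition; the name `is_rectangle` is hypothetical.

```python
import re

def is_rectangle(chain):
    """True iff chain has the assumed form a^n b^m c^n d^m with n, m >= 1."""
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", chain)
    if match is None:
        return False
    a_run, b_run, c_run, d_run = (len(g) for g in match.groups())
    # Opposite sides of a rectangle have the same number of unit edges.
    return a_run == c_run and b_run == d_run

print(is_rectangle("aabccd"), is_rectangle("aabcd"))
```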

9 Consider the primitives
  a: 0° horizontal, unit length
  b: 120°, unit length
  c: 240°, unit length
L(G) represents the class of equilateral triangles
What is the grammar?
  Make it up from domain knowledge
  There is no unique solution

10 FS Grammar solution
  $V_T = \{a, b, c\}$, $V_N = \{S, A, B, C, D, E, F, G, H, I, J, K\}$
CS Grammar solution
  $V_T = \{a, b, c\}$, $V_N = \{S, A, B, C, D, E, F\}$
[The productions for both solutions did not survive in the transcript]
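With unit-length primitives, an equilateral triangle of side n is the chain $a^n b^n c^n$. Since the slide's own productions are missing, the Python sketch below uses a well-known textbook context-sensitive grammar for $\{a^n b^n c^n,\ n \ge 1\}$ instead. It has fewer variables than the CS solution named on the slide and is not claimed to be that solution. The function name `language_up_to` is an assumption. No rule shortens a sentential form, so forms longer than the bound can be discarded safely.

```python
# Textbook CS grammar for a^n b^n c^n (an assumption, not the slide's grammar).
RULES = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
         ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

def language_up_to(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    seen, stack, sentences = {start}, [start], set()
    while stack:
        form = stack.pop()
        if form.islower():               # only primitives left: a sentence
            sentences.add(form)
            continue
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:             # rewrite every occurrence of lhs
                new_form = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    stack.append(new_form)
                pos = form.find(lhs, pos + 1)
    return sentences

print(sorted(language_up_to(RULES, "S", 9), key=len))
```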

11 Syntax Analysis
Let x be the unknown pattern. The recognition task is to find the language $L(G_i)$ such that x belongs to $L(G_i)$
That is, given a string x and a grammar G, construct a triangle with S as the top vertex and x as the bottom side; the derivation (parse) tree lies inside it
Top-down and bottom-up parsing methods can be used
[Figure: triangle with apex S and base x]
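A top-down parse can be sketched as a backtracking search that starts at S and tries each production in turn. The Python code below is a minimal illustration, not a method taken from the slides; it assumes single-character symbols and a grammar without left recursion, and the names `parse`, `expand` and `parse_tree` are made up. It returns the derivation tree as nested `(variable, children)` tuples.

```python
def parse(productions, symbol, x, pos=0):
    """Yield (tree, end) for each way symbol derives x[pos:end]."""
    if symbol not in productions:            # a primitive must match the input
        if x.startswith(symbol, pos):
            yield symbol, pos + len(symbol)
        return
    for rhs in productions[symbol]:          # try each production for the variable
        for children, end in expand(productions, rhs, x, pos):
            yield (symbol, children), end

def expand(productions, rhs, x, pos):
    """Yield (subtrees, end) for each way the symbols of rhs derive x[pos:end]."""
    if not rhs:
        yield [], pos
        return
    for tree, mid in parse(productions, rhs[0], x, pos):
        for rest, end in expand(productions, rhs[1:], x, mid):
            yield [tree] + rest, end

def parse_tree(productions, start, x):
    """Return one parse tree covering all of x, or None if x is not in L(G)."""
    for tree, end in parse(productions, start, x):
        if end == len(x):
            return tree
    return None

P = {"S": ["aSb", "ab"]}                     # grammar from the Grammars slide
print(parse_tree(P, "S", "aabb"))
print(parse_tree(P, "S", "aab"))
```

The root of the returned tree is S and its leaves, read left to right, spell x, which matches the triangle picture on the slide.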

12 Stochastic Languages
Probabilities are associated with the production rules; this gives a stochastic grammar
A stochastic language is one generated by such a grammar
Probability of obtaining x is [expression missing from the transcript]
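The slide's expression for the probability of x is not preserved. The sketch below assumes the usual textbook definition: the probability of a derivation is the product of the probabilities of the rules it applies, and the probability of x is the sum over all leftmost derivations of x. The rule probabilities 0.4 and 0.6 and the name `string_probability` are invented for illustration. The code also assumes single-character symbols and no rule with an empty right-hand side, so the search can stop once a form is longer than x.

```python
def string_probability(rules, start, x):
    """Sum over leftmost derivations of x of the product of rule probabilities."""
    variables = set(rules)

    def walk(form, prob):
        idx = next((i for i, ch in enumerate(form) if ch in variables), None)
        if idx is None:                       # no variables left
            return prob if form == x else 0.0
        # Prune forms that are too long or whose primitive prefix disagrees with x.
        if len(form) > len(x) or not x.startswith(form[:idx]):
            return 0.0
        return sum(walk(form[:idx] + rhs + form[idx + 1:], prob * p)
                   for rhs, p in rules[form[idx]])

    return walk(start, 1.0)

# Hypothetical stochastic grammar: S -> aSb with p = 0.4, S -> ab with p = 0.6
RULES_P = {"S": [("aSb", 0.4), ("ab", 0.6)]}
print(string_probability(RULES_P, "S", "aabb"))
```

Under these assumptions the grammar is unambiguous, so $a^n b^n$ has probability $0.4^{\,n-1} \cdot 0.6$ and strings outside L(G) have probability 0.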

13 Tree representations
A string $s_1$ is directly derived from a string $s_2$ in G ($s_2 \Rightarrow s_1$) if there exists a rule $\alpha \to \beta$ in G such that $s_1$ is the result of replacing $\alpha$ by $\beta$ in $s_2$
In general, s is derived from the initial symbol S of G if there exists a sequence of strings $s_1, \dots, s_n$ such that $S \Rightarrow s_1 \Rightarrow \dots \Rightarrow s_n = s$
Parsing is the reverse of generation
