# Xiang Fu (Hofstra University) and Chung-Chih Li (Illinois State University), NFM 2010, 04/13/2010


Background

Figure labels: Hacker, Server, malicious scripts, "Cool page!".

Problem? Lack of sufficient sanitization of text inputs.

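One Typical Error

    <?php
    $msg = $_POST["msg"];
    // Delete <script>...</script> blocks; .*? is the reluctant Kleene star.
    $sanitized = preg_replace(
        "/\<script.*?\>.*?\<\/script.*?\>/i",
        "",
        $msg
    );
    save_to_db($sanitized);
    ?>

Attacker's input: `<<script></script>script>alert('a')</script>`

What survives the filter: `<script>alert('a')</script>`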
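The failure can be replayed outside PHP. The following is a minimal sketch in Python, assuming that `re.sub` treats this pattern the same way `preg_replace` does (a single left-to-right pass with reluctant stars). The name `sanitize`, the exact pattern, and the attack string are assumptions reconstructed from the slide.

```python
import re

# Assumed filter: delete <script ...>...</script ...> blocks, ignoring case,
# using reluctant (non-greedy) stars as on the slide.
SCRIPT_BLOCK = re.compile(r"<script.*?>.*?</script.*?>", re.IGNORECASE)


def sanitize(msg: str) -> str:
    """Remove every script block found in one left-to-right pass."""
    return SCRIPT_BLOCK.sub("", msg)


# The inner <script></script> is the leftmost match, so it is the part removed.
# The text around it then closes up into a fresh script block.
attack = "<<script></script>script>alert('a')</script>"
print(sanitize(attack))  # <script>alert('a')</script>
```

The filter does not rescan its own output, so the block assembled by the deletion is never checked.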

Bigger Picture

Objective: automatic discovery of vulnerabilities.

Figure: tool pipeline with the components Bytecode, Symbolic Execution, Attack Pattern, String Constraint Solver (SUSHI), and Test Replayer.

Our Contribution

- Atomic replacement constraints
- Two semantics considered: greedy and reluctant
- Modeling using finite state transducers (FST)
- Compact representation of FST
- Security analysis

Finite State Transducer

- Accepts a regular relation
- Closed under union, concatenation, and composition; not closed under intersection or complement in general
- Used for modeling rewriting rules [Kaplan94, Karttunen96]

Example: transducer A with states 1 to 4 and transitions ε:1, a:2, b:3, so (ab, 123) ∈ L(A).
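The example transducer A can be simulated directly. The sketch below is illustrative only: the `FST` class, its tuple encoding of transitions, and the assumption that the three transitions form the chain 1 → 2 → 3 → 4 are not taken from the slides.

```python
EPSILON = ""  # empty input label: the transition consumes no input symbol


class FST:
    """A tiny nondeterministic finite state transducer (illustrative only)."""

    def __init__(self, start, finals, transitions):
        self.start = start
        self.finals = set(finals)
        # Each transition is (source, input_label, output_label, target).
        self.transitions = list(transitions)

    def transduce(self, word):
        """Return every output the transducer can emit while reading `word`."""
        results = set()
        pending = [(self.start, 0, "")]  # (state, input position, output so far)
        seen = set()
        while pending:
            config = pending.pop()
            if config in seen:
                continue
            seen.add(config)
            state, pos, out = config
            if pos == len(word) and state in self.finals:
                results.add(out)
            for src, inp, outp, dst in self.transitions:
                if src != state:
                    continue
                if inp == EPSILON:
                    pending.append((dst, pos, out + outp))
                elif word.startswith(inp, pos):
                    pending.append((dst, pos + len(inp), out + outp))
        return results


# Assumed layout of A: 1 --ε:1--> 2 --a:2--> 3 --b:3--> 4, with 4 final.
A = FST(start=1, finals={4},
        transitions=[(1, EPSILON, "1", 2), (2, "a", "2", 3), (3, "b", "3", 4)])
print(A.transduce("ab"))  # {'123'}
```

This search terminates only when the transducer has no ε-cycle that keeps producing output; a real implementation would work on the automaton structure instead of enumerating runs.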

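Hierarchical FST & Modeling Declarative Semantics

Goal: replacement of a regular search pattern r by a replacement ω.

- Id(Σ* − Σ* r Σ*): the identity relation on any string not containing pattern r
- r : ω: the search pattern rewritten to the replacement

Figure: hierarchical FST over states 1 to 4 with edges labeled Id(Σ* − Σ* r Σ*), r : ω, and ε:ε.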

Modeling Reluctant Semantics

Goal: an FST for the reluctant replacement semantics. Key: leftmost matching.

2 steps:

1. Mark the beginning of the pattern
2. Do the replacement

Example of the two steps:

- Search pattern: a+b+c → x
- Input word: aabbcdabcabd
- After the begin marker: #a#abbcd#abcabd
- After the replacement: xdxabd

Figure: the replacement transducer, with states f1, s1, s2 and edge labels #:ε, reluc(r) # : ω, ε:ε, and Id().
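The two steps of this example can be checked with ordinary regular expressions. This is a sketch, not the transducer construction from the talk: `mark_begins` is a hypothetical helper that tests every position directly, and the replacement step relies on Python's reluctant quantifiers (`+?`) as a stand-in for reluc(r).

```python
import re


def mark_begins(pattern: str, word: str, marker: str = "#") -> str:
    """Insert `marker` before every position where a match of `pattern` starts."""
    rx = re.compile(pattern)
    out = []
    for i, ch in enumerate(word):
        if rx.match(word, i):  # is there a match anchored at position i?
            out.append(marker)
        out.append(ch)
    return "".join(out)


def replace_reluctant(reluctant_pattern: str, replacement: str, word: str) -> str:
    """Leftmost replacement; the pattern must already use reluctant quantifiers."""
    return re.sub(reluctant_pattern, replacement, word)


word = "aabbcdabcabd"
print(mark_begins("a+b+c", word))              # #a#abbcd#abcabd
print(replace_reluctant("a+?b+?c", "x", word))  # xdxabd
```

Markers appear at positions 0, 1, and 6 because a match starts at each of them, but only the leftmost one (position 0) is consumed by the first replacement.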

The Challenge: Begin Marker

- Search pattern: a+b+c → x
- Input word: aabbcdabcabd, to be decorated with begin markers #
- Look-ahead capability? Non-determinism

3 steps:

1. End marker
2. Generic end marker
3. Begin marker

Preliminary End Marker

- Search pattern: a+b+c → x; reversed pattern: cb+a+
- Idea: start with an end marker for the reverse of the search pattern
- Problem: the input tape accepts cb+a+ only!

Figure: transducer A1 with states 1 to 5 and edge labels c:c, b:b, a:a, and ε:`$` (b:b and a:a each appear twice).

Generic End Marker

- Pattern: cb+a+
- Input word: ccbaa
- Output word: `ccba$a$`
- Deterministic!

Figure: transducer A2 with states 1 to 5 (labeled 1; 2,1; 3,1; 4,1; 5,1) and edge labels c:c, b:b, a:a, and ε:`$`.
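The effect of the generic end marker on this example can be reproduced by brute force. The sketch below is not the A2 construction: `mark_ends` is a hypothetical helper that, for every prefix end position, checks whether some non-empty substring ending there matches the pattern.

```python
import re


def mark_ends(pattern: str, word: str, marker: str = "$") -> str:
    """Insert `marker` after every position where a match of `pattern` ends."""
    rx = re.compile(pattern)
    out = []
    for end, ch in enumerate(word, start=1):
        out.append(ch)
        # Does any substring word[begin:end] match the whole pattern?
        if any(rx.fullmatch(word, begin, end) for begin in range(end)):
            out.append(marker)
    return "".join(out)


print(mark_ends("cb+a+", "ccbaa"))  # ccba$a$
```

Unlike the preliminary version, this accepts any input word, not only words in cb+a+; both "cba" and "cbaa" end a match, so two markers are emitted.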

Finally, the Begin Marker

Search pattern: a+b+c → x

Figure: transducer A3 with states 1 to 5 (labeled 1; 2,1; 3,1; 4,1; 5,1) plus a new state 0, and edge labels c:c, b:b, a:a, ε:#, and ε:ε.
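The three steps suggest that begin markers for a pattern are obtained from end markers for the reversed pattern. The sketch below works under that reading and is an assumption, not the A3 construction: it reverses the word, marks match ends for the hand-written reversed pattern cb+a+, and reverses the result. Both helper names are hypothetical.

```python
import re


def mark_ends(pattern: str, word: str, marker: str) -> str:
    """Insert `marker` after every position where a match of `pattern` ends."""
    rx = re.compile(pattern)
    out = []
    for end, ch in enumerate(word, start=1):
        out.append(ch)
        if any(rx.fullmatch(word, begin, end) for begin in range(end)):
            out.append(marker)
    return "".join(out)


def mark_begins_via_reversal(reversed_pattern: str, word: str, marker: str = "#") -> str:
    """Mark match beginnings by marking match ends on the reversed word."""
    marked_reversed = mark_ends(reversed_pattern, word[::-1], marker)
    return marked_reversed[::-1]


# Search pattern a+b+c, so the reversed pattern is cb+a+ (written by hand).
print(mark_begins_via_reversal("cb+a+", "aabbcdabcabd"))  # #a#abbcd#abcabd
```

A match of cb+a+ ending at some position of the reversed word corresponds to a match of a+b+c starting at the mirrored position of the original word, which is why the reversal moves each marker to a match beginning.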

Greedy Semantics

Goal: an FST for the greedy replacement semantics. Challenge: look-ahead for the longest match.

Greedy replacement in six steps:

1. Begin marker
2. Non-deterministic (ND) end marker
3. Pairing markers
4. Checking match
5. Check longest
6. Replacement

Example with search pattern a+ → x and input word aabab:

- After the begin marker: `#a#ab#ab`
- Intermediate marked words: `#a#a$b#ab`, `#a$#a$b#a$b`, `#a#a$b#a$b`, `#a#ab#a$b`, `#aa$b#a$b`, `#aaba$b`
- Result: xbxb
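The difference between the two semantics shows up on this same example. The sketch below is a direct, quadratic-time reference procedure, not the six-step marker construction: `replace_leftmost` is a hypothetical helper that, at the leftmost position where a match starts, takes either the longest match (greedy) or the shortest match (reluctant).

```python
import re


def replace_leftmost(pattern: str, replacement: str, word: str, longest: bool) -> str:
    """Replace leftmost matches, choosing the longest or the shortest at each start."""
    rx = re.compile(pattern)
    out = []
    i = 0
    while i < len(word):
        # All end positions j such that word[i:j] is a non-empty match.
        ends = [j for j in range(i + 1, len(word) + 1) if rx.fullmatch(word, i, j)]
        if ends:
            out.append(replacement)
            i = max(ends) if longest else min(ends)
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


print(replace_leftmost("a+", "x", "aabab", longest=True))   # xbxb
print(replace_leftmost("a+", "x", "aabab", longest=False))  # xxbxb
```

Under the greedy semantics the prefix "aa" is one match, so the output has a single x before the first b; under the reluctant semantics each a is replaced on its own.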

Applications

Solve string constraints. Example: Login Servlet

- Input: user name
- Constraint applies after filtering single quotes and the length restriction

Solving Atomic Constraint

Goal: compose A1 with Id(P), project the result to the input tape, and read off the solution.
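What "solution" means here can be illustrated without transducers. The sketch below is a brute-force stand-in, clearly not SUSHI's method: it enumerates candidate inputs over a small alphabet and returns the first one whose output after replacement falls in the target language P. The function name, the toy pattern `ab`, and the bounds are all assumptions for illustration.

```python
import re
from itertools import product


def solve_by_enumeration(search, replacement, target, alphabet, max_len):
    """Return a shortest x with target found in replace(x), or None if none exists
    within max_len. Exponential in max_len; for illustration only."""
    search_rx = re.compile(search)
    target_rx = re.compile(target)
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            output = search_rx.sub(replacement, candidate)
            if target_rx.search(output):
                return candidate
    return None


# Toy constraint: the filter deletes "ab"; which input still yields "ab"?
print(solve_by_enumeration("ab", "", "ab", alphabet="ab", max_len=4))  # aabb
```

The answer mirrors the script-tag attack: the forbidden string is wrapped around a copy of itself, so deleting the inner copy leaves the outer one behind.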

SUSHI Constraint Solver

- Solves Simple Linear String Constraints (SISE)
- Relies on dk.brics.automaton for FSA operations
- Self-made Java package for FST operations
- Supports 16-bit Unicode
- Compact transition representation
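The slide does not say how transitions are made compact. One plausible reading, assumed here, is to label a transition with a character interval instead of a single symbol, which is how dk.brics.automaton labels FSA transitions. The sketch below uses hypothetical names and is not SUSHI's data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeTransition:
    """A transition taken on any character in the closed interval [low, high]."""
    low: str
    high: str
    target: int

    def accepts(self, ch: str) -> bool:
        return self.low <= ch <= self.high


# "Any 16-bit character except the single quote" needs two intervals,
# not 65535 single-symbol transitions.
ANY_BUT_QUOTE = [
    RangeTransition("\u0000", "&", target=1),  # U+0000 .. U+0026
    RangeTransition("(", "\uffff", target=1),  # U+0028 .. U+FFFF
]


def step(transitions, ch):
    """Return the target state reached on `ch`, or None if no transition fits."""
    for transition in transitions:
        if transition.accepts(ch):
            return transition.target
    return None


print(step(ANY_BUT_QUOTE, "a"))  # 1
print(step(ANY_BUT_QUOTE, "'"))  # None
```

With intervals, the size of a state's transition list depends on how many character classes the pattern distinguishes, not on the size of the alphabet.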

Efficiency of Solver

- Benchmark equations 1 to 4
- Login Servlet: 1.4 seconds on a 2 GHz PC
- Flex SDK XSS attack: equation size 565, 74 seconds; shorter than Security Track #1022748

Related Work

- Forward string analysis
  - Christensen & Møller [SAS03]
  - Wasserman & Su [PLDI07, ICSE08]
  - Bjørner & Tillmann [TACAS09]
- Backward string analysis
  - Kiezun & Ganesh [ISSTA09]
  - Yu & Bultan [SPIN08, ASE09]
  - Fu [COMPSAC07, TAVWEB08]
- Natural language processing
  - Kaplan and Kay [CL1994]

Our contribution: precise modeling of various regular substitution semantics.

Limitations

- SISE string constraints
  - All variables appear on the LHS (once)
  - No easy solution for equation systems yet
- No string length

Future Directions

- Encoding string length in automata
- Finite model on bit-vector

Questions?
