Modeling the HTML DOM and Browser API in Static Analysis of JavaScript Web Applications. ESEC/FSE 2011. Anders Møller, Magnus Madsen and Simon Holm Jensen.


1 / 28 Modeling the HTML DOM and Browser API in Static Analysis of JavaScript Web Applications
ESEC/FSE 2011
Anders Møller, Magnus Madsen and Simon Holm Jensen

2 / 28 Motivation
How can we help developers writing JavaScript web applications?
– by providing tools for finding bugs early in the development cycle
In this work we focus on finding bugs in the way JavaScript programs interact with the web browser.

3 / 28 JavaScript in a browser
[Diagram titled "The Document Object Model": user, web browser and JavaScript code, connected by arrows labeled interaction, rendering, events and DOM manipulation.]

4 / 28 Example
Annotations on the example code:
– The el.button property is always absent (it is undefined): an HTMLImageElement object does not have a button property
– Unreachable
– The programmer has confused el and ev
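
A minimal sketch of the kind of bug these annotations describe, assuming a click handler attached to an image. The handler names and the plain objects standing in for the HTMLImageElement and the mouse event are hypothetical, chosen so the sketch runs outside a browser:

```javascript
// Hypothetical stand-ins: `img` plays the role of an HTMLImageElement,
// `click` the role of a mouse event whose target is that image.
const img = { tagName: "IMG", src: "logo.png" };
const click = { type: "click", button: 0, target: img };

// Buggy handler: the programmer meant ev.button but wrote el.button.
function buggyHandler(ev) {
  const el = ev.target;
  if (el.button === 0) {   // el.button is absent, so it reads as undefined
    return "left click";   // this branch can never be taken
  }
  return "ignored";
}

// Fixed handler: the button is a property of the event, not of the element.
function fixedHandler(ev) {
  if (ev.button === 0) {
    return "left click";
  }
  return "ignored";
}
```

An analysis that knows which properties an image element can have is able to flag both the absent-property read and the unreachable branch in buggyHandler.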

5 / 28 TAJS: Type Analysis for JavaScript
A tool for static analysis of plain JavaScript
– the starting point for our work
– flow-sensitive dataflow analysis
– interprocedural
– whole-program analysis
– intended for non-minified, non-obfuscated code
[S.H. Jensen, A. Møller and P. Thiemann, SAS '09]

6 / 28 Bug Finding
We look for general errors such as:
– dead or unreachable code
– invocations of built-in functions with an incorrect number of arguments or wrong argument types
– undefined dereference
– reading absent properties
– etc.
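
Three of these error classes can be illustrated in plain JavaScript. This is a sketch with hypothetical function and variable names, not output from TAJS:

```javascript
// Reading an absent property: no exception, the read yields undefined.
function readAbsentProperty() {
  const point = { x: 1, y: 2 };
  return point.z;
}

// Undefined dereference: property access on undefined throws a TypeError.
function dereferenceUndefined() {
  const config = undefined;
  return config.debug;
}

// Built-in called with too few arguments: the missing argument is treated
// as undefined and converted to the string "undefined".
function tooFewArguments() {
  return encodeURI();
}
```

None of these is rejected before the program runs, which is why a static analysis that reports them is useful.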

7 / 28 Contributions
We extend the static analysis of TAJS to reason about JavaScript that executes in a browser:
– how to model the browser API?
  100s of non-standardized objects and functions
– how to model the HTML page?
  complex prototype hierarchy of the W3C DOM
– how to model the event system?
  many kinds of events
  dynamic registration of event handlers

8 / 28 Architecture
[Architecture diagram with the labels: JavaScript code, event handler code, TAJS, DOM model, named tags, flow graph extension, Browser API, potential errors.]

9 / 28 The Browser API
The global window object
– history, location, navigator, screen
– alert(...), print(...), encodeURI(...)
– setTimeout(...), setInterval(...)
– addEventListener(...)
Non-standard and legacy functionality

10 / 28 The HTML DOM
The Document Object Model (W3C)
– tree-like structure
– e.g. one JavaScript object for each HTML tag: HTMLInputElement, HTMLFontElement, etc.
– arranged in a large prototype hierarchy
A huge number of properties and functions
– most properties are string or integer constants

11 / 28 The HTML DOM
Important functions
– createElement(...)
– getElementById(...)
– getElementsByName(...)
– getElementsByTagName(...)
The analysis tracks elements by:
– Tag
– ID
– Name
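
Tracking elements by tag, ID and name can be sketched as three lookup tables. The class and method names below are hypothetical and are not taken from TAJS:

```javascript
// Hypothetical index of page elements, keyed by tag, id and name.
class ElementIndex {
  constructor() {
    this.byTag = new Map();
    this.byId = new Map();
    this.byName = new Map();
  }

  static push(map, key, element) {
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(element);
  }

  add(element) {
    ElementIndex.push(this.byTag, element.tag, element);
    if (element.id !== undefined) ElementIndex.push(this.byId, element.id, element);
    if (element.name !== undefined) ElementIndex.push(this.byName, element.name, element);
  }

  // Each lookup returns the candidates a lookup function could yield.
  lookupByTag(tag) { return this.byTag.get(tag) || []; }
  lookupById(id) { return this.byId.get(id) || []; }
  lookupByName(name) { return this.byName.get(name) || []; }
}

const index = new ElementIndex();
index.add({ tag: "img", id: "logo" });
index.add({ tag: "input", name: "q" });
index.add({ tag: "img" });
```

With such an index, a call with a known constant argument resolves to a small candidate set instead of "any element in the page".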

12 / 28 Prototype Hierarchy
The complete model has ~250 objects and ~500 properties

13 / 28 Choice of Abstraction
Model the DOM objects as:
– a single abstract object for every element kind
– an abstract object for every element in the initial HTML page
(One of these options is marked "Our Choice" on the slide.)
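
The difference between the two abstractions can be sketched by counting the abstract objects each one produces for the same page. The page contents and label format are hypothetical:

```javascript
// Concrete elements of a hypothetical initial HTML page.
const page = [
  { tag: "img", id: "a" },
  { tag: "img", id: "b" },
  { tag: "input", id: "c" },
];

// Option 1: one abstract object per element kind.
function abstractByKind(elements) {
  return new Set(elements.map((e) => "kind:" + e.tag));
}

// Option 2: one abstract object per element in the initial page.
function abstractByElement(elements) {
  return new Set(elements.map((e, i) => "element:" + i + ":" + e.tag));
}
```

For this page the first option yields two abstract objects and the second yields three. The first is coarser but its size does not grow with the page.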

14 / 28 Straightforward Hierarchy?
The image tag looks pretty innocent:
Image objects can be created in several ways:
– new Image();
– document.createElement("img");

15 / 28 Example

16 / 28 Image Prototype Hierarchy
[Diagram of the objects involved when an image is created with new Image(); or document.createElement("img");. Blue arrows are internal prototype links; red arrows are external prototype links. One label reads "Attached to window".]
Objects shown:
– HTMLImageElement (constructor obj)
– HTMLImageElement (prototype obj)
– HTMLImageElement (instance obj)
– Image (constructor obj)
– Image (prototype obj)
– Image (instance obj)
– Object (prototype obj)
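
The distinction between internal and external prototype links can be shown in plain JavaScript. The constructor below is a hypothetical stand-in, not the browser's HTMLImageElement, and the sketch does not reproduce the exact shape of the slide's diagram:

```javascript
// Stand-in constructor with one method on its prototype object.
function FakeImageElement() {
  this.src = "";
}
FakeImageElement.prototype.describe = function () {
  return "image with src '" + this.src + "'";
};

const instance = new FakeImageElement();

// External link: the constructor's ordinary `prototype` property.
const externalTarget = FakeImageElement.prototype;

// Internal link: the hidden [[Prototype]] of the instance, which is the
// object consulted when a property is not found on the instance itself.
const internalTarget = Object.getPrototypeOf(instance);
```

Here both links point at the same prototype object, and that object's own internal link leads to Object.prototype. A model of the DOM has to get both kinds of link right for every object in the hierarchy.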

17 / 28 Registration of Event Handlers
Directly in the HTML source
Using the Browser API
– setTimeout(...), setInterval(...)
– addEventListener(...)
Writes to "magic properties"
– x.onclick = ...
– special properties that have side effects on the DOM when written to
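
A "magic property" can be imitated outside a browser with an accessor property whose setter registers the handler. This is a sketch with a hypothetical element stand-in; makeElement and dispatch are illustrative names, not browser APIs:

```javascript
// Hypothetical element: writing to `onclick` has the side effect of
// registering the written function as a click handler.
function makeElement() {
  const handlers = { click: [] };
  let current = null;
  const element = {
    addEventListener(type, fn) {
      if (!handlers[type]) handlers[type] = [];
      handlers[type].push(fn);
    },
    dispatch(type) {
      return (handlers[type] || []).map((fn) => fn({ type }));
    },
  };
  Object.defineProperty(element, "onclick", {
    get() { return current; },
    set(fn) {
      // Drop the previously assigned onclick handler, then register the new one.
      handlers.click = handlers.click.filter((h) => h !== current);
      current = fn;
      if (typeof fn === "function") handlers.click.push(fn);
    },
  });
  return element;
}

const button = makeElement();
button.addEventListener("click", () => "listener");
button.onclick = () => "magic";
```

The assignment looks like an ordinary property write, yet it changes which code runs on a click. An analysis has to treat such writes as handler registrations.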

18 / 28 Tracking Event Handlers
Separate event handlers based on their kind
– page load (onload)
– keyboard (onkeypress, ...)
– mouse (onclick, onmouseover, ...)
– timed (setTimeout, setInterval, ...)
– etc.

19 / 28 Flow Graph Extension
Event handlers are executed by introducing an event-handler loop
– separates page load event handlers from other event handlers
– executes event handlers in two non-deterministic loops
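
The two loops can be pictured as a concrete driver. This is a simulation with hypothetical names: the `pick` callback stands in for the non-deterministic choice, whereas the analysis itself covers all choices at once in the flow graph. Running each load handler exactly once is an assumption of this sketch.

```javascript
// pick(n) returns an index in [0, n) and models the non-deterministic choice.
function runEventLoops(loadHandlers, otherHandlers, pick, steps) {
  const trace = [];

  // Loop 1: page load handlers run first, in an arbitrary order.
  const pending = loadHandlers.slice();
  while (pending.length > 0) {
    const [handler] = pending.splice(pick(pending.length), 1);
    trace.push(handler());
  }

  // Loop 2: all other handlers may then fire repeatedly, in any order.
  for (let i = 0; i < steps && otherHandlers.length > 0; i++) {
    trace.push(otherHandlers[pick(otherHandlers.length)]());
  }
  return trace;
}

const trace = runEventLoops(
  [() => "load"],
  [() => "click", () => "timeout"],
  (n) => n - 1, // deterministic stand-in for the non-deterministic choice
  3
);
```

Separating the loops lets the analysis assume that state set up by load handlers exists when the other handlers run.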

20 / 28 Evaluation
With these extensions TAJS can reason about JavaScript applications that run in a browser.
Is the analysis precise enough to be useful?

21 / 28 Benchmarks
Evaluated on a series of benchmarks:
– Chrome Experiments
– Internet Explorer 9 Test Drive
– 10K Challenge
– A List Apart
– (excluding benchmarks that use eval or jQuery, or that are not relevant for JavaScript)

22 / 28 Research Questions
Q1: Ability to show absence of errors?
The analysis is able to show that
– 85-100% of call sites are safe
– 80-100% of property reads are safe

23 / 28 Research Questions
Q2: Ability to locate sources of errors?
– We randomly introduce spelling errors
– The analysis is able to pinpoint most of them (details in the paper)

24 / 28 Research Questions
Q3: Precision of computed call graph?
The analysis is able to show that 90-100% of call sites are monomorphic
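
The monomorphism measure can be computed from a call graph as follows. The call graph below is made-up illustration data, not a result from the benchmarks:

```javascript
// Hypothetical call graph: call site label -> set of possible callees.
const callGraph = new Map([
  ["main.js:10", new Set(["init"])],
  ["main.js:14", new Set(["draw"])],
  ["main.js:20", new Set(["onClick", "onKey"])],
  ["main.js:31", new Set(["update"])],
]);

// A call site is monomorphic when exactly one callee is possible.
function monomorphicRatio(graph) {
  if (graph.size === 0) return 1;
  let mono = 0;
  for (const callees of graph.values()) {
    if (callees.size === 1) mono++;
  }
  return mono / graph.size;
}
```

In this made-up graph three of the four call sites have a single callee, so the ratio is 0.75.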

25 / 28 Research Questions
Q4: Precision of inferred types?
– boolean, number, string, object and undefined
– the analysis is able to show that the average type size is 1.0-1.3
e.g. if the average type size is 1.0 then every read in the program results in values of a single type
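
The average type size is the mean number of distinct types per read. A small worked sketch, using made-up analysis results:

```javascript
// Hypothetical analysis result: for each read, the set of possible types.
const reads = [
  new Set(["number"]),
  new Set(["string"]),
  new Set(["object", "undefined"]),
  new Set(["boolean"]),
];

// Mean number of distinct types over all reads.
function averageTypeSize(typeSets) {
  if (typeSets.length === 0) return 0;
  const total = typeSets.reduce((sum, types) => sum + types.size, 0);
  return total / typeSets.length;
}
```

Here the four reads have 1 + 1 + 2 + 1 = 5 types in total, giving an average type size of 1.25.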

26 / 28 Research Questions
Q5: Ability to detect dead or unreachable code?
– found several unreachable functions
– most appear to be unused library code copy & pasted directly into the benchmark programs

27 / 28 Future / Current Work
Dynamically generated code
– eval
Library support
– jQuery, MooTools, etc.

28 / 28 Conclusion
We extended previous work to reason precisely about JavaScript programs that execute in a browser-based environment.
This allows us to discover general errors such as:
– reading absent properties
– dereferencing null or undefined
– invoking functions with incorrect arguments
– etc.


30 / 28 DOM Modules & Levels
Module \ Level: Level 0, Level 1, Level 2, Level 3
Core Module: - ( )
HTML Module: - ( )
Event Module: - - ( )
CSS Module: - - ( )
Browser API: - - -
Year: ~1996, 1998, 2000, 2004
In addition we support the HTMLCanvasElement from HTML5.

31 / 28 Soundness Issues?
Assignment to computed property names
– foo[bar] = "baz"
– foo[bar] = function() {...}
If the exact value of bar is unknown:
– it could be a write to a "magic property"
– or a registration of an event handler
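
One conservative way to handle such writes is sketched below. It assumes an abstract key that is either a known string or a marker for "any string". All names are hypothetical and this is not a description of how TAJS handles the case:

```javascript
// UNKNOWN marks a property name whose exact value the analysis cannot determine.
const UNKNOWN = Symbol("unknown string");
const MAGIC_HANDLER_PROPERTIES = new Set(["onclick", "onload", "onkeypress", "onmouseover"]);

function abstractWrite(state, key, value) {
  const isFunction = typeof value === "function";
  if (key === UNKNOWN) {
    // The key could name any property, including a magic one, so a function
    // value has to be treated as a possible event handler.
    if (isFunction) state.maybeHandlers.push(value);
    state.unknownWrites.push(value);
  } else if (MAGIC_HANDLER_PROPERTIES.has(key) && isFunction) {
    state.handlers.push(value);
  } else {
    state.properties.set(key, value);
  }
  return state;
}

const state = { properties: new Map(), handlers: [], maybeHandlers: [], unknownWrites: [] };
const handler = function () {};
abstractWrite(state, "title", "baz");     // ordinary write
abstractWrite(state, "onclick", handler); // definite handler registration
abstractWrite(state, UNKNOWN, handler);   // foo[bar] = function() {...}
abstractWrite(state, UNKNOWN, "baz");     // foo[bar] = "baz"
```

Ignoring the UNKNOWN case would miss handlers that can run, which is the soundness concern raised on this slide.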

