TEMPLATE DESIGN © 2008 www.PosterPresentations.com

A Strategy for Distributed Crawling of Rich Internet Applications
Seyed M. Mirtaheri, Gregor v. Bochmann, Guy-Vincent Jourdan, Iosif Viorel Onut
School of Information Technology and Engineering, University of Ottawa

Introduction
Rich Internet Applications (RIAs) offer better user interaction and responsiveness than traditional web applications. Thanks to technologies such as AJAX (Asynchronous JavaScript and XML), RIAs can communicate with the server asynchronously, which allows continuous user interaction.
Figure 1: AJAX-enabled RIA communication pattern.

Non-URL-Based Crawling Strategy
In an RIA, one URL corresponds to many DOM states. Unlike traditional websites, in which every call to the server replaces the whole DOM and changes the page URL, RIAs rely on small AJAX updates that do not necessarily modify the page URL.
• Traditional distributed crawlers rely heavily on the URL to partition the search space. The underlying assumption of this strategy is a one-to-one correspondence between URL and DOM state, which does not hold in RIAs.
• Therefore, we propose to partition the search space based on events.
Crawling Strategy: Reduce the workload by choosing the events to execute with a Greedy algorithm.
Crawling Efficiency: Discover states as early as possible, using a Probabilistic model.
Reference:
[Benjamin 2010] K. Benjamin, G. v. Bochmann, G.-V. Jourdan and V. Onut, Some modeling challenges when testing Rich Internet Applications for security, First Intern. Workshop on Modeling and Detection of Vulnerabilities (MDV 2010), Paris, France, April 2010.

Partitioning Strategies
Existing distributed crawlers mostly use server-related metrics as the primary tool to partition the search space:
• Page URL
• Server IP address
• Server geographical location
1. [Loo 2004] describes distributed web crawling by hashing the URL.
2. [Exposto 2008] includes geographic information about the crawlers and the crawled servers in the task-distribution algorithm, in order to allocate a crawler that is close to the server to be crawled.
3. [Boldi 2003], in the paper on UbiCrawler, shows how the so-called consistent hashing approach can be used to allocate tasks to the crawlers in such a way that only minimal changes occur when a crawler disappears or a new crawler joins. This approach can be used to obtain better fault tolerance.
References:
[Loo 2004] B.T. Loo, O. Cooper, S. Krishnamurthy, Distributed Web Crawling over DHTs, Technical report, University of California, Berkeley, 2004.
[Exposto 2008] J. Exposto, J. Macedo, A. Pina, A. Alves and J. Rufino, Efficient partitioning strategies for distributed web crawling, in Information Networking: Towards Ubiquitous Networking and Services, Springer LNCS 5200 (2008), pp.
[Boldi 2003] P. Boldi, B. Codenotti, M. Santini, S. Vigna, UbiCrawler: A Scalable Fully Distributed Web Crawler, Software: Practice & Experience, Vol. 34, 2003, p.

Proposed Architectures

Experimental Results (BFS)

Future Work
We are currently working on distributed crawling of RIAs in a cloud environment. We plan to add fault tolerance to our strategy, so that if some nodes crash, the remaining nodes continue without interruption. Once we have a working implementation of the system, we plan to optimize it for infrastructure parameters such as the cost of communication or the processing power available to each node.

Acknowledgments
This work is supported in part by the Center for Advanced Studies, IBM Canada.

DISCLAIMER
The views expressed in this poster are the sole responsibility of the authors and do not necessarily reflect those of the Center for Advanced Studies of IBM.

Motivation and Aim
The security of RIAs and the automation of security testing are important, ongoing, and growing concerns. One important aspect of this automation is crawling RIAs, i.e., reaching all possible states of the application from the initial state.
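The Non-URL-Based Crawling Strategy proposes partitioning the search space by events rather than by URL. The Python sketch below illustrates the difference under loudly stated assumptions: the toy application, the placeholder SEED_URL, and the helpers node_for, partition_by_url and partition_by_event are all hypothetical, and hashing each (state, event) pair to a node is only one possible way to distribute events, not necessarily the assignment the authors use. Because every DOM state of the toy RIA sits under a single URL, URL hashing sends all 40 events to one node, whereas event hashing typically spreads them across the nodes.

```python
import hashlib

# Hypothetical toy RIA: four DOM states with ten events each, all reached
# under one page URL (the RIA situation described above).
SEED_URL = "http://example.com/app"  # placeholder, not a real application
STATES = {f"s{i}": [f"e{j}" for j in range(10)] for i in range(4)}


def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (hypothetical helper)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition_by_url(states, url, num_nodes):
    """URL-based partitioning: work goes to the node that owns its page URL."""
    load = [0] * num_nodes
    for events in states.values():
        # Every state shares the same URL, so all events land on one node.
        load[node_for(url, num_nodes)] += len(events)
    return load


def partition_by_event(states, num_nodes):
    """Event-based partitioning (assumed scheme): hash each (state, event) pair."""
    load = [0] * num_nodes
    for state, events in states.items():
        for event in events:
            load[node_for(f"{state}/{event}", num_nodes)] += 1
    return load


if __name__ == "__main__":
    print("by URL:  ", partition_by_url(STATES, SEED_URL, 4))
    print("by event:", partition_by_event(STATES, 4))
```

A stable hash (SHA-1 from hashlib) is used instead of Python's built-in hash(), whose value for strings changes between interpreter runs, so that every crawler node would compute the same assignment.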
Being able to crawl RIAs automatically is also valuable for search engines and accessibility assessment. Crawling RIAs is an expensive and time-consuming process because of their large number of states. To accelerate it, we distribute the operation over many nodes in an elastic cloud environment.

Background
Crawling strategies:
• Breadth-First Search
• Bounded Depth-First Search
• Based on page weight
• Greedy algorithm
• Probabilistic model
1. [Amalfitano 2010] describes techniques and tools for testing RIAs based on passively analyzing log files and reconstructing user sessions.
2. [Mesbah 2008] describes a method in which specific user input is specified.
3. [Duda 2009] describes a method for testing RIAs based on Breadth-First Search (BFS), optimized to avoid repeated execution of the same AJAX calls. Their general approach is similar to ours [Benjamin 2011]; however, ours is more oriented towards Depth-First Search (DFS), in order to minimize the number of times the crawler has to go back to a previously encountered state, which normally requires a sequence of interactions with the server.
Architecture of AJAX Crawl (appeared in [Duda 2009])
Figure 2: [Loo 2004] system architecture
References:
[Amalfitano 2010] D. Amalfitano, A.R. Fasolino, P. Tramontana, Techniques and tools for Rich Internet Application testing, in Proc. WSE, 2010, pp.
[Mesbah 2008] A. Mesbah and A. van Deursen, A Component- and Push-based Architectural Style for Ajax Applications, Journal of Systems and Software (JSS), 81(12), 2008.
[Duda 2009] C. Duda, G. Frey, D. Kossmann, R. Matter, C. Zhou, AJAX Crawl: Making AJAX Applications Searchable, Proc. IEEE Intern. Conf. on Data Engineering, Shanghai, 2009, pp. 78-

Background (Cont'd)

Algorithm

Experimental Results (Greedy)
The Greedy algorithm is substantially faster than BFS. However, loading the seed URL can become a bottleneck in this algorithm.
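The poster does not spell out its Greedy algorithm, so the sketch below assumes a common formulation: if the current state still has an unexecuted event, execute it; otherwise travel through the already-explored model to the nearest state that has one, reloading the seed URL (a "reset") only when no known route exists. TOY_APP, nearest_pending and greedy_crawl are hypothetical names, and a dictionary stands in for a real browser. Resets are counted separately because the results above identify loading the seed URL as the bottleneck.

```python
from collections import deque

# Hypothetical toy RIA: state -> {event: resulting state}. The crawler can see
# which events a state offers, but learns an event's result only by executing it.
TOY_APP = {
    "s0": {"e1": "s1", "e2": "s2"},
    "s1": {"e3": "s0", "e4": "s3"},
    "s2": {"e5": "s3"},
    "s3": {},
}


def nearest_pending(known, start, has_pending):
    """BFS over the *known* model from start; return (state, events to replay)
    for the closest state that still has an unexecuted event, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if has_pending(state):
            return state, path
        for event, target in known[state].items():
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [event]))
    return None


def greedy_crawl(app, seed):
    """Greedy crawl (assumed formulation): always execute the closest unexecuted event.

    Returns (discovered states, event executions, resets to the seed URL).
    """
    known = {seed: {}}  # explored model: state -> {event: observed target}
    current, executions, resets = seed, 0, 0

    def has_pending(state):
        return any(event not in known[state] for event in app[state])

    while any(has_pending(state) for state in known):
        found = nearest_pending(known, current, has_pending)
        if found is None:
            # No known route from here: reload the seed URL (a reset). Every
            # known state is reachable from the seed, so this search succeeds.
            current, resets = seed, resets + 1
            found = nearest_pending(known, current, has_pending)
        target, path = found
        executions += len(path)  # replay already-known events to reach target
        event = next(e for e in app[target] if e not in known[target])
        result = app[target][event]  # "execute" the new event
        known[target][event] = result
        known.setdefault(result, {})
        executions += 1
        current = result
    return set(known), executions, resets


if __name__ == "__main__":
    print(greedy_crawl(TOY_APP, "s0"))
```

On this toy application the crawl discovers all four states with six event executions (including one replayed event) and a single reset; a BFS crawler would typically need a reset before visiting each new state.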
Scaling the Greedy algorithm beyond 16 nodes requires a more peer-to-peer architecture.

IBM AppScan Enterprise
PhantomJS
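Both the peer-to-peer architecture needed beyond 16 nodes and the fault tolerance planned under Future Work raise the question of reassigning work when crawler nodes leave or join, which is what [Boldi 2003] addresses with consistent hashing. The following is a minimal, generic consistent-hashing ring with virtual nodes, not UbiCrawler's own code; keying it by (state, event) pairs follows the event-based partitioning proposed above and is an assumption, and ring_point, ConsistentHashRing and the crawler-N names are hypothetical.

```python
import bisect
import hashlib


def ring_point(key: str) -> int:
    """Position of a key on the hash ring, from a stable 64-bit hash."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Minimal consistent-hashing ring with virtual nodes (replicas per crawler)."""

    def __init__(self, nodes, replicas=50):
        self.replicas = replicas
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (ring_point(f"{node}#{i}"), node))

    def remove(self, node):
        # Dropping a node's points leaves every other point, and its order, intact.
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        """Owner of key: first ring point at or after the key's position, wrapping around."""
        idx = bisect.bisect(self._ring, (ring_point(key),)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = ConsistentHashRing([f"crawler-{i}" for i in range(4)])
    work = [f"s{i}/e{j}" for i in range(20) for j in range(5)]  # hypothetical keys
    before = {k: ring.node_for(k) for k in work}
    ring.remove("crawler-3")  # simulate a crashed crawler
    after = {k: ring.node_for(k) for k in work}
    moved = [k for k in work if before[k] != after[k]]
    print(len(moved), "of", len(work), "keys reassigned")
```

Only the keys owned by the removed crawler change owner, and a newly added crawler takes keys only for itself; with plain modulo hashing, by contrast, changing the number of nodes would reshuffle most keys.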