Employing Web Search indexing for fast creation of filtered view of large text files Mostafa Agbaria, Ahmad Atamlh Department of Electrical engineering,

Presentation transcript:

Employing Web Search indexing for fast creation of filtered view of large text files
Mostafa Agbaria, Ahmad Atamlh
Department of Electrical Engineering, Technion, Software System Laboratory, Spring 2010
Supervisor: Oved Itzhak. Lab Engineer: Dr. Ilana David

Abstract
In today's Internet-scale services it is not uncommon to have logs that contain huge amounts of data. Inspecting such logs can easily overwhelm a human, so specialized tools that make the data easier to manage are essential. In this project we implement a plug-in for the existing VLTF (Very Large Text File Viewer) application. The plug-in takes a text file and creates an index that enables very fast search in the file, using inverted indexing. The VLTF provides the GUI for searching and for quickly navigating to the found locations in the text file.

Background
As network bandwidth increases, network servers (e.g. Web, mail) create exceedingly large log files. The problem of searching in such files resembles the Web search problem, where searching all the data naively takes prohibitively long. This project continues the VLTFV (Very Large Text File Viewer) project, in which application responsiveness is independent of input file size.

Project Goals
In this project we plan to implement a new type of index for the VLTFV application that supports fast creation of a filtered view of large text files using a Web search indexing technique. The implementation is in Microsoft .NET and C#. The index is a database built by pre-processing the data in the log files with inverted indexing, which gives the user an easy and fast way to search the log file.

Inverted Indexing
An inverted index is an index data structure that stores a mapping from content, such as words or numbers, to its locations in a database file, in a document, or in a set of documents. The purpose of an inverted index is to allow fast full-text searches, at the cost of increased processing when a document is added to the database.

Pre-Processing Data
The conventional approach previously used requires going over the entire text file to perform each search, which is time consuming and impractical. Pre-processing the text file instead lets us search faster and more reliably. The index, which is the pre-processed database, solves the speed problem: a search no longer requires going over the entire file, and this is where the time is saved.

Multi Threaded Indexing
The indexer takes more time than expected to build the database using serial parsing. We therefore build the database using multi-threading: the file is indexed in parallel by a specified number of threads, each indexing a different part of the file. Each thread creates a new database for its section of the file and sends that database to the Web Technique Searcher. After receiving all the sub-databases, we merge them into a Main Database.

[Figure: Multi-Threaded vs. Single-Thread timeline. Labels: single thread running; page fault, disk access, CPU idle; in an ideal world; threads 1 to 4 running; time axis.]
[Figure: sub-databases merged into the Main Database.]
[Figure: time for building the database using various numbers of threads (file size = 100 MB).]

User Interface
[Figure: user interface, with callouts: Open File, Go to Line, Search, Conventional Scroll Bar, Scroll Knob, Line Numbers, Search Results Pane, Progress Bar, File lines counter, Text view area.]

Summary
The plug-in developed in this project makes searching and inspecting very large text files easier, faster and more reliable. It uses an advanced algorithm based on a Web indexing technique together with the VLTF, making the process of switching between lines in such a large text file more practical for humans.
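
The indexing scheme described in this transcript (an inverted index built per file section by separate threads, then merged into a main database) might be sketched roughly as follows. This is an illustrative Python sketch, not the project's C#/.NET code: the names `index_section`, `build_index` and `search`, the word-level tokenization, and the choice to store line numbers as locations are all assumptions made for the example.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

WORD = re.compile(r"\w+")  # assumed tokenizer: runs of letters, digits, underscore


def index_section(lines, first_line_no):
    """Build a sub-index for one section: word -> ascending list of line numbers."""
    sub = defaultdict(list)
    for offset, line in enumerate(lines):
        line_no = first_line_no + offset
        # set() so a word repeated on one line is recorded once for that line
        for word in set(WORD.findall(line.lower())):
            sub[word].append(line_no)
    return sub


def build_index(lines, num_threads=4):
    """Split the file into contiguous sections, index each in its own
    thread, then merge the sub-indexes into one main index."""
    n = len(lines)
    size = max(1, -(-n // num_threads))  # ceiling division: lines per section
    starts = range(0, n, size)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in section order, regardless of finish order
        subs = list(pool.map(
            lambda s: index_section(lines[s:s + size], s + 1), starts))
    main = defaultdict(list)
    for sub in subs:  # sections arrive in file order, so postings stay sorted
        for word, postings in sub.items():
            main[word].extend(postings)
    return dict(main)


def search(index, word):
    """Return the 1-based line numbers containing the word (the filtered view)."""
    return index.get(word.lower(), [])


if __name__ == "__main__":
    log = ["ERROR disk full", "INFO started", "error timeout", "INFO stopped"]
    idx = build_index(log, num_threads=2)
    print(search(idx, "error"))  # [1, 3]
```

A lookup touches only the postings list for the queried word, which is why the search no longer needs to scan the whole file. Note that in CPython, threads mainly help when work is I/O-bound; the sketch shows the split, index and merge structure only and makes no claim about the speedups measured in the project.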