GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content Ioannis Konstantinou.

Slides:



Advertisements
Similar presentations
The Replica Location Service In wide area computing systems, it is often desirable to create copies (replicas) of data objects. Replication can be used.
Advertisements

Distributed Processing, Client/Server and Clusters
Virtual Machine Technology Dr. Gregor von Laszewski Dr. Lizhe Wang.
Digital Library Service – An overview Introduction System Architecture Components and their functionalities Experimental Results.
© Copyright 2012 STI INNSBRUCK Apache Lucene Ioan Toma based on slides from Aaron Bannert
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.
Seminar Grid Computing ‘05 Hui Li Sep 19, Overview Brief Introduction Presentations Projects Remarks.
Company Confidential 1 © 2005 Nokia V1-Filename.ppt / yyyy-mm-dd / Initials Towards a mobile content delivery network with a P2P architecture Carlos Quiroz.
Distributed Database Management Systems
Object Naming & Content based Object Search 2/3/2003.
Grid Computing, B. Wilkinson, 20046c.1 Globus III - Information Services.
Wide-area cooperative storage with CFS
Nikolay Tomitov Technical Trainer SoftAcad.bg.  What are Amazon Web services (AWS) ?  What’s cool when developing with AWS ?  Architecture of AWS 
On-Demand Media Streaming Over the Internet Mohamed M. Hefeeda, Bharat K. Bhargava Presented by Sam Distributed Computing Systems, FTDCS Proceedings.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Middleware for P2P architecture Jikai Yin, Shuai Zhang, Ziwen Zhang.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
© 2011 IBM Corporation 11 April 2011 IDS Architecture.
Cmpe 494 Peer-to-Peer Computing Anıl Gürsel Didem Unat.
Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems Bruce Hammer, Steve Wallis, Raymond Ho.
Virtualization. Virtualization  In computing, virtualization is a broad term that refers to the abstraction of computer resources  It is "a technique.
Cloud Computing 1. Outline  Introduction  Evolution  Cloud architecture  Map reduce operation  Platform 2.
A Distributed Architecture for Multi-dimensional Indexing and Data Retrieval in Grid Environments Athanasia Asiki, Katerina Doka, Ioannis Konstantinou,
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.
Introduction of P2P systems
Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA.
Virtualization Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation is licensed.
EXPOSE GOOGLE APP ENGINE AS TASKTRACKER NODES AND DATA NODES.
Hadoop Hardware Infrastructure considerations ©2013 OpalSoft Big Data.
High Throughput Computing on P2P Networks Carlos Pérez Miguel
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
Unit – I CLIENT / SERVER ARCHITECTURE. Unit Structure  Evolution of Client/Server Architecture  Client/Server Model  Characteristics of Client/Server.
MediaGrid Processing Framework 2009 February 19 Jason Danielson.
Scalable Web Server on Heterogeneous Cluster CHEN Ge.
Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.
Data Management BIRN supports data intensive activities including: – Imaging, Microscopy, Genomics, Time Series, Analytics and more… BIRN utilities scale:
Introduction to the Adapter Server Rob Mace June, 2008.
Event-Based Hybrid Consistency Framework (EBHCF) for Distributed Annotation Records Ahmet Fatih Mustacoglu Advisor: Prof. Geoffrey.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
Serverless Network File Systems Overview by Joseph Thompson.
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Paper Survey of DHT Distributed Hash Table. Usages Directory service  Very little amount of information, such as URI, metadata, … Storage  Data, such.
Grid-Powered Scientific & Engineering Applications Ho Quoc Thuan INSTITUTE OF HIGH PERFORMANCE COMPUTING.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Introduction to Grids By: Fetahi Z. Wuhib [CSD2004-Team19]
ADVANCED COMPUTER NETWORKS Peer-Peer (P2P) Networks 1.
Virtualization Technology and Microsoft Virtual PC 2007 YOU ARE WELCOME By : Osama Tamimi.
Full and Para Virtualization
Peer to Peer Network Design Discovery and Routing algorithms
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
Cloud Computing – UNIT - II. VIRTUALIZATION Virtualization Hiding the reality The mantra of smart computing is to intelligently hide the reality Binary->
September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.
Understanding and Improving Server Performance
An example of peer-to-peer application
Data Management on Opportunistic Grids
Hybrid Cloud Architecture for Software-as-a-Service Provider to Achieve Higher Privacy and Decrease Securiity Concerns about Cloud Computing P. Reinhold.
CHAPTER 3 Architectures for Distributed Systems
1. 2 VIRTUAL MACHINES By: Satya Prasanna Mallick Reg.No
Replication Middleware for Cloud Based Storage Service
OS Virtualization.
Peer to Peer Information Retrieval
Outline Midterm results summary Distributed file systems – continued
Multiple Processor Systems
CLUSTER COMPUTING.
LO2 – Understand Computer Software
GridTorrent Framework: A High-performance Data Transfer and Data Sharing Framework for Scientific Computing.
Virtualization Dr. S. R. Ahmed.
Presentation transcript:

GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content Ioannis Konstantinou School of ECE Computing Systems Laboratory National Technical University of Athens

Concept Text based search in audiovisual content Search results: Portions of video files containing selected keywords Example User searches for keyword “Acropolis” Video portions containing the spoken word “Acropolis” are located and presented in the user YouTube like functionality

Objectives Keyword extraction from video files using automatic speech recognition algorithms (ASR) Efficient and scalable distributed storage of large media content Indexing of extracted metadata for efficient keyword search YouTube like user interface for video searching/downloading Contribution to existing Grid Middleware using GGF standardized components

Addressed Issues Execution Distributed execution of CPU/Data intensive Speech Recognition Algorithms Storage Server load balancing using performance metrics Client transfer time optimization using bittorrent like algorithms Increase data availability Multi-organizational data storage support using Virtual Organizations (VOs)

Video file keyword extraction procedure

Distributed Execution Platform Architecture

Distributed Storage Platform Architecture

Distributed Replica Location Service Contains mappings of LFNs to PFNs DHT used: Pastry P2P Logarithmic routing In a network with n nodes, a query needs only log(n) messages (hops) Plaxton’s algorithm minimizes query latencies Redundancy through replication Eliminates single point of failure situations Inherent load balancing capabilities Consistent hashing algorithms

Load Balancing Servers exchange load metrics CPU Bandwidth Free Disk Space Prediction algorithms (e.g. Linear regression) forecast future metrics from history data Weighted Normalized Metric WNM : W m X(M t /M max ) Total Server Load (TSL): Sum(WNM i ) i=1..n Servers maintain numerically sorted TSL list: [TSL 1..TSL n ] TSL list periodically refreshed

Replication Upon a STORE client request: Top k servers are selected from WNM list k: configurable static replication factor Most suitable server is returned to the client Client initiates a single GridFTP file upload Server replicates the new file according to WNM list and factor k DRLS is informed about the new LFN->PFN mappings Client is informed Upon completion

Parallel Downloader Upon a GET client request: Server contacts DRLS and retrieves replica locations Client establishes N GridFTP connections Client initiates N parallel (threaded) small data chunk requests After each successful retrieval, client re-initiates another request Optimum file transfer time: The greater file portion is retrieved from the faster storage nodes To be replaced by GridTorrent

GridTorrent Metadata fields Current Id File size Piece size Hashes Distributed RLS instead of Tracker Partial GridFTP for actual transfer BitTorrent replica selection and tit-for-tat algorithm. Compatible with plain GridFTP servers PFN’s prefix determines protocol (gtp://site.fully.qualified.dom ain.name/path/to/file)

EXISTING MIDDLEWARE CONTRIBUTION Design and development of a dynamic replica selection/placement algorithm Added support for multiple clusters using Gridway Metascheduler Replace centralized replica location service with a scalable distributed peer to peer solution Enhanced bittorrent like file downloading from multiple sources

Development Testbed Hardware 4X Dual Core AMD Opteron(tm) Processor GHz – 8 virtual CPU 16Gb Ram Deployment of 5 Xen virtual machines, 2GB ram Software Globus Toolkit v4.0 Globus WS Core (Apache AXIS WS Container) Rice Pastry P2P (Java) Network Weather Service Torque (OpenPBS) scheduler GridWay Metascheduler

Virtualization Xen Hypervisor Paravirtualization tequnique Guest OS use special “xen aware” kernel Direct utilization of special CPU instructions Faster than full virtualization (VmWare) Use of Xen Hypervisor Easy prototype management/administration Simple control of the node lifecycle Facilitate prototype deployment in many actual nodes

Currently working Replace ParallelDownloader with GridTorrent Deploy prototype in the PlanetLab testbed Run experiments Fine-tune designed algorithms

Gridnews Portal Users can: Perform keyword search in the auto-annotated multimedia content View the video from their browser in a youTube style Download only a fragment of the video where this keyword exists

Screenshots keyword Video URLs time position

Questions