Multimedia search engine Michal Krsek, UISK Charles University at Prague & CESNET Ivan Doležal, CESNET Michal Illich, Jyxo.

Presentation transcript:

Multimedia search engine Michal Krsek, UISK Charles University at Prague & CESNET Ivan Doležal, CESNET Michal Illich, Jyxo

Electronic Media TV & radio Organized in channels Zero democracy in programming (by channel management) Centralized production (big guys business)

Internet Not only web (audio/video and others) –remember archie.sura.net? IPTV / Live / Video on demand Navigation only via web => not easy to find specific program in A/V

Search options I Voice recognition –Language identification –Accents Video recognition –Text interpretation (bush vs. Bush) –Low video quality

Search options II Indexing of web pages –Yahoo! does (Google bomb target) Metadata –Out-of-band metadata (as in the librarian world) –Metadata in files (added during editing or encoding)

Project description Started in 2003 (oh yes, one year before Truveo) “Google for audio and video on the Internet” No support from content owners Modular concept Start with the .cz Internet

Technical description I Crawler –Crawls web and collects addresses (URL) –Exports URL of multimedia files –Software written by Jyxo (Linux console app)
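The crawler's export step (keep only addresses that point at multimedia files) could look roughly like the sketch below. This is not the Jyxo crawler, which the slides do not show; the extension list, the function names, and the choice to classify by file extension are all assumptions made for illustration.

```python
from urllib.parse import urlparse
import posixpath

# Assumed set of audio/video extensions; the real crawler's list is not given.
MEDIA_EXTENSIONS = {
    ".mp3", ".wav", ".ogg", ".avi", ".mpg", ".mpeg",
    ".wmv", ".asf", ".mov", ".rm", ".ram",
}


def is_media_url(url):
    """Return True if the URL path ends in a known multimedia extension."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in MEDIA_EXTENSIONS


def export_media_urls(urls):
    """Keep multimedia URLs only, dropping exact duplicates, in crawl order."""
    seen = set()
    exported = []
    for url in urls:
        if is_media_url(url) and url not in seen:
            seen.add(url)
            exported.append(url)
    return exported
```

Matching on the extension is cheap because it needs no extra request, but it misses media served from extensionless URLs; checking the Content-Type header would be a slower alternative.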

Technical description II Distiller –Imports addresses of multimedia files –Distills metadata (and makes XML files) –Makes screenshots (if video in file) –C# software and mplayer (windows apps) –Runs in distributed environment
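As an illustration of the "distills metadata and makes XML files" step, here is a minimal sketch that serializes one file's metadata to XML. The real distiller is C# software and its XML schema is not shown in the slides, so the element and attribute names (`media`, `field`, `screenshot`) are hypothetical.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(url, metadata, screenshots=()):
    """Serialize metadata for one multimedia file as an XML string.

    The layout (media/field/screenshot) is an assumed schema for illustration.
    """
    root = ET.Element("media", {"url": url})
    for name in sorted(metadata):
        field = ET.SubElement(root, "field", {"name": name})
        field.text = str(metadata[name])
    for path in screenshots:
        ET.SubElement(root, "screenshot", {"src": path})
    return ET.tostring(root, encoding="unicode")
```

One XML document per URL keeps the distiller and the database loosely coupled: the database import only needs to read files, whichever machine produced them.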

Technical description III Database –Imports XML metadata files into a full-text DB –Answers back-end queries for the web search –And other full-text features (e.g. language)
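The slides do not name the full-text database product, so the following is only a toy model of what such a back end does: an in-memory inverted index that maps words from the metadata to URLs and answers AND queries. The class and method names are assumptions.

```python
import re
from collections import defaultdict


class FullTextIndex:
    """Tiny inverted index: token -> set of URLs whose metadata contains it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Lowercase and split on non-word characters.
        return re.findall(r"\w+", text.lower())

    def add(self, url, text):
        """Index the metadata text of one multimedia file."""
        for token in self._tokens(text):
            self._postings[token].add(url)

    def search(self, query):
        """Return sorted URLs that contain every term of the query."""
        terms = self._tokens(query)
        if not terms:
            return []
        result = None
        for term in terms:
            urls = self._postings.get(term, set())
            result = urls if result is None else result & urls
        return sorted(result)
```

A production full-text engine adds ranking, stemming, and language handling on top of this basic structure.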

Pipeline overview (diagram) Crawling: crawls web pages, gets addresses, filters A/V addresses. Distillation: gets metadata from multimedia files. Indexing and search: holds the full-text database, provides the back end for queries.

Distillation Process description –Get URL from DB –Get metadata from file available at URL –Get screenshots at 1, 30, 50 sec –Save metadata & screenshots
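The four steps of the distillation loop can be sketched as follows. The actual players and database are replaced here by injected callables (`next_url`, `read_metadata`, `grab_frame`, `save`), which are hypothetical names; stopping at the first failed screenshot is also an assumption, chosen because a clip may be shorter than 50 seconds.

```python
# Screenshot offsets in seconds, as listed on the slide.
SCREENSHOT_OFFSETS_SEC = (1, 30, 50)


def distill(url, read_metadata, grab_frame):
    """Collect metadata and screenshots for a single URL."""
    record = {"url": url, "metadata": read_metadata(url), "screenshots": []}
    for offset in SCREENSHOT_OFFSETS_SEC:
        try:
            record["screenshots"].append(grab_frame(url, offset))
        except OSError:
            # Assumed policy: a short or unreachable clip keeps what it has.
            break
    return record


def run(next_url, save, read_metadata, grab_frame):
    """Pull URLs until the source returns None; return how many were saved."""
    processed = 0
    while True:
        url = next_url()
        if url is None:
            return processed
        save(distill(url, read_metadata, grab_frame))
        processed += 1
```

Passing the player and storage functions in as arguments keeps the loop testable without a media player or a database.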

Distillation Use of win32 applications –Native players (WMP, RP, Qt) for metadata –Mplayer for screenshots Takes one minute on average –Slow servers/bandwidth –Streaming without fast fw

DistillerGRID <= need 16 years to distill URLs Ideal application for GRID computing –No need for real-time response –Huge amount of computing time needed Two ways to create a GRID –Build a dedicated system –Use existing capacity
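The case for a grid is simple arithmetic: sequential time grows linearly with the number of URLs and shrinks linearly with the number of machines. The sketch below captures that relation. The URL counts used with it are placeholders, not the project's figure, and the one-minute default is the average quoted for distillation.

```python
MINUTES_PER_DAY = 60 * 24
MINUTES_PER_YEAR = MINUTES_PER_DAY * 365


def sequential_years(url_count, minutes_per_url=1.0):
    """Years one machine needs to distill url_count URLs back to back."""
    return url_count * minutes_per_url / MINUTES_PER_YEAR


def parallel_days(url_count, workers, minutes_per_url=1.0):
    """Days needed when the work is split evenly across identical workers.

    Assumes perfect scaling, i.e. no scheduling or bandwidth overhead.
    """
    return url_count * minutes_per_url / workers / MINUTES_PER_DAY
```

For example, `sequential_years(525600)` is exactly one year at one minute per URL, while `parallel_days(525600, 100)` gives 3.65 days under the ideal-scaling assumption.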

Computing machines PC/Windows based HW independent Secure environment –Security of hosting system –Security of distillation process Well connected Not needed to run 24x7 Easy to manage

Configuration ~100 PCs in student labs Running on demand during weekends Virtual machines (MS VPC 2004) in hosting system (Win XP) Three different HW configurations Peak rate about 5000 URLs per minute SQL as background -> pull distribution of work
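"Pull distribution of work" means that each station asks the SQL database for its next batch instead of a master pushing jobs out. The sketch below models this with SQLite as a stand-in; the slides do not name the SQL server, and the `jobs` table, its columns, and the function names are assumptions.

```python
import sqlite3


def create_queue(conn, urls):
    """Create the (hypothetical) jobs table and load pending URLs."""
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT UNIQUE,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " worker TEXT)"
    )
    conn.executemany("INSERT INTO jobs (url) VALUES (?)", [(u,) for u in urls])
    conn.commit()


def claim_batch(conn, worker, size):
    """A worker pulls up to `size` pending URLs and marks them as claimed."""
    with conn:  # commit on success, roll back on error
        rows = conn.execute(
            "SELECT id, url FROM jobs WHERE status = 'pending'"
            " ORDER BY id LIMIT ?",
            (size,),
        ).fetchall()
        conn.executemany(
            "UPDATE jobs SET status = 'claimed', worker = ? WHERE id = ?",
            [(worker, row[0]) for row in rows],
        )
    return [row[1] for row in rows]


def mark_done(conn, url):
    """Record that a claimed URL has been distilled."""
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE url = ?", (url,))
```

Pulling suits machines that come and go, such as lab PCs that only run on weekends: a station that is switched off simply stops asking for work. With many concurrent workers on a real database server, the select-then-update step would need row locking or an atomic update so that two stations cannot claim the same URL.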

Actual status I HW –20 crawlers –2 servers for fulltext DB (<1.400 USD) –Distillation stations (X office PC) –Connected by 1 Gb/s to CESNET2 -> GEANT2

Actual status II Database –EU +.com,.edu –> URLs –> valid –> with screenshots

Live show?

Want to test? URLs – – –For XML interface send me

Questions ? Comments ? Michal Krsek, (academic service, Michal Illich, (business