iCrawl – Master Thesis and Hiwi Jobs

Context

iCrawl Project: a novel approach for the creation of high-quality Web archives
- Easy-to-use and extensible Web archive crawler framework
- Usable also by non-technicians

User Interface: key component for interacting with the crawler
- Setting up crawls
- Maintaining and monitoring crawls
- Quality assurance of crawls

Thomas Risse 08/11/18

Master Thesis: Crawl Specification Wizard

Problem Statement
- The quality of a Web archive depends on the quality of the crawl specification
- Crawl specifications for focused crawls are complex and hard to define (initial starting points, good descriptions of terms, entities, etc.)
- Crawl specifications are similar to search engine queries, but more complex

Aim of the Master Thesis
- Development of a semi-automatic tool that learns the intention of a crawl
- Based on a set of reference pages or on search engine results
- Iterative and interactive process
- Requires analysis and extraction of information from Web pages

Requirements
- Interest in doing cool things in the context of a research project
- A “feeling” for good design and user friendliness
- Programming skills in Java

Contact: Thomas Risse (L3S), risse@L3S.de
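
To make the idea of learning a crawl's intention from reference pages more concrete, here is a minimal sketch, not the iCrawl implementation. It assumes the reference pages are already available as plain text and simply ranks terms by how many pages share them. The class name CrawlSpecSketch, the method suggestTerms, and the tiny stop-word list are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: suggest candidate terms for a crawl specification
// from a set of reference pages (given here as plain-text strings).
final class CrawlSpecSketch {

    // Tiny illustrative stop-word list (an assumption; a real tool would need a fuller one).
    private static final Set<String> STOP_WORDS =
            Set.of("the", "and", "for", "with", "from", "that", "this");

    /** Returns up to k terms, ranked by the number of reference pages that contain them. */
    static List<String> suggestTerms(List<String> referencePages, int k) {
        Map<String, Integer> pageCount = new HashMap<>();
        for (String page : referencePages) {
            Set<String> seenOnPage = new HashSet<>();
            // Split on anything that is not a letter or digit.
            for (String token : page.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                boolean candidate = token.length() > 2 && !STOP_WORDS.contains(token);
                // Count each term at most once per page.
                if (candidate && seenOnPage.add(token)) {
                    pageCount.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> terms = new ArrayList<>(pageCount.keySet());
        // Most widely shared terms first; ties are broken alphabetically.
        terms.sort((x, y) -> {
            int byCount = Integer.compare(pageCount.get(y), pageCount.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
                "Web archive crawler for the iCrawl project",
                "The crawler builds a Web archive",
                "Focused crawler quality");
        System.out.println(suggestTerms(pages, 3)); // prints [crawler, archive, web]
    }
}
```

In an iterative and interactive process as described above, a user could accept or reject the suggested terms and rerun the ranking on an adjusted set of reference pages. Extracting entities or starting points would need additional analysis that this sketch does not attempt.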

Master Thesis: Entity-centric Linked Data Crawler

Topic
- Development of an entity-centric Linked Data crawler
- Automatic collection of metadata for Linked Data sources to enable crawler prioritization
- Integration of the crawler with the iCrawl platform for integrated crawling of Web pages and Linked Data

Requirements
- Good grades in the IR-related courses
- Good programming skills in Java
- Interest in research-oriented projects

Contact: Elena Demidova, demidova@L3S.de

Hiwi Job in the context of Web Archiving

Topic
- User interface development for setting up, maintaining, and monitoring crawls
- Easy to use (also for non-computer scientists)
- Near-real-time information

Requirements
- Interest in doing cool things in the context of a research project
- A “feeling” for good design and user friendliness
- Programming skills in Java

Contact: Thomas Risse (L3S), risse@L3S.de