Search Computing Stefano Ceri. Prof. Stefano Ceri Database Management Search Computing, an EU-funded Project  European Reseach Council (ERC) runs EU.

Slides:



Advertisements
Similar presentations
Search Computing Engineering SeCo: Liquid Queries Marco Brambilla, Stefano Ceri SeCo workshop, Como, June 17th-19th, 2009.
Advertisements

DELOS Highlights COSTANTINO THANOS ITALIAN NATIONAL RESEARCH COUNCIL.
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
The 20th International Conference on Software Engineering and Knowledge Engineering (SEKE2008) Department of Electrical and Computer Engineering
Chapter 5: Introduction to Information Retrieval
Project Proposal.
GLOCAL Event-based Retrieval of Networked Media NEM Concertation Meeting Brussels, Feb
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining Alex Freitas University of Kent Ken McGarry University of Sunderland.
Search Engines and Information Retrieval
TRAINS – Railway Vehicle and TRACK System Integration the MANCHESTER METROPOLITAN UNIVERSITY TRAINS – Railway Vehicle and TRACK System Integration the.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
MS DB Proposal Scott Canaan B. Thomas Golisano College of Computing & Information Sciences.
Provisional draft 1 ICT Work Programme Challenge 2 Cognition, Interaction, Robotics NCP meeting 19 October 2006, Brussels Colette Maloney, PhD.
Academic Advisor: Prof. Ronen Brafman Team Members: Ran Isenberg Mirit Markovich Noa Aharon Alon Furman.
WebMiningResearch ASurvey Web Mining Research: A Survey By Raymond Kosala & Hendrik Blockeel, Katholieke Universitat Leuven, July 2000 Presented 4/18/2002.
IT Planning.
Web Mining Research: A Survey
CS 290C: Formal Models for Web Software Lecture 6: Model Driven Development for Web Software with WebML Instructor: Tevfik Bultan.
Data Sources & Using VIVO Data Visualizing Scholarship VIVO provides network analysis and visualization tools to maximize the benefits afforded by the.
1212 Management and Communication of Distributed Conceptual Design Knowledge in the Building and Construction Industry Dr.ir. Jos van Leeuwen Eindhoven.
Software Architecture in Practice (3rd Ed) Introduction
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
Špindlerův Mlýn, Czech Republic, SOFSEM Semantically-aided Data-aware Service Workflow Composition Ondrej Habala, Marek Paralič,
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Search Engines and Information Retrieval Chapter 1.
Conceptual Modeling Issues in Web Applications enhanced with Web services Sara Comai, Politecnico di Milano In collaboration with:
RuleML-2007, Orlando, Florida1 Towards Knowledge Extraction from Weblogs and Rule-based Semantic Querying Xi Bai, Jigui Sun, Haiyan Che, Jin.
Computational Thinking Across Curriculum Two papers on teaching computational thinking to non-CS students Pejman Khadivi CS Department, Virginia Tech.
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
University of Dublin Trinity College Localisation and Personalisation: Dynamic Retrieval & Adaptation of Multi-lingual Multimedia Content Prof Vincent.
Jason Houle Vice President, Travel Operations Lixto Travel Price Intelligence 2.0.
© DATAMAT S.p.A. – Giuseppe Avellino, Stefano Beco, Barbara Cantalupo, Andrea Cavallini A Semantic Workflow Authoring Tool for Programming Grids.
The roots of innovation Future and Emerging Technologies (FET) Future and Emerging Technologies (FET) The roots of innovation Proactive initiative on:
Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.
An Ontological Framework for Web Service Processes By Claus Pahl and Ronan Barrett.
Information Systems Engineering. Lecture Outline Information Systems Architecture Information System Architecture components Information Engineering Phases.
Q2Semantic: A Lightweight Keyword Interface to Semantic Search Haofen Wang 1, Kang Zhang 1, Qiaoling Liu 1, Thanh Tran 2, and Yong Yu 1 1 Apex Lab, Shanghai.
Search Engine Architecture
LRI Université Paris-Sud ORSAY Nicolas Spyratos Philippe Rigaux.
Semantic based P2P System for local e-Government Fernando Ortiz-Rodriguez 1, Raúl Palma de León 2 and Boris Villazón-Terrazas 2 1 1Universidad Tamaulipeca.
Center for E-Business Technology Seoul National University Seoul, Korea Optimization of Multi-Domain Queries on the Web Daniele Braga, Stefano Ceri, Florian.
ICCS WSES BOF Discussion. Possible Topics Scientific workflows and Grid infrastructure Utilization of computing resources in scientific workflows; Virtual.
1 Context-Aware Internet Sharma Chakravarthy UT Arlington December 19, 2008.
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
26/05/2005 Research Infrastructures - 'eInfrastructure: Grid initiatives‘ FP INFRASTRUCTURES-71 DIMMI Project a DI gital M ulti M edia I nfrastructure.
A Generalized Architecture for Bookmark and Replay Techniques Thesis Proposal By Napassaporn Likhitsajjakul.
A Portrait of the Semantic Web in Action Jeff Heflin and James Hendler IEEE Intelligent Systems December 6, 2010 Hyewon Lim.
Service Marts: a Service Framework for Search Computing Alessandro Campi Andrea Maesani.
Unclassified//For Official Use Only 1 RAPID: Representation and Analysis of Probabilistic Intelligence Data Carnegie Mellon University PI : Prof. Jaime.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
Building Systems for Today’s Dynamic Networked Environments A Methodology for Building Sustainable Enterprises in Dynamic Environments through knowledge.
September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.
Lecture-6 Bscshelp.com. Todays Lecture  Which Kinds of Applications Are Targeted?  Business intelligence  Search engines.
Lecture 3 Prescriptive Process Models
Information Day on “Search Engines for Audio-Visual Content”
Search Engine Architecture
Data Warehousing and Data Mining
C.U.SHAH COLLEGE OF ENG. & TECH.
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
CSE 635 Multimedia Information Retrieval
Search Engine Architecture
Exploratory Search Framework for Web Data Sources
Chaitali Gupta, Madhusudhan Govindaraju
UPTIME & SEMANTIC WEB STANDARDS
Context-Aware Internet
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Presentation transcript:

Search Computing Stefano Ceri

Prof. Stefano Ceri Database Management Search Computing, an EU-funded Project  European Reseach Council (ERC) runs EU program IDEAS –Funding body set up to support investigator-driven frontier research  2 calls each year: –Starting Grant: for most talented scientists and scholars with scientific leadership potential in 2008, 9200 total proposals, 300 funded –Advanced Grant: for “exceptional research leaders” in 2008, >1000 proposals “science&engineering”, 100 funded 2

Prof. Stefano Ceri Database Management Motivating Examples  “Who are the strongest candidates in Europe for competing on software ideas?” 3  “Who is the best doctor who can cure insomnia in a close-by hospital?”  “Where can I attend an interesting scientific conference in my field and at the same time relax on a beautiful beach nearby?” This information is available on Internet, but no software system is capable of computing the answer. Queries span over multiple semantic domains and require composing ranking of results.

Prof. Stefano Ceri Database Management Search Computing  Search computing is a new multi-disciplinary science which will provide the abstractions, foundations, methods, and tools required to give answer multi-domain queries on the Web 4  Emphasis on: –Search services. These are software services producing ranked information. Ranking is essential for search service composition. –Search Integration. The objective is not to build new search systems but instead to integrate a world-wide network of search systems.  Web site:

Prof. Stefano Ceri Database Management Research Chapters of Search Computing, 1  Foundational theories, rooted into formal disciplines such as mathematics and optimization theory.  Statistical models for estimating the number and qualities of the results produced by a search service.  Optimization methods for determining efficient plans for service integration  Software paradigms for designing and constructing search computing systems.  Interaction paradigms to help user-friendly expression of queries and ranking.  Framework for search-oriented software architectures and their instrumentation. 5

Prof. Stefano Ceri Database Management Research Chapters of Search Computing, 2  Semantic domain knowledge for dealing with terminological aspects in composing search engine results.  Higher-order rankings for prioritizing search objects and services.  Personal and social aspects for setting ranking in relationship to individuals and context.  Business models for pushing and developing search computing economy.  Legal and privacy issues concerned with search sources integration.  Advanced computational architectures for search computing. 6

Prof. Stefano Ceri Database Management Results before Search Computing  FUNDING ( ): PRIN NGS (New Generation Search) –Politecnico Milano (National Coordination) –University of Roma 3 –Free University of Bolzano  The brick: join of two search services –Information Systems, March 2008  The framework: multi-domain query optimization –International Very Large Data Bases Conference, Auckland (NZ), August 2008  The interface: mash-up based interaction –IEEE-Internet Computing, November 2008  Optimality: top-K extraction in rank aggregation –Currently submitted 7

Prof. Stefano Ceri Database Management Search Computing paradigm: overall view 8 Main Query flow relation High level query “Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?” Sub query 1 “Where can I attend a DB scientific conference?” Sub query 2 “place close to a beautiful beach?” Sub query 3 “place reachable with cheap flight?” Low level query 1 ConfSearch(“DB”,placeX,dateY) Low level query 2 TourSearch(“Beach”,PlaceX) Low level query 3 Flight(“cost<200”,PlaceX,DateY) Query plan Services invocations and operators execution Results Presented results MSVVEIS’08 - Barcelona – Iberia LID’08 – Rome - Alitalia RCIS’08- Marrakech- AirFrance

Prof. Stefano Ceri Database Management Search Computing architecture: incremental prototyping 9 Prototype 1: Core behaviour of the system. Engine-based execution of queries Domain repository Service repository Coarse result presentation relation Prototype 2: Planning Automatic optimized query planning Prototype 3: Mapping and presentation mapping to domains presentation of results Prototype 4: High level queries

Prof. Stefano Ceri Database Management Search Computing at month 6  Search is a very competitive arena –Just to name a few newcomers: Bing, Wolfram-Alpha, Kosmics  Academic research in search is hard –Even simple ideas require lots of investments to be proven 10  After initial brainstorming and lots of thinking, our current approach is to stay away from core research in: –Global indexing and crawling –Semantic web –Communities... and instead focus on our strength: –Data management –Query optimization and execution (on scalable architectures) –Software/service technologies and tools –Process modelling and mining

Prof. Stefano Ceri Database Management Reasoning about Search 09  UNIVERSAL APPROACHES –Indexing + global page ranking: Google –Classification: Yahoo, Bing –Semantics: Wolfram-Alpha  DOMAIN-EXPERT APPROACHES –Fixed horizontal composers: Kosmics – broadcasts the same query to multiple engines and collate results. –Domain-specific meta searchers: Tuifly – broadcasts the same query to multiple engines, collects and ranks results. –Fixed vertical composers: Expedia – given known compositional patterns between flights, hotels, cars, travel- related events –broadcasts modified queries to data sources, collects, composes and ranks results. –Search computing systems: extending the compositional pattern used by fixed vertical composers Expedia in a very specific context (travels) to arbitrary contexts, with multiple domains, many search engines, and greater query variance, but with known composition methods. 11

Prof. Stefano Ceri Database Management What are the assets of search computing?  A standard model for search services (service-mart), with almost-flat representation and with suitable parameters for computing query cost/time.  A registration strategy consisting of providing, for each pair of services and each composition semantics, a “composition set-ups” (service properties that should be compared).  A standard model for composition, based on the notion of join between web services, and several composition operations (join methods) for associating a query with execution strategies.  A query optimization strategy, consisting of methods for determining a plan, i.e. selecting the involved services, inferring the compositional semantics to be used, and determining the best composition operations.  A service scheduling environment, consisting of methods for executing a plan, i.e. iterating service invocation, computing compositions, evaluating global rankings, determining stop conditions, caching results, enabling backtracking and recomputing.  Liquid presentation of query results, enabling browsing of results and sophisticated controls for, e.g. asking more results, rolling up, drilling down, augmenting the query in given dimensions. 12

Prof. Stefano Ceri Database Management Current focus: Simplified instantiations of search computing  Parametric query, fixed choice of services, fixed composition. –E.g.: “best trips for a soccer supporter who wants to follow the team on a road game to a given “city” and also find in the city of the game a good hotel, cheap and fast rountrip transports, and “rock music” event within “2” days from the game.”  Composable query, variable choice of services, fixed composition once services are chosen. –E.g.: queries allowing users to focus their interests on offers in june about monuments, sport events, concerts, museums, hiking trails, beaches, fairs,… thus finding a city or area within Europe where the top offers matching their interests are present, and at the same time there is an affordable option for roundtrip and hotel stay in the city or area, and good average climate in june. 13

Prof. Stefano Ceri Database Management From queries to processes Start from a multi-domain query: –Search for the best movie-theatre combination where the movie must be an action movie (ranked by stars) and the theatre must be close to home. and then … –Once several candidate movies are located, look for their actors, their directore, other films directed by that director, and so on… –Once several candidate theatres are located, look for close-by pizzeria, for transportation, for parkings… with search options… –enable a user to dynamically impose search ordering (first choose the movie then the theatres) –enable backtracking (if theatre location is not satisfactory after investigation go back and change theatre) 14

Prof. Stefano Ceri Database Management From queries to goals If a query is being asked what does the user really want?  Composition set-ups can give the most likely directions of extension of the current query –These can be observed/mined from multiple query instantations and then suggested in “ranking order”  Disaggregation of global rankings and association with results can suggest the most promising direction of improvement for the user: –The one with fewer answers –The one whose ranking in the answers did not drop much  Disaggregation of the query + results by services may enable inspecting/changing one at a time –Exploring the search space by steps, with a sort of “pivotal” exploration (at every new search, a new dimension goes on focus but the dimension previously on focus is fixed) 15

Prof. Stefano Ceri Database Management Future Focus (far away) Search Computing in the Universe  An “exploratory” approach to search computing, starting from NL queries and combining: –Lightweight semantics (inspired by Wordnet) for service description –Open-source NLP queries and processors for query analysis and decomposition (aiming at using a mix of syntactic methods and of light semantic annotations for mapping sentence chunks to domains of interests) –Distance-based and clustering methods for mapping queries to services.  Will measure distance between “discovered mappings” and “intended mappings” while the query broadens to enlarge more and more domains.  Will enable us to understand the pros and cons of a service-oriented approach to search in a global sense 16

Prof. Stefano Ceri Database Management The next six months  Foundations, foundations, foundations! –Service marts: motivation, theory, design, source wrapping, materialization and incremental maintenance –Join methods: theory, efficient implementation, bio-inspired methods –Optimization: plan selection through decision trees, strategy analysis and comparison –Execution (panta rhei): producer-consumer system with service scheduling, context, caching, run-time controls –Interfaces: liquid queries and liquid results Together with a fast prototyping attitute and the objective of delivering the core of the technology in the next six monts  Software engineering methods and tools for search computing For enabling application deployment with core technologies  Human-computer interaction for search computing For enabling navigation on result combinations 17

Prof. Stefano Ceri Database Management Search Computing Workshop, June 17-19,

Prof. Stefano Ceri Database Management First Day 09: :30Registration 09: :30 Introduction to the two ERC projects Stefano Ceri (Politecnico di Milano) - SECO: Search Computing Carlo Ghezzi (Politecnico di Milano)- SMSCom: Self-Managing Situated Computing 10: :00 Coffee break 11: :30 Joint ERC presentations Jeff Magee (Imperial College, UK) - Engineering Self-Managed Systems Ricardo Baeza-Yates (Yahoo! Research, Barcelona) - The Next Generation of Search 12: :00Lunch 14: :45 Rank Aggregation Ihab Ilyas (University of Waterloo): Supporting Ranking in (Uncertain) Database Systems Davide Martinenghi (Politecnico di Milano): Cost-Aware Top-K Join Algorithms Adnan Abid (Politecnico di Milano): Evolutionary Techniques for Join Strategies 15: :15 Coffee break 16: :45 Service Registration and WebSite Wrapping Georg Gottlob (Oxford University): Web Data Extraction: The Lixto Project and Future Plans Alessandro Campi (Politecnico di Milano): Service Marts: a Service Framework for Search Computing 19

Prof. Stefano Ceri Database Management Second Day 09: :30 Joint ERC presentations Norman Paton (University of Manchester): Dataspaces and Search Computing: New Paradigms or Step Changes Dick Taylor (UC Irvine, USA): Runtime software adaptability 10: :00 Coffee break 11: :45 Service composition Fabio Casati (University of Trento): Universal Mashup Languages Daniele Braga (Politecnico di Milano): Join methods and query planning for Search Computing Michael Grossniklaus (Politecnico di Milano): A producer-consumer model of query execution 12: :00 Lunch 14: :30 Search computing architectures Ioana Manolescu (INRIA, PARIS): Of Distributed Queries: Models and Systems Marco Brambilla (Politecnico di Milano): Liquid queries Evening: Joint ERC recreational activities: a walk on the hills around Como/Brunate. The idea is to reach Brunate by funicular ( hike from Brunate to the top of Monte Boletto, which enjoys a great view of the lake. We will have dinner at: LOCANDA DEL DOLCE BASILICO, Mulattiera s. Maurizio 24, Brunate (CO) 20

Prof. Stefano Ceri Database Management Third Day 09: :30 Business Plan and Technology Watch Tommaso Buganza (DIG, Politecnico di Milano) and Emanuele Della Valle (DEI, Politecnico di Milano): Technology Watch for Search Computing 10: :00 Coffee break 11: :30 Priorities in SeCo Research Moderator: Piero Fraternali (Politecnico di Milano) 12: :00 Lunch 14: :00 Tutorial Florian Daniel (University of Trento): Innovation in Web Technology 15:00 – 15:30 Coffee Break 15:30 – 17:00 Stefano Ceri (Politecnico di Milano): Workplan, Workshop Conclusions and Wrap-up 21

Prof. Stefano Ceri Database Management Search Computing Challenges and Directions (LNCS, Ceri-Brambilla eds) Part 1: Vision –Ceri: Search computing –Baeza-Yates: Next generation search –Weikum: Search for knowledge  Part 2: Technology Watch for Search Computing –Dellavalle-Buganza-Gatti: The search engine industry –Casati-Daniel-Soi: Mashup technologies –Baumgartner-Campi-Gottlob-Herzog: Web data extraction –Hedeler-Belhajjame-Campi-Embury-Fernandez-Paton:Dataspaces –Bozzon-Fraternali: Multimedia and multimodal information retrieval  Part 3: Issues in Search Computing –Campi-Ceri-Gottlob-Ronchi: Service marts –Braga-Campi-Grossniklaus: Join methods and query optimization –Ilyas-Martinenghi-Tagliasacchi: Rank aggregation –Braga-Grossinklaus-Ceri: Panta Rhei, a query execution environment –Brambilla-Ceri-Fraternali-Manolescu: Liquid queries and liquid results –Brambilla-Ceri: Software engineering of search computing applications –Masseroli-Paton-Spasic: Search computing and the life sciences 22

Prof. Stefano Ceri Database Management 23 问题 ?

Prof. Stefano Ceri Database Management SeCo Teams  Theory and Methods (Davide Martinenghi, Marco Tagliasacchi). Design of solid methods (with known performance and guarantees) returning top-k query results.  Service Registration and Management (Alex Campi, Stefania Ronchi, Andrea Maesani). Registration of new search services, their semantic description, and the production of relevant parameters. We envision informal and quick deployment and registration of search services.  Query Processing and Execution Engine (Daniele Braga, Michael Grossnicklaus, Davide Barbieri, Adnan Abid, Mahmoud Abu Helou, several Ms Students Francesco Corcoglioniti, ). Operation-based model of SeCo enabling the mapping of user queries into execution plans and the selection and execution of "optimal" execution plans. 24

Prof. Stefano Ceri Database Management SeCo Teams  Tools (Marco Brambilla, Alessandro Bozzon, several Ms students). Developer-oriented and user-oriented tools, demonstrators of the "current" technology throughout the project.  Business Models and Technology Watch (Emanuele Della Valle, Roberto Verganti, Tommaso Buganza, Nicola Gatti, Sofia Ceppi). Setting the strategic directions of the project, offering a "technological watch" and then discussing "scenarios" that can lead to better results from all perspectives, including business.  Interaction Design (Piero Fraternali, Sara Comai, Maristella Matera, Davide Mazza). Paradigms for improving interaction, design of effective feedback methods for involving users in producing the answer to queries.  Concept Team (Stefano Ceri and all the team coordinators). Coordinating the project, deciding planning and milestones, deciding about technological standards, and integrating the various parts. Manage resource allocation, including human resources. 25