ICDE 2009 Keynotes Summary
Shanghai, China, March 29 to April 2, 2009
Li Yukun


Outline
Keynotes
- Search Computing (Stefano Ceri)
- Data Management in the Cloud (Raghu Ramakrishnan)
- Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)

Keynote 1: Search Computing
Stefano Ceri
Dipartimento di Elettronica e Informazione, Politecnico di Milano
Piazza L. Da Vinci 32, 20133 Milano, Italy
Stefano.Ceri@polimi.it

Motivation
- “Who are the strongest European competitors on software ideas?”
- “Who is the best doctor to cure insomnia in a nearby hospital?”
- “Where can I attend an interesting conference in my field close to a sunny beach?”
This information is available on the Web, but no software system can accept such queries or compute the answers.

Core model for search computing
- Conventional services
  - Are abstracted as systems producing sets of equal-weight answers.
- Service computing
  - A cross-discipline that covers the science and technology of bridging the gap between business services and IT services.
  - Its goal is to enable IT services and computing technology to perform business services more efficiently and effectively.
- Search services
  - Can be abstracted as systems producing ranked lists of answers.
- Search computing
  - A new paradigm in which ranking is the dominant factor for composing services.
  - A multi-domain query is answered by a constellation of cooperating search services, possibly dynamically selected.
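
The idea of composing services that return ranked lists can be pictured as a join over two ranked result sets. The sketch below is only an illustration: the two services, the `Hit` fields, and the product-of-scores aggregation are assumptions made for the example, not the model used by the SeCo project.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    item: str      # what the service found
    city: str      # join attribute shared by both services (assumed)
    score: float   # service-local relevance in [0, 1]


def ranked_join(left: list[Hit], right: list[Hit], k: int) -> list[tuple[str, str, float]]:
    """Join two ranked result lists on city and return the top-k pairs.

    The combined score is the product of the two local scores, which is
    just one possible (assumed) way to aggregate rankings.
    """
    pairs = [
        (a.item, b.item, a.score * b.score)
        for a in left
        for b in right
        if a.city == b.city
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:k]


# Hypothetical answers from two search services.
conferences = [Hit("DB conference", "Nice", 0.9), Hit("IR workshop", "Oslo", 0.8)]
beaches = [Hit("Sunny beach", "Nice", 0.7), Hit("Fjord beach", "Oslo", 0.3)]

top = ranked_join(conferences, beaches, k=1)
```

This naive version materializes every matching pair before sorting; a real rank-aware join would try to stop early once the top-k answers are certain.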

CHAPTERS OF SEARCH COMPUTING
- Theory for search computing
  - Select the best abstractions covering the concepts
  - Design basic operations on services and algorithms
  - Compute time and space complexity
- Statistical models for search services
  - Build statistical estimators of the number and quality of the results
- Optimization methods for search computing
- Description abstractions for search services
  - Expose ranking-specific properties of search services
- Language abstractions for search computing
  - Incorporate the ranking aspects and strategies for dealing with rankings

CHAPTERS OF SEARCH COMPUTING
- Human-computer interfaces
  - Expressing ranking preferences
  - Light-weight user interaction
- Semantics
  - Merging the results of heterogeneous search services
  - Semantic “join” of search services
- Higher-order ranking
  - “Ranking of rankings” is essential for selecting and prioritizing search services
  - The ranking problem is a multi-level one
- Managing individual and social searching
  - Relating search strategies to user profiling or to past user interactions
  - Societal recommendation and evaluation
  - Thus, individual and societal aspects are key ingredients for search computing

CHAPTERS OF SEARCH COMPUTING
- Search computing engineering
  - Designing, assembling and deploying search computing software applications
- Economy of search computing
  - Suitable business models, based upon advertising schemes, pay-per-query, subscription fees, micro-billing, and so on
- Security and privacy of search computing
  - Control of how data is used
  - For instance, use of a search service could be granted to a service computing application, provided that the service’s owners can trace all queries involving their data and limit the kind of information that is made visible to the queries.

PROJECT ORGANIZATION
- Funded by the European Research Council in the framework of the IDEAS Advanced Grants
- Started on Nov. 1, 2008 and will last five years

PROJECT ORGANIZATION
- The project involves about 30 researchers at Politecnico
  - Abdan Abid, Edoardo Amaldi, Alessandro Bozzon, Daniele Maria Braga, Marco Brambilla, Tommaso Buganza, Alessandro Campi, Sofia Ceppi, Sara Comai, Emanuele Della Valle, Piero Fraternali, Nicola Gatti, Michael Grossniklaus, Ma’moun Abu Hellu, Pier Luca Lanzi, Davide Martinenghi, Marco Masseroli, Maristella Matera, Davide Mazza, Giuseppe Pozzi, Stefania Ronchi, Roberto Verganti, Marco Tagliasacchi, Massimo Tisi
- SeCo has an advisory board
  - Edoardo Amaldi (Operations Research)
  - Fabio Casati (Service Computing)
  - Georg Gottlob (Theory)
  - Ioana Manolescu (Systems and Performance)
  - Roberto Verganti (Business Models)
  - Gerhard Weikum (Information Retrieval for the Web)
  - Jennifer Widom (Languages and Paradigms)

Seven teams
- Concept team
- Theory and methods
- Service registration and management
- Query processing
- Interaction design
- Tools and prototypes
- Business models and technology watch

More information on SeCo is available on the project’s Web site: http://home.dei.polimi.it/ceri/seco/index.html

Outline
Keynotes
- Search Computing (Stefano Ceri)
- Data Management in the Cloud (Raghu Ramakrishnan)
- Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)

Keynote 2: Data Management in the Cloud
- Yahoo! Research: Raghu Ramakrishnan, Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Nick Puz, Rodrigo Fonseca
- CCDI: Chuck Neerdaels, P.P.S. Narayan, Kevin Athey, Toby Negrin
- Plus Dev/QA teams

SCENARIOS Pie-in-the-sky

Living in the Clouds
- We want to start a new website, FredsList.com
- Our site will provide listings of items for sale, jobs, etc.
- As time goes on, we’ll add more features
  - This illustrates how more cloud capabilities are used as needed
  - The list of capabilities/components is illustrative, not exhaustive

Step 1: Listings
FredsList wants to store listings as (key, category, description):
- 1234323, transportation, For sale: one bicycle, barely used
- 5523442, childcare, Nanny available in San Jose
- 215534, wanted, Looking for issue 1 of Superman comic book
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa)
Statement: DECLARE DATASET Listings AS ( ID String PRIMARY KEY, Category String, Description Text )

Step 2: Search
FredsList’s customers quickly ask for keyword search (e.g., “bicycle”, “dvd’s”, “nanny”).
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa), Search (Vespa), Messaging (YMB)
Statement: ALTER Listings SET Description SEARCHABLE
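
What a keyword-searchable description column buys the application can be pictured with a toy inverted index from words to listing IDs. This is only a sketch: it says nothing about how Vespa works, and the tokenizer and the `build_index` and `search` helpers are assumptions made for the example.

```python
import re
from collections import defaultdict

# The three sample listings from Step 1: id -> (category, description).
listings = {
    "1234323": ("transportation", "For sale: one bicycle, barely used"),
    "5523442": ("childcare", "Nanny available in San Jose"),
    "215534": ("wanted", "Looking for issue 1 of Superman comic book"),
}


def build_index(records: dict[str, tuple[str, str]]) -> dict[str, set[str]]:
    """Map each lowercase word in a description to the IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for listing_id, (_category, description) in records.items():
        for word in re.findall(r"[a-z0-9]+", description.lower()):
            index[word].add(listing_id)
    return index


def search(index: dict[str, set[str]], keyword: str) -> set[str]:
    """Return the IDs of listings whose description contains the keyword."""
    return set(index.get(keyword.lower(), set()))


index = build_index(listings)
bicycle_hits = search(index, "bicycle")
```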

Step 3: Photos
FredsList decides to add photos to listings.
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa), Search (Vespa), Messaging (YMB), Storage (MObStor)
Foreign key: photo → listing
Statement: ALTER Listings ADD Photo BLOB

Step 4: Data Analysis
FredsList wants to analyze its listings to get statistics about category, do geocoding, etc.
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa), Search (Vespa), Messaging (YMB), Storage (MObStor), Compute (Grid, fed by batch export)
Foreign key: photo → listing
Statement: ALTER Listings MAKE ANALYZABLE
Grid jobs:
- Pig query to analyze categories
- Hadoop program to geocode data
- Hadoop program to generate fancy pages for listings

Step 5: Performance
FredsList wants to reduce its data access latency.
Components: FredsList.com application, Simple Web Service APIs, Database (Sherpa), Search (Vespa), Messaging (YMB), Storage (MObStor), Compute (Grid, fed by batch export), Caching (memcached)
Foreign key: photo → listing
Statement: ALTER Listings MAKE CACHEABLE
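
One common way to reduce read latency with a cache is the read-through pattern sketched below. It is a minimal illustration, assuming plain dictionaries in place of both the database and memcached; the `CachedStore` class and its invalidate-on-write policy are choices made for the example, not something the talk specifies.

```python
from typing import Optional


class CachedStore:
    """Read-through cache in front of a slower record store (both are dicts here)."""

    def __init__(self, backing: dict[str, str]) -> None:
        self.backing = backing            # stands in for the database
        self.cache: dict[str, str] = {}   # stands in for memcached
        self.backing_reads = 0            # counts trips to the slow store

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        self.backing_reads += 1
        value = self.backing.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        self.backing[key] = value
        self.cache.pop(key, None)  # invalidate so the next read refetches


store = CachedStore({"215534": "Looking for issue 1 of Superman comic book"})
store.get("215534")  # miss: goes to the backing store
store.get("215534")  # hit: served from the cache
```

After the two reads above, `store.backing_reads` is 1, since only the first read reached the backing store.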

EYES TO THE SKIES Motherhood-and-Apple-Pie

Requirements for Cloud Services
- Multitenant
  - A cloud service must support multiple, organizationally distant customers.
- Elasticity
  - Tenants should be able to negotiate and receive resources/QoS on demand.
- Resource Sharing
  - Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient.
- Horizontal scaling
  - It should be possible to add cloud capacity in small increments; this should be transparent to the tenants.
- Metering
  - A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service.
- Security
  - A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud.
- Availability
  - A cloud service should be highly available.
- Operability
  - A cloud service should be easy to operate.

Types of Cloud Services
Two kinds of cloud services:
- Horizontal Cloud Services
  - Functionality enabling tenants to build applications or new services on top of the cloud
- Functional Cloud Services
  - Functionality that is useful in and of itself to tenants, e.g., various SaaS instances such as Salesforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end users and small businesses, e.g., Flickr, Groups, Mail, News, Shopping
  - Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges)

SHERPA To Help You Scale Your Mountains of Data

The Sherpa Solution
The next generation global-scale record store
- Record-orientation: routing and data storage optimized for low-latency record access
- Scale out: add machines to scale throughput (while keeping latency low)
- Asynchrony: pub-sub replication to far-flung datacenters to mask propagation delay
- Consistency model: reduce complexity of asynchrony for the application programmer
- Cloud deployment model: hosted, managed service to reduce app time-to-market and enable on-demand scale and elasticity

QUERY PROCESSING

Accessing Data
[Figure: four-step read path. Steps 1-2: "Get key k" is sent to the storage unit (SU). Steps 3-4: the record for key k is returned.]

Bulk Read
[Figure: in step 1 the key set {k1, k2, … kn} is sent to a scatter/gather server; in step 2 the server issues Get k1, Get k2, Get k3 to the storage units (SU).]
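
The scatter/gather step can be sketched as grouping the requested keys by the storage unit that holds them, fetching each group, and reassembling the answers. The placement of keys on units and the `bulk_read` helper are hypothetical, and the fetches run sequentially here where a real server would issue them in parallel.

```python
from collections import defaultdict

# Hypothetical placement of records on storage units.
storage_units = {
    "su1": {"k1": "record 1", "k4": "record 4"},
    "su2": {"k2": "record 2"},
    "su3": {"k3": "record 3"},
}
# Hypothetical routing table: key -> storage unit holding it.
location = {key: su for su, records in storage_units.items() for key in records}


def bulk_read(keys: list[str]) -> dict[str, str]:
    """Scatter the keys by storage unit, fetch each group, gather the results."""
    by_unit: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        by_unit[location[key]].append(key)        # scatter
    results: dict[str, str] = {}
    for su, group in by_unit.items():
        for key in group:                         # one "Get k" per key
            results[key] = storage_units[su][key]
    return {key: results[key] for key in keys}    # gather, in request order


records = bulk_read(["k3", "k1", "k2"])
```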

Range Queries in YDOT
Clustered, ordered retrieval of records
[Figure: an ordered table (Apple, Avocado, Banana, Blueberry, Cantaloupe, Grape, Kiwi, Lemon, Lime, Mango, Orange, Strawberry, Tomato, Watermelon) is spread over storage units 1, 2, and 3. Router table: Storage unit 1 | Cantaloupe | Storage unit 3 | Lime | Storage unit 2 | Strawberry | Storage unit 1. The query Grapefruit…Pear is split into Grapefruit…Lime and Lime…Pear.]
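
Splitting a range query across storage units can be sketched with a binary search over tablet boundaries. The sketch assumes the router table on the slide is read as half-open key intervals, with each boundary key starting a new tablet; the `split_range` helper is hypothetical and not part of any Yahoo! API.

```python
from bisect import bisect_right

# Assumed reading of the router table: keys below "Cantaloupe" live on unit 1,
# ["Cantaloupe", "Lime") on unit 3, ["Lime", "Strawberry") on unit 2,
# and everything from "Strawberry" upward on unit 1 again.
boundaries = ["Cantaloupe", "Lime", "Strawberry"]
owners = ["Storage unit 1", "Storage unit 3", "Storage unit 2", "Storage unit 1"]


def split_range(low: str, high: str) -> list[tuple[str, str, str]]:
    """Split the key range [low, high) into per-tablet sub-ranges."""
    pieces = []
    start = low
    while start < high:
        tablet = bisect_right(boundaries, start)
        # A tablet ends at its upper boundary; the last tablet is unbounded.
        end = boundaries[tablet] if tablet < len(boundaries) else high
        end = min(end, high)
        pieces.append((start, end, owners[tablet]))
        start = end
    return pieces


plan = split_range("Grapefruit", "Pear")
```

Under these assumptions, `plan` contains two pieces, Grapefruit to Lime on storage unit 3 and Lime to Pear on storage unit 2, which matches the split shown on the slide.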

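Updates
[Figure: eight numbered steps of a write to key k, involving routers, a storage unit (SU), and message brokers. Labels: "Write key k" (steps 1-2, 3-4, and 6), "SUCCESS" (step 5), "Sequence # for key k" (steps 7-8).]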

ASYNCHRONOUS REPLICATION AND CONSISTENCY

Asynchronous Replication

Consistency Model
Goal: make it easier for applications to reason about updates and cope with asynchrony
What happens to a record with primary key “Brian”?
[Figure: timeline of one record within Generation 1, from insert through updates (v. 1 to v. 8) to delete, with stale versions and the current version marked.]
Operations shown on the timeline, one per slide:
- Read
- Read up-to-date
- Read ≥ v.6
- Write
- Write if = v.7 (ERROR)
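
The five operations on the version timeline can be sketched with one record that has a master copy and a lagging replica. This is only an illustration under simplifying assumptions: replication is triggered by hand, version numbers start at 1, and the class and method names are made up for the example rather than taken from the Sherpa API.

```python
class VersionedRecord:
    """One record with a master copy and a replica that may lag behind it."""

    def __init__(self, value: str) -> None:
        self.master = (1, value)    # (version, value) at the master
        self.replica = (1, value)   # possibly stale local copy

    def write(self, value: str) -> int:
        """Blind write: always succeeds and bumps the version."""
        self.master = (self.master[0] + 1, value)
        return self.master[0]

    def write_if(self, expected_version: int, value: str) -> int:
        """Write only if the current version matches, else raise an error."""
        if self.master[0] != expected_version:
            raise ValueError("version mismatch")
        return self.write(value)

    def replicate(self) -> None:
        """Asynchronous propagation, triggered by hand in this sketch."""
        self.replica = self.master

    def read(self) -> tuple[int, str]:
        """Plain read: may return a stale version."""
        return self.replica

    def read_up_to_date(self) -> tuple[int, str]:
        """Read the current version from the master."""
        return self.master

    def read_at_least(self, min_version: int) -> tuple[int, str]:
        """Return the replica if it is new enough, else go to the master."""
        return self.replica if self.replica[0] >= min_version else self.master


record = VersionedRecord("a")
record.write("b")
record.write("c")  # master is now at version 3, replica still at version 1
```

A conditional write against an outdated version, such as `record.write_if(2, "d")` at this point, raises an error, which corresponds to the ERROR case on the last slide of the sequence.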

Consistency Techniques
- Per-record mastering
  - Each record is assigned a “master region”
    - May differ between records
  - Updates to the record are forwarded to the master region
  - Ensures consistent ordering of updates
- Tablet-level mastering
  - Each tablet is assigned a “master region”
  - Inserts and deletes of records are forwarded to the master region
  - Master region decides tablet splits
- These details are hidden from the application
  - Except for the latency impact!
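
Per-record mastering can be sketched as follows: an update may arrive at any region, but only the record's master region assigns the next sequence number, so every replica sees the same order. The region names and the `Replicas` class are hypothetical, and propagation is immediate here, whereas the real system replicates asynchronously.

```python
class Replicas:
    """Copies of each record in several regions, with one master region per record."""

    def __init__(self, regions: list[str]) -> None:
        # region -> key -> (sequence number, value)
        self.copies: dict[str, dict[str, tuple[int, str]]] = {r: {} for r in regions}
        self.master_of: dict[str, str] = {}  # record key -> master region

    def insert(self, key: str, value: str, master: str) -> None:
        self.master_of[key] = master
        for copy in self.copies.values():
            copy[key] = (1, value)

    def update(self, key: str, value: str, arrived_at: str) -> bool:
        """Apply an update; return True if it had to be forwarded to the master."""
        master = self.master_of[key]
        seq, _old = self.copies[master][key]
        ordered = (seq + 1, value)          # only the master assigns sequence numbers
        for copy in self.copies.values():   # propagation is immediate in this sketch
            copy[key] = ordered
        return arrived_at != master


db = Replicas(["west", "east"])
db.insert("Brian", "v1", master="west")
forwarded = db.update("Brian", "v2", arrived_at="east")
```

The forwarding hop is where the latency impact mentioned on the slide comes from: an update that arrives outside the master region pays for a cross-region round trip.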

Index Maintenance
- How to have lots of interesting indexes, without killing performance?
- Solution: Asynchrony!
  - Indexes updated asynchronously when base table updated
- Planned functionality
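
Asynchronous index maintenance can be pictured as a base-table write that only queues the index change, with a background step applying it later. The `Table` class below is a hypothetical sketch of that idea, not the planned Sherpa functionality; the queue is drained by an explicit call instead of a real background worker.

```python
from collections import deque


class Table:
    """Base table with a secondary index on category that is updated lazily."""

    def __init__(self) -> None:
        self.rows = {}          # key -> category
        self.by_category = {}   # category -> set of keys (secondary index)
        self.pending = deque()  # queued (key, old category, new category) changes

    def put(self, key: str, category: str) -> None:
        """Write the base row now; queue the index change for later."""
        old = self.rows.get(key)
        self.rows[key] = category
        self.pending.append((key, old, category))

    def apply_index_updates(self) -> None:
        """Drain the queue in order, as a background worker would."""
        while self.pending:
            key, old, new = self.pending.popleft()
            if old is not None:
                self.by_category[old].discard(key)
            self.by_category.setdefault(new, set()).add(key)

    def lookup(self, category: str) -> set:
        """Index lookup; may miss rows whose index update is still queued."""
        return set(self.by_category.get(category, set()))


table = Table()
table.put("1234323", "transportation")
before = table.lookup("transportation")  # index not yet updated
table.apply_index_updates()
after = table.lookup("transportation")
```

The write path stays cheap because it never touches the index, at the cost of index reads that can briefly lag the base table.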

SHERPA IN CONTEXT

MObStor
- Yahoo!’s next-generation globally replicated, virtualized media object storage service
- Better provisioning, easy migration, replication, better BCP, and performance
- New features (Evergreen URLs, CDN integration, REST API, …)
- The object metadata problem is addressed using Sherpa, though MObStor is focused on blob storage.

Storage & Delivery Stack

The World Has Changed
- Web applications need
  - Scalability!
  - Geographic distribution
  - High availability
  - Reliable storage
- Web applications can do without
  - Complicated queries
  - Strong transactions

Web Data Management
- Large data analysis (Hadoop)
  - Scan-oriented workloads
  - Focus on sequential disk I/O
  - $ per CPU cycle
- Structured record storage (PNUTS)
  - CRUD
  - Point lookups and short scans
  - Index-organized table and random I/Os
  - $ per latency
- Blob storage (SAN/NAS)
  - Object retrieval and streaming
  - Scalable file storage
  - $ per GB

Application Design Space
[Figure: systems placed on two axes, records vs. files and "get a few things" vs. "scan everything". Systems shown: Sherpa, MObStor, Everest, Hadoop, YMDB, MySQL, Filer, Oracle, BigTable.]

Further Reading
- Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan. Efficient Bulk Insertion into a Distributed Ordered Table. SIGMOD 2008.
- Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB 2008.

Outline
Keynotes
- Search Computing (Stefano Ceri)
- Data Management in the Cloud (Raghu Ramakrishnan)
- Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)

Keynote 3: Why Can’t I Find My Data the Way I Find My Dinner?
David Carlson
- Director, International Polar Year International Programme Office
- Cambridge, UK
- ipy.djc@gmail.com

International Polar Year (IPY)
One can find almost every discipline represented in the IPY projects, and funding has come from geophysical, biological and social agencies and programs.

IPY data
- Open-access data policy
- Display of and access to IPY data
- Component systems within nations, disciplines, or existing data service centers provide access examples for portions of the IPY data set.
- We have unprecedented bandwidth for real-time data transmission.
- But how can these data sets be accessed easily?

Enormous challenges
- Financial, social, and technical barriers
- This talk focuses on the last of these, the technical barriers.

Example
To understand and predict the health of migratory bird populations in the polar environment:
- We need ornithological, toxicological, ecological, meteorological, hydrological, climatological, geomagnetic, and sociological data.
- These data will cover a broad range of space and time scales, often in disparate (or at least inconsistent) space and time coordinate systems.

Problems
- Data access
  - For a larger population of curious users, the specialized data services associated with subsets of the IPY data will not provide easy, friendly, or even accessible interfaces.
  - No familiar interfaces will provide integrated discovery and browse services.
- No long-term plan
  - On longer time scales, and even as data storage capabilities grow rapidly, most of the IPY data sets do not, at present, have acceptable long-term archive plans, even for passive storage without continued discovery services.

Research issues
- Smart search engines, pattern recognition, data mining tools, multi-gigabyte personal storage devices, and advanced animation capabilities, coupled with almost unlimited mobile bandwidth, offer many citizens expansive and amazing access to commercial, recreational, financial, and personal data and data services.
- What changes in strategy, technology, funding, and individual and collective behavior need to occur in the world of scientific data to allow me to browse, view, and access IPY data on my iTouch?

Thanks
